nasa / opera-sds-pge

Observational Products for End-Users from Remote Sensing Analysis (OPERA)
Apache License 2.0

Roll out ISO XML Measured Parameter Generation feature to remaining PGEs #520

Open collinss-jpl opened 4 weeks ago

collinss-jpl commented 4 weeks ago

Now that the new feature for automated population of Measured Parameters within ISO XML has been developed and tested with DSWx-S1, we need to roll the feature out to the remaining PGEs. This should entail the following changes:

RKuttruff commented 3 weeks ago

Hi @collinss-jpl and @JimHofman, yesterday I played around in a Python interactive interpreter to rework the existing XML template for CSLC into the configuration YAML - or at least a starting point for one. It's kinda hacky, and may need some tweaking to work with other PGEs, but I thought I'd share it here so you can maybe avoid the tedious task of translating the product spec documents + templates into configuration files by hand.

>>> from bs4 import BeautifulSoup
>>> from yaml import dump, CDumper
>>>
>>> soup = BeautifulSoup(open('OPERA_ISO_metadata_L2_CSLC_S1_template.xml.jinja2'), 'xml')
>>> aa_s = soup.find('eos:AdditionalAttributes')
>>> attrs = aa_s.find_all('eos:AdditionalAttribute')
>>>
>>> desc_tag = 'eos:description'
>>> dtype_tag = 'eos:EOS_AdditionalAttributeDataTypeCode'
>>> var_tag = 'eos:value'
>>> name_tag = 'eos:name'
>>> type_tag = 'eos:type'
>>>
>>> attr_dicts = [dict(var=a.find(var_tag).text.strip(), name=a.find(name_tag).text.strip(), desc=a.find(desc_tag).text.strip(), type=a.find(type_tag).text.strip(), dtype=a.find(dtype_tag).text.strip()) for a in attrs]
>>> 
>>> # Unwrap variable path from jinja template
>>> for d in attr_dicts:
...     d['var'] = d['var'].split('{{')[1].split('}}')[0].strip()
...
>>>
>>> # Remove listing/json/etc commands from jinja reference (strip() avoids leaving trailing whitespace in the key)
>>> for d in attr_dicts:
...     d['var'] = d['var'].split('|')[0].strip()
...
>>>
>>> # Remove root dict name
>>> for d in attr_dicts:
...     d['var'] = d['var'].removeprefix('product_output.')
...
>>>
>>> # Probably only needed for HDF5 metadata (or my approach to it): convert nested dict references to something "path" like
>>> for d in attr_dicts:
...     d['var'] = d['var'].replace('.', '/')
...
>>>
>>> remapped = {v['var']: dict(description=v['desc'], attribute_type=v['type'], attribute_data_type=v['dtype'], display_name=v['name']) for v in attr_dicts}
>>>
>>> with open('mpc.yaml', 'w') as fp:
...     dump(remapped, fp, Dumper=CDumper)
...
>>>

It's not a perfect solution, but I hope it helps save you some time.
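To make the unwrapping steps above concrete, here's a minimal, self-contained sketch of what they do to a single Jinja2 value expression. The template snippet and the `to_json` filter name are made up for illustration; the transformations mirror the interpreter session above:

```python
# Hypothetical Jinja2 value expression, as it might appear in the ISO XML template
raw = "{{ product_output.identification.zero_doppler_start_time | to_json }}"

var = raw.split('{{')[1].split('}}')[0].strip()  # unwrap the Jinja2 braces
var = var.split('|')[0].strip()                  # drop filters like "| to_json"
var = var.removeprefix('product_output.')        # remove the root dict name
var = var.replace('.', '/')                      # nested dict refs -> "path"-like key

print(var)  # identification/zero_doppler_start_time
```

The resulting string becomes a key in the `remapped` dict, i.e. a path-like lookup into the product metadata.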

JimHofman commented 2 weeks ago

Install BeautifulSoup and a parser (lxml):

pip install beautifulsoup4
pip install lxml

JimHofman commented 2 weeks ago

Another quick note: with dswx_s1, integer 'attribute_data_type' values were misnamed 'integer' rather than 'int'. For dswx_hls, they were correctly assigned 'int', but I thought it worth mentioning. Riley, please correct me if this description is not accurate.
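If the generated YAML does end up with the 'integer' spelling, a small normalization pass over the config dicts could fix it before dumping. This is just a sketch; the entry shown is a made-up example in the shape produced by the script above, and the alias table covers only the mismatch described here:

```python
# Hypothetical fix-up: normalize 'attribute_data_type' values in the
# generated Measured Parameters config, mapping 'integer' -> 'int'
TYPE_ALIASES = {'integer': 'int'}

remapped = {
    'identification/track_number': {           # made-up example entry
        'description': 'Track number',
        'attribute_type': 'contentInformation',
        'attribute_data_type': 'integer',      # misnamed, should be 'int'
        'display_name': 'TrackNumber',
    },
}

for entry in remapped.values():
    dtype = entry['attribute_data_type']
    entry['attribute_data_type'] = TYPE_ALIASES.get(dtype, dtype)

print(remapped['identification/track_number']['attribute_data_type'])  # int
```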