Open collinss-jpl opened 4 weeks ago
Hi @collinss-jpl and @JimHofman, yesterday I played around in a Python interactive interpreter to rework the existing XML template for CSLC into the configuration YAML, or at least a starting point for one. It's kinda hacky, and may need some tweaking to work on other PGEs, but I thought I'd share it here so you can maybe avoid the tedious task of translating the product spec documents + templates into configuration files by hand.
>>> from bs4 import BeautifulSoup
>>> from yaml import dump, CDumper
>>>
>>> soup = BeautifulSoup(open('OPERA_ISO_metadata_L2_CSLC_S1_template.xml.jinja2'), 'xml')
>>> aa_s = soup.find('eos:AdditionalAttributes')
>>> attrs = aa_s.find_all('eos:AdditionalAttribute')
>>>
>>> desc_tag = 'eos:description'
>>> dtype_tag = 'eos:EOS_AdditionalAttributeDataTypeCode'
>>> var_tag = 'eos:value'
>>> name_tag = 'eos:name'
>>> type_tag = 'eos:type'
>>>
>>> attr_dicts = [dict(var=a.find(var_tag).text.strip(), name=a.find(name_tag).text.strip(), desc=a.find(desc_tag).text.strip(), type=a.find(type_tag).text.strip(), dtype=a.find(dtype_tag).text.strip()) for a in attrs]
>>>
>>> # Unwrap variable path from jinja template
>>> for d in attr_dicts:
...     d['var'] = d['var'].split('{{')[1].split('}}')[0].strip()
...
>>>
>>> # Remove listing/json/etc commands from jinja reference
>>> for d in attr_dicts:
...     d['var'] = d['var'].split('|')[0].strip()  # strip the space left in front of '|'
...
>>>
>>> # Remove root dict name (str.removeprefix requires Python 3.9+)
>>> for d in attr_dicts:
...     d['var'] = d['var'].removeprefix('product_output.')
...
>>>
>>> # Probably only needed for HDF5 metadata (or my approach to it): convert nested dict references to something "path" like
>>> for d in attr_dicts:
...     d['var'] = d['var'].replace('.', '/')
...
>>>
>>> remapped = {v['var']: dict(description=v['desc'], attribute_type=v['type'], attribute_data_type=v['dtype'], display_name=v['name']) for v in attr_dicts}
>>>
>>> with open('mpc.yaml', 'w') as fp:
...     dump(remapped, fp, Dumper=CDumper)
...
>>>
It's not a perfect solution, but I hope it helps save you some time.
Install BeautifulSoup and a parser (lxml):
pip install beautifulsoup4
pip install lxml
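For anyone adapting the session above to another PGE, here is a minimal sketch of the variable-path cleanup as a single function, so it can be tried on one string before running it over a whole template. The function name `unwrap_jinja_var` and the sample expression are made up for illustration; only the `product_output.` root and the split/replace steps come from the session.

```python
def unwrap_jinja_var(raw: str, root: str = "product_output.") -> str:
    """Reduce a Jinja2 expression like '{{ root.a.b | filter }}' to the path 'a/b'."""
    # Keep only the text between the first '{{' and the next '}}'
    inner = raw.split("{{")[1].split("}}")[0].strip()
    # Drop any filters (tojson, list, ...) that follow the first '|'
    inner = inner.split("|")[0].strip()
    # Remove the root dict name; done by hand so it also works before Python 3.9
    if inner.startswith(root):
        inner = inner[len(root):]
    # Turn nested dict references into a path-like key
    return inner.replace(".", "/")


# Hypothetical template value, not taken from the real CSLC template
sample = "{{ product_output.processing_information.algorithms.version | tojson }}"
print(unwrap_jinja_var(sample))
```

Values that contain no `{{ ... }}` expression at all will raise an `IndexError` here, same as in the interactive version, so hard-coded attribute values in a template would need to be skipped or handled separately.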
Another quick note: in dswx_s1, the integer 'attribute_data_type' values were misnamed 'integer' rather than 'int'. For dswx_hls they were correctly assigned 'int', but I thought it worth mentioning. Riley, please correct me if this description is not accurate.
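A quick way to catch this kind of naming slip might look like the sketch below. It assumes the config has already been loaded into a dict shaped like the `remapped` dict from the session above; the function name, the alias table, and the sample entries are hypothetical, and the only rule encoded is the 'integer' versus 'int' one mentioned here.

```python
def find_misnamed_types(config, aliases=None):
    """Return {parameter: suggested_type} for entries using a known-bad type name."""
    # Assumed alias table: only the 'integer' -> 'int' case reported in this thread
    aliases = aliases or {"integer": "int"}
    return {
        key: aliases[entry["attribute_data_type"]]
        for key, entry in config.items()
        if entry.get("attribute_data_type") in aliases
    }


# Hypothetical config entries for illustration only
example_config = {
    "processing/burst_count": {"attribute_data_type": "integer"},
    "processing/version": {"attribute_data_type": "string"},
    "processing/frame_id": {"attribute_data_type": "int"},
}
print(find_misnamed_types(example_config))
```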
Now that the new feature for automated population of Measured Parameters within ISO XML has been developed and tested with DSWx-S1, we need to roll the feature out to the remaining PGEs. This should entail the following changes:
Creation of the Measured Parameters Description YAML config (see the DSWx-S1 version for an example). Note this task will also require the latest Product Spec document for the SAS to get the appropriate descriptions, as well as the existing ISO XML template to pull the appropriate values for attribute_type
Update the existing ISO XML template to use a jinja2 for-loop construct for the MeasuredParameters section. This should be performed after creation of the Parameter Description file described above
Update the PGE's implementation of _collect_<pge>_product_metadata() to include a call to augment_measured_parameters() to ensure all metadata is formatted as expected before the template is instantiated
After all changes are made, run the integration test for the PGE (either via Jenkins or locally on a dev machine) and inspect the resulting ISO XML product to ensure the MeasuredParameters section is filled out as expected.
[ ] #521
[ ] #522
[x] #523
[x] #524
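To give a feel for what the description config buys us, here is a rough sketch of pairing measured values with their config entries so a template loop has everything it needs per parameter. This is not the actual `augment_measured_parameters()` implementation, whose signature isn't shown in this issue; `attach_descriptions` and the sample data are hypothetical, and the only assumption taken from the thread is the per-parameter layout of the YAML (description, attribute_type, attribute_data_type, display_name).

```python
def attach_descriptions(measured, descriptions):
    """Pair each measured value with its description entry.

    Returns (augmented, missing): a list of dicts ready for a template loop,
    and the names that had no entry in the description config.
    """
    augmented, missing = [], []
    for name, value in measured.items():
        entry = descriptions.get(name)
        if entry is None:
            # Report gaps instead of silently dropping the parameter
            missing.append(name)
            continue
        augmented.append({"name": name, "value": value, **entry})
    return augmented, missing


# Hypothetical data for illustration only
descriptions = {
    "identification/track_number": {
        "description": "Track number of the input product",
        "attribute_type": "contentInformation",
        "attribute_data_type": "int",
        "display_name": "TrackNumber",
    }
}
measured = {"identification/track_number": 42, "identification/unknown": "n/a"}

augmented, missing = attach_descriptions(measured, descriptions)
print(augmented)
print(missing)
```

Reporting the missing names separately makes it easier to spot parameters that the Product Spec describes but the config does not yet cover when inspecting the integration test output.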