There would be significant benefits to adding automated testing against the schemas and keyword dictionary.
Currently schema_editor.py is the tool for comparing the schemas and keyword dictionary, but it:
- fails to correctly parse the schemas (skipping entire branches if they are nested in an allOf)
- fails to correctly parse the keyword dictionary (it makes different assumptions about the structure than the more common keyword_dict.py found in dads-common, jwstsdp and jwst-schematic-headers)
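To illustrate the first failure mode, here is a minimal sketch of a schema walk that does descend into allOf (and the other combiners). It is not the schema_editor.py or stdatamodels implementation; `iter_keyword_nodes` and the toy schema are hypothetical, and the sketch assumes any `$ref` entries have already been resolved into plain dictionaries.

```python
def iter_keyword_nodes(schema, path=()):
    """Yield (dotted_path, node) for every schema node that defines fits_keyword."""
    if not isinstance(schema, dict):
        return
    if "fits_keyword" in schema:
        yield ".".join(path), schema
    # Combiners describe the same node, so they add no path segment.
    for combiner in ("allOf", "anyOf", "oneOf"):
        for subschema in schema.get(combiner, []):
            yield from iter_keyword_nodes(subschema, path)
    # Each property adds one segment to the data model path.
    for name, subschema in schema.get("properties", {}).items():
        yield from iter_keyword_nodes(subschema, path + (name,))


# Toy schema (not a real datamodel schema): the keyword sits under two allOf levels.
toy_schema = {
    "allOf": [
        {
            "type": "object",
            "properties": {
                "meta": {
                    "type": "object",
                    "properties": {
                        "observation": {
                            "allOf": [
                                {
                                    "type": "object",
                                    "properties": {
                                        "date": {
                                            "type": "string",
                                            "fits_keyword": "DATE-OBS",
                                            "fits_hdu": "PRIMARY",
                                        }
                                    },
                                }
                            ]
                        }
                    },
                }
            },
        }
    ]
}

found = dict(iter_keyword_nodes(toy_schema))
print(sorted(found))
```

A walker that only follows `properties` would return nothing for this toy schema, which is the kind of silently skipped branch described above.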
Per conversation with @tapastro, the useful comparisons are:
- "data model path" (e.g. meta.observation.date): this is used by the keyword dictionary GUI (to make filenames, but otherwise not shown) and, more importantly, by the archive, which uses it to check which files need reprocessing when new reference files arrive (since CRDS uses the "data model path")
- FITS keyword
- FITS HDU
- title
- description
- enum (and perhaps more generally type)
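A comparison over these fields could look roughly like the sketch below. It assumes both sources have already been parsed into flat records keyed by data model path; that record shape, the `diff_records` helper, and the sample values are all hypothetical, since the keyword dictionary format is undocumented.

```python
FIELDS = ("fits_keyword", "fits_hdu", "title", "description", "enum")


def _normalize(value):
    """Compare enums as unordered sets; leave other values untouched."""
    return frozenset(value) if isinstance(value, list) else value


def diff_records(schema_records, dictionary_records, fields=FIELDS):
    """Return {data_model_path: problem} for every path that disagrees."""
    problems = {}
    for path in sorted(set(schema_records) | set(dictionary_records)):
        if path not in schema_records:
            problems[path] = "missing from schemas"
        elif path not in dictionary_records:
            problems[path] = "missing from keyword dictionary"
        else:
            schema_rec = schema_records[path]
            dict_rec = dictionary_records[path]
            changed = {
                field: (schema_rec.get(field), dict_rec.get(field))
                for field in fields
                if _normalize(schema_rec.get(field)) != _normalize(dict_rec.get(field))
            }
            if changed:
                problems[path] = changed
    return problems


# Toy records (illustrative values only, not taken from the real files).
schema_records = {
    "meta.observation.date": {
        "fits_keyword": "DATE-OBS",
        "fits_hdu": "PRIMARY",
        "title": "toy title A",
    },
    "meta.example.only_in_schema": {"fits_keyword": "EXAMPLE1"},
}
dictionary_records = {
    "meta.observation.date": {
        "fits_keyword": "DATE-OBS",
        "fits_hdu": "PRIMARY",
        "title": "toy title B",
    },
}

problems = diff_records(schema_records, dictionary_records)
for path, problem in problems.items():
    print(path, problem)
```

Keying on the data model path rather than the FITS keyword follows the point above that the archive and CRDS depend on that path.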
In addition to the undocumented format of the keyword dictionary JSON, the structure of the dictionary repository differs from that of the datamodels. Keywords in the dictionary are separated by instrument/mode, whereas datamodel schemas are not similarly formally separated. This complicates the comparison of some keywords (like FILTER, which has different allowed values for each instrument/mode in the keyword dictionary but a single all-encompassing enum in the datamodel schema).
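One possible way to handle a keyword like FILTER is to compare the schema enum against the union of the per-instrument/mode enums from the dictionary. The sketch below assumes the dictionary values have already been grouped by instrument/mode; `compare_enum`, the grouping keys, and the placeholder values are hypothetical and are not real filter names.

```python
def compare_enum(schema_enum, dictionary_enums_by_mode):
    """Compare one schema enum with the union of per-instrument/mode enums."""
    union = set()
    for values in dictionary_enums_by_mode.values():
        union.update(values)
    schema_values = set(schema_enum)
    return {
        "only_in_schema": sorted(schema_values - union),
        "only_in_dictionary": sorted(union - schema_values),
    }


# Placeholder values, not real filter names.
schema_filter_enum = ["A1", "A2", "B1", "OLD"]
dictionary_filter_enums = {
    "instrument_a/mode_1": ["A1", "A2"],
    "instrument_b/mode_1": ["B1", "B2"],
}

result = compare_enum(schema_filter_enum, dictionary_filter_enums)
print(result)
```

This checks only that the all-encompassing enum and the combined dictionary values agree; it cannot tell whether a value is allowed for the wrong instrument/mode, because the schema does not carry that separation.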
I propose that we compare the schemas and keyword dictionary using a tool that more closely follows keyword_dict.py for parsing the keyword dictionary and uses the schema parsing built into stdatamodels for parsing the datamodel schemas. The result would be a comparison tool that more closely matches how these files are actually used (and is likely much simpler).