braingram opened this issue 4 months ago
> Although all of the above is public API, it seems worthwhile to investigate wrapping 4 as a helper function in stdatamodels (with sufficient documentation about how this is dangerous).
I agree having this function would be very useful. What dangers do you foresee?
Thanks for taking a look at this issue. Most of the dangers come from the fact that it bypasses all of the stdatamodels and asdf machinery. This means:

`load_yaml`)

I think for the `s_region` use case all of the above are ok. Are there other places in jwst where this might be useful?
If one were concerned with validation, one option would be to do it the current way (though I would not make that the option for operations, since the file should already have been validated and no sneaky, invalid updates should have been made). But I agree that an easy-to-use version of 4) should be provided. I can see that mining the files' metadata is extremely useful, particularly out of operations, and it should be as efficient as possible.
`ModelContainer` in jwst also provides a `models_grouped` method that groups models based on a small set of metadata keywords:
https://github.com/spacetelescope/jwst/blob/master/jwst/datamodels/container.py#L466
These keywords are all strings in the primary FITS header. Related to this issue, it would be useful to have an efficient way to read these keywords without incurring the overhead of traversing the schema. These attributes are duplicated in the ASDF extension; however, the values in the FITS header should be preferred (as is already done for `datamodels.open`).
It would be great if the efficient metadata access code could also load multiple FITS keywords.
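The idea in the two comments above (read several string keywords from the primary FITS header only, then group files by them) can be sketched with the standard library alone. This is a minimal illustration, not stdatamodels or jwst code: `read_primary_keywords` and `group_by_keywords` are hypothetical names, and the sketch assumes well-formed headers made of plain `KEYWORD = value` cards (no `CONTINUE` long strings or `HIERARCH` keywords). In practice `astropy.io.fits.getheader` is the usual tool; the point here is only that reading can stop at the primary header's `END` card, so neither the data nor the ASDF extension is touched.

```python
from collections import defaultdict

BLOCK_SIZE = 2880  # FITS headers are stored in 2880-byte blocks
CARD_SIZE = 80     # made of 80-character ASCII "cards"


def _parse_value(raw):
    """Parse a card's value field; quoted FITS strings are unquoted."""
    raw = raw.strip()
    if not raw.startswith("'"):
        # Non-string value: drop any inline "/ comment"
        return raw.split("/", 1)[0].strip()
    chars = []
    i = 1
    while i < len(raw):
        if raw[i] == "'":
            if raw[i + 1:i + 2] == "'":  # '' is an escaped quote
                chars.append("'")
                i += 2
                continue
            break  # closing quote; anything after it is a comment
        chars.append(raw[i])
        i += 1
    return "".join(chars).rstrip()  # trailing blanks are not significant


def read_primary_keywords(path, keywords):
    """Return {KEYWORD: value} for the requested keywords, reading only the primary header.

    Reading stops at the END card, so the data and all extensions
    (including the ASDF extension) are never read.
    """
    wanted = {k.upper() for k in keywords}
    found = {}
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                raise ValueError(f"{path}: header ended without an END card")
            for start in range(0, BLOCK_SIZE, CARD_SIZE):
                card = block[start:start + CARD_SIZE].decode("ascii")
                name = card[:8].rstrip()
                if name == "END":
                    return found
                # Value cards have "= " in columns 9-10
                if name in wanted and card[8:10] == "= ":
                    found[name] = _parse_value(card[10:])


def group_by_keywords(paths, keywords):
    """Group paths by the tuple of their keyword values (None if a keyword is missing)."""
    groups = defaultdict(list)
    for path in paths:  # one file open at a time; only keys and paths are kept
        values = read_primary_keywords(path, keywords)
        groups[tuple(values.get(k.upper()) for k in keywords)].append(path)
    return dict(groups)
```

Because each file is opened, read up to `END`, and closed before the next one, memory use stays flat however many files are passed in.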
There are use cases (like the jwst `resample` step) where loading a single keyword from many `DataModel`-containing FITS files may be useful. As the number of files might be very large, and opening every model might exceed reasonable amounts of RAM, it will be important to have a performant way to perform these simple keyword accesses.

Using `meta.wcsinfo.s_region` (contained in the ASDF extension) as an example, there are a few ways this keyword can be read:

1) Using `stdatamodels.jwst.datamodels.open`
2) Using `stdatamodels.asdf_in_fits`
3) Using `astropy.io.fits` and `asdf.open`
4) Using `astropy.io.fits` and `asdf.util.load_yaml`
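Option 4 parses only the YAML tree of the ASDF extension rather than opening it as a full `AsdfFile` with `asdf.open`. A stdlib-only sketch of that idea, assuming the ASDF bytes are already available as a binary stream (for example, extracted from the FITS file with `astropy.io.fits`): read until the YAML document end marker `...` and stop, so the binary blocks that follow are never read. `read_yaml_tree` is a hypothetical name, not an asdf API, and it returns raw text; turning that text into nested dictionaries is the job of a YAML parser (in option 4, `load_yaml`).

```python
ASDF_MAGIC = b"#ASDF "   # every ASDF file starts with this header line
YAML_END = b"\n...\n"     # YAML document end marker that closes the tree


def read_yaml_tree(stream, chunk_size=4096):
    """Return the YAML tree of an ASDF stream as text.

    Reading stops once the end marker has been seen (at most one chunk
    past it), so the binary blocks holding array data are not read.
    Assumes Unix newlines.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("no YAML document end marker found")
        # Back up a few bytes so a marker split across two chunks is still found
        search_from = max(len(buf) - len(YAML_END) + 1, 0)
        buf += chunk
        end = buf.find(YAML_END, search_from)
        if end != -1:
            if not buf.startswith(ASDF_MAGIC):
                raise ValueError("not an ASDF file")
            return buf[: end + len(YAML_END)].decode("utf-8")
```

Skipping validation, tag conversion, and block handling is exactly what makes this approach cheap, and also why the results come back as untyped data.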
Of the 4 options above, 1-3 are similar in performance (tested with both an `ImageModel` and a larger `IFUImageModel` file), with performance limited primarily by `asdf.open` (more on that below). Option 4 is much faster in both cases. See the table below for performance (run with `cProfile`, so slightly slower than real).

The table shows that it's ~10x faster to use `load_yaml`, as this skips:

Although all of the above is public API, it seems worthwhile to investigate wrapping 4 as a helper function in stdatamodels (with sufficient documentation about how this is dangerous).
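The numbers in the issue come from `cProfile` runs. For a rough relative comparison without profiler overhead, a sketch like the following times each reader with `time.perf_counter` and reports how many times slower each is than the fastest. The helpers `median_runtime` and `compare_readers` are hypothetical; the `readers` mapping would hold callables wrapping options 1 to 4, supplied by the caller.

```python
import statistics
import time


def median_runtime(func, *args, repeat=5):
    """Median wall-clock time in seconds of func(*args) over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_readers(readers, *args, repeat=5):
    """Map each reader name to (median seconds, times slower than the fastest reader)."""
    medians = {name: median_runtime(func, *args, repeat=repeat)
               for name, func in readers.items()}
    fastest = min(medians.values())
    if fastest <= 0:
        raise ValueError("timings too small to compare; use a larger workload")
    return {name: (t, t / fastest) for name, t in medians.items()}
```

Taking the median over several calls reduces the influence of one-off costs such as imports or a cold file-system cache on the first call.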
Below is a snakeviz-generated graph of the call to `dm.open` for the `IFUImageModel` data file, showing that the bulk of the time is spent in `asdf.open`:
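snakeviz visualizes statistics saved by Python's `cProfile`. One way to produce such a file for any reader function is sketched below; `profile_call` is a hypothetical helper, and the saved file is then opened separately with `snakeviz <stats_path>` from the command line (`python -m cProfile -o out.prof script.py` is an equivalent route for a whole script).

```python
import cProfile
import io
import pstats


def profile_call(func, *args, stats_path=None, top=10, **kwargs):
    """Run func under cProfile; return (result, text summary of the top entries).

    If stats_path is given, the raw stats are also written there so an
    external viewer such as snakeviz can open them.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    if stats_path is not None:
        profiler.dump_stats(stats_path)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so expensive call trees (e.g. asdf.open) rise to the top
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```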