spacetelescope / roman_datamodels

Datamodel support for the roman calibration pipeline
https://roman-datamodels.readthedocs.io

Update meta.filename on read & write to reflect actual on-disk filename #387

Open · schlafly opened this issue 1 month ago

schlafly commented 1 month ago

In romancal we often use meta.filename as a default or a hint to determine what the output filename for a product should be.

When writing a file via roman_datamodels, we ensure that meta.filename of the output object matches the name of the file it was written to: https://github.com/spacetelescope/roman_datamodels/blob/3afecbf2ad2c9376458cbcc1910174b27fac64bc/src/roman_datamodels/datamodels/_core.py#L228
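The write-side policy can be sketched as follows. This is a hypothetical illustration, not the roman_datamodels implementation: a plain dict stands in for a datamodel, JSON stands in for ASDF, and `save_model` is an invented helper name.

```python
import json
import os
import tempfile

# Hypothetical stand-ins: a dict with a "meta" section plays the datamodel,
# and JSON replaces ASDF so the sketch runs with the standard library only.

def save_model(model: dict, path: str) -> None:
    """Write model to path, first syncing meta.filename to the output basename."""
    model["meta"]["filename"] = os.path.basename(path)
    with open(path, "w") as f:
        json.dump(model, f)

with tempfile.TemporaryDirectory() as tmp:
    model = {"meta": {"filename": "old_name.asdf"}}
    out = os.path.join(tmp, "r0001_cal.asdf")
    save_model(model, out)
    # Read the stored value back to confirm it matches the output name.
    with open(out) as f:
        stored = json.load(f)["meta"]["filename"]

print(stored)  # r0001_cal.asdf
```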

However, when reading a file, meta.filename in the data model keeps whatever value is stored in the file, which need not match the file's current name on disk.

That is reasonable, but it can cause confusion in the pipeline. The typical case is that someone copies a file elsewhere under a new name. In the copy, meta.filename and the actual filename now differ. When romancal steps are then run on the copy, many of them use meta.filename rather than the actual on-disk filename as the hint for the output filename, producing confusingly named products.
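A minimal reproduction of this failure mode, under the same assumptions: JSON stands in for ASDF, and `write_model`, `read_model`, and `output_name_hint` are hypothetical helpers. In particular, the `<stem>_<suffix><ext>` naming rule is invented for illustration and is not romancal's actual logic.

```python
import json
import os
import shutil
import tempfile

def write_model(model, path):
    # Sync meta.filename to the output basename on write.
    model["meta"]["filename"] = os.path.basename(path)
    with open(path, "w") as f:
        json.dump(model, f)

def read_model(path):
    # Current read behavior: meta.filename keeps the value stored in the file.
    with open(path) as f:
        return json.load(f)

def output_name_hint(model, suffix):
    # Hypothetical naming rule: "<stem>_<suffix><ext>" built from meta.filename.
    stem, ext = os.path.splitext(model["meta"]["filename"])
    return f"{stem}_{suffix}{ext}"

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.asdf")
    write_model({"meta": {"filename": ""}}, original)

    # Someone copies the file under a new name.
    renamed = os.path.join(tmp, "renamed.asdf")
    shutil.copy(original, renamed)

    model = read_model(renamed)
    hint = output_name_hint(model, "flat")

print(model["meta"]["filename"])  # original.asdf
print(hint)                       # original_flat.asdf
```

Even though the input was `renamed.asdf`, the derived output name still refers to `original`.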

The proposal here is to update meta.filename when roman_datamodels reads a file, so that meta.filename always reflects the on-disk filename. romancal would then do the right thing. This is also the approach taken in the Webb pipeline. Implementing it presumably involves changes to rdm.datamodels.open(...).
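The proposed read-side behavior could look roughly like this. It is a sketch under the same assumptions (JSON in place of ASDF, a dict in place of a datamodel), and `open_model` is a hypothetical name, not the signature of `rdm.datamodels.open`.

```python
import json
import os
import shutil
import tempfile

def open_model(path):
    """Hypothetical read path for the proposal: after loading,
    overwrite meta.filename with the file's actual basename."""
    with open(path) as f:
        model = json.load(f)
    model["meta"]["filename"] = os.path.basename(path)
    return model

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.asdf")
    with open(original, "w") as f:
        json.dump({"meta": {"filename": "original.asdf"}}, f)

    # Copy the file under a new name, then open the copy.
    renamed = os.path.join(tmp, "renamed.asdf")
    shutil.copy(original, renamed)
    model = open_model(renamed)

    # The file's stored contents are untouched; only the in-memory model changes.
    with open(renamed) as f:
        raw = json.load(f)

print(model["meta"]["filename"])  # renamed.asdf
print(raw["meta"]["filename"])    # original.asdf
```

The second line of output shows the tradeoff of this design: reading the file's tree directly still returns the originally stored name, while the opened model reports the current one.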

This has the obvious downside that it is surprising if asdf.open(filename)['roman']['meta']['filename'] differs from rdm.open(filename).meta.filename. But it is also surprising if our policy differs from Webb's, and this change would fix the immediate issue in romancal.