spacetelescope / rad

Nancy Grace Roman Space Telescope shared attributes for processing and archive
https://rad.readthedocs.io/
Other
5 stars 20 forks source link

EXAMPLE: remove quantities from wfi #438

Open braingram opened 2 weeks ago

braingram commented 2 weeks ago

This is an example PR for one approach at removing quantities from rad schemas (to make them more general) while retaining an easy method for making astropy.Quantity objects from arrays with appropriate units.

With the changes in this PR and the roman_datamodels branch here: https://github.com/braingram/roman_datamodels/tree/no_quantity (which updates the maker utility for the ImageModel) the quantities are removed from ImageModel while keeping the "unit" in the schema. This would allow a user to make a quantity by calling:

data_quantity = Quantity(model.data, unit=str(model.schema_info("unit")["roman"]["data"]["unit"]))

A helper function/method could be added to make this easier, perhaps something like:

data_quantity = model.as_quantity("data")
codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.45%. Comparing base (c1811ab) to head (699ad86). Report is 18 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #438 +/- ## ========================================== + Coverage 95.38% 95.45% +0.06% ========================================== Files 4 4 Lines 195 198 +3 ========================================== + Hits 186 189 +3 Misses 9 9 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

braingram commented 2 weeks ago

One point of discussion that came up was the performance cost of Quantity vs array. The flux step is one example in romancal: https://github.com/spacetelescope/romancal/blob/main/romancal/flux/flux_step.py Running this step with an association containing 3x ~400MB input files produces the following memory usage:

Screenshot 2024-08-29 at 4 16 08 PM

The large ramp up is due to various "lazy loading" (from asdf and from ModelLibrary). However the peak of ~2.2 GB shows a temporary memory load ~1GB larger than the combined input files.

Each line like the following:

        model[data] = model[data] * c_mj

results in allocation of temporary arrays which look to be 2x the data size for what looks like a simple unit change.

Screenshot 2024-08-29 at 4 23 34 PM

The astropy docs mention a similar issue with performance: https://docs.astropy.org/en/stable/units/index.html#performance-tips However it pertains to unit assignment and not changing units. I made some attempts to improve the flux step memory performance but failed to find an easy solution to allow the unit change without additional copies.

schlafly commented 2 weeks ago

Thanks Brett. And confirming, with ndarrays this does not occur? I think in my imagination python is not very smart and so something like x = x c ends up making a temporary array xc in addition to x, and then replacing x with it, while something like x *= c could in principle be smarter and change x in place with no memory cost. I think you're saying that in addition to that the code is making an additional temporary array of unknown purpose; correct?

braingram commented 2 weeks ago

Thanks! The *= should be an improvement for the reasons you described. I thought I tried this previously but it does appear to work when I run the flux unit tests with that change. I'm not sure why the temporary array would be 2x the input size but I don't see it for *=.

It looks like there are improvements that could be made to flux step even with quantities. I could look around for other examples if it's helpful.

schlafly commented 2 weeks ago

No need to look for other examples; even if we could tweak that step to improve performance it's still indicative of substantial, surprising inefficiency with very normal looking code.