Open braingram opened 2 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.45%. Comparing base (c1811ab) to head (699ad86). Report is 18 commits behind head on main.
One point of discussion that came up was the performance cost of `Quantity` vs `array`. The `flux` step is one example in romancal:
https://github.com/spacetelescope/romancal/blob/main/romancal/flux/flux_step.py
Running this step with an association containing 3x ~400MB input files produces the following memory usage:
The large ramp-up is due to various "lazy loading" (from asdf and from ModelLibrary). However, the peak of ~2.2 GB shows a temporary memory load ~1 GB larger than the combined input files.
Each line like the following:
model[data] = model[data] * c_mj
results in allocation of temporary arrays which look to be 2x the data size for what looks like a simple unit change.
The astropy docs mention a similar issue with performance: https://docs.astropy.org/en/stable/units/index.html#performance-tips. However, it pertains to unit assignment and not changing units. I made some attempts to improve the flux step memory performance but failed to find an easy solution to allow the unit change without additional copies.
Thanks Brett. And confirming, with ndarrays this does not occur? In my imagination Python is not very smart, so something like `x = x * c` ends up making a temporary array `x * c` in addition to `x`, and then replacing `x` with it, while something like `x *= c` could in principle be smarter and change `x` in place with no memory cost. I think you're saying that, in addition to that, the code is making an additional temporary array of unknown purpose; correct?
Thanks! The `*=` should be an improvement for the reasons you described. I thought I tried this previously but it does appear to work when I run the flux unit tests with that change. I'm not sure why the temporary array would be 2x the input size but I don't see it for `*=`.
It looks like there are improvements that could be made to flux step even with quantities. I could look around for other examples if it's helpful.
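To make the difference between the two spellings concrete, here is a minimal sketch using plain NumPy arrays (not the romancal code and not `Quantity`). The function names and the array size are made up for illustration. It only shows that `x * c` produces a new buffer while `x *= c` reuses the existing one.

```python
import numpy as np


def scale_out_of_place(data, factor):
    # `data * factor` allocates a new array for the product;
    # the caller then rebinds its name to that new array.
    return data * factor


def scale_in_place(data, factor):
    # `data *= factor` writes the product into the existing buffer.
    data *= factor
    return data


x = np.ones((1000, 1000), dtype=np.float32)

y = scale_out_of_place(x, 2.0)
print(np.shares_memory(x, y))  # False: y lives in a separate buffer

z = scale_in_place(x, 2.0)
print(z is x)  # True: the same array object was modified
```

With the out-of-place form, both the old and new arrays exist until the old one is released, so peak memory is at least twice the data size for the duration of the statement.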
No need to look for other examples; even if we could tweak that step to improve performance it's still indicative of substantial, surprising inefficiency with very normal looking code.
This is an example PR for one approach at removing quantities from rad schemas (to make them more general) while retaining an easy method for making `astropy.units.Quantity` objects from arrays with appropriate units.
With the changes in this PR and the roman_datamodels branch here: https://github.com/braingram/roman_datamodels/tree/no_quantity (which updates the maker utility for the `ImageModel`) the quantities are removed from `ImageModel` while keeping the "unit" in the schema. This would allow a user to make a quantity by calling:
A helper function/method could be added to make this easier, perhaps something like: