Closed stscijgbot-jp closed 3 weeks ago
Comment by Mihai Cara on JIRA:
The code that would allow this is pending in a couple of PRs in drizzle and jwst packages. However, there is an issue with models that creates context arrays regardless. Specifically, here is an example:
In [1]: from stdatamodels.jwst.datamodels import ImageModel
In [2]: m = ImageModel((5, 5))
In [3]: m.data
Out[3]:
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]], dtype=float32)
In [4]: m.con
Out[4]:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=int32)
In [5]: m.con = None
In [6]: m.con
Out[6]:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=int32)
In [7]: del m.con
In [8]: m.con
Out[8]:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]], dtype=int32)
So, until data model are not modified in a way as to allow some arrays to be None, I am afraid these arrays will always re-created.
The con
array is created when it accessed. This can be seen by looking at the instance
attribute of the datamodel:
>> from stdatamodels.jwst.datamodels import ImageModel
>> m = ImageModel((5, 5))
>> 'con' in m.instance
False
>> m.con
>> 'con' in m.instance
True
There are more details in the documentation about the datamodels: https://stdatamodels.readthedocs.io/en/latest/jwst/datamodels/structure.html#the-structure-of-datamodels
Correct. But that means we cannot completely get rid of this memory: H x W x 1
is the minimum.
If we never assign to con
it will never be allocated (making the minimum 0).
Comment by Ned Molter on JIRA:
I'm attaching memory profiling before (outlier_mihai.html) and after (outlier_mihai_2.html) the change that removed allocation of the context array. It does appear that the context array is no longer allocated, so I think this ticket can be resolved once that PR is merged.
The new peak memory usage shows that sigma_clip is using a factor of ~4 more memory than the weight array input into it. This does beg the question of whether that, too, can be optimized. But I think this should be its own ticket, especially given that the offending line lives in stcal. https://github.com/spacetelescope/stcal/blob/dfe1d6d51f0c2a400dff918a52c779328ae19b7d/src/stcal/outlier_detection/utils.py#L80
Comment by Ned Molter on JIRA:
Attaching two more flamegraphs here, showing the effect of my fix to AL-875 along with Mihai's PR. The memory usage for on-disk=True decreased by 25% from ~4 GB to ~3 GB.
The file names are outlier_before_ondisk.html
and outlier_AL875_mihai_ondisk.html
for the before and after cases, respectively.
Comment by Melanie Clarke on JIRA:
Fixed by #8866
Issue JP-3685 was created on JIRA by Brett Graham:
When resampling is performed during outlier detection, the underlying
GWCSDrizzle
generates a context array which is unused. This array can be quite large, for N 2d inputs of size H x W the context array will be of size H x W x ((N // 32) + 1). Investigate disabling this (and any other unnecessary intermediate products) to save on memory and computation during outlier detection.