Open braingram opened 1 month ago
Attention: Patch coverage is 83.66890%
with 73 lines
in your changes missing coverage. Please review.
Project coverage is 76.62%. Comparing base (
79d3a30
) to head (32ae519
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I find the use of __getitem__
and __setitem__
somewhat nonintuitive when using the library theme. Wouldn't either methods such as borrow
or checkout
be more appropriate for the first, and return
for the second?
I find the use of
__getitem__
and__setitem__
somewhat nonintuitive when using the library theme. Wouldn't either methods such asborrow
orcheckout
be more appropriate for the first, andreturn
for the second?
Thanks for taking a look! I did consider borrow
and return
but return
conflicts with python usage. I couldn't come up with a good alternative to return
and since the library needs at least a __getitem__
to pass the Sequence
checks I went with that and __setitem__
for now. It should be possible (if not preferred) to drop the Sequence
check requirement (with further changes to stpipe
). Any ideas for return
alternatives? Maybe unborrow
drop_off
shelve
store
? store
might make discard
a bit more understandable (although I hope the naming for that can also be improved).
Perhaps return_model()
. But unborrow
or shelve
would also work. We can leave the __getitem__
interface in, but not use it explicitly in most of the code.
I think borrow_model
return_model
and discard_model
might be a cohesive set. I will look at what would be required in stpipe
to drop the Sequence
check.
@mairanteodoro when you have a chance would you take a look at this PR. It replaces all uses of ModelContainer
with a new ModelLibrary
. I tried to keep other changes to a minimum but there were some step changes. There are remaining things to do before opening this for review (see the todo above),
At the moment the unit tests are passing except for a few I skipped:
None
, ''
, etc) some of the steps produce errors, other produce warning log messages, I didn't see a consistent theme so figuring out how to update these will require some discussiongroup_id
where on main
it's not being passed to tweakwcs
save_intermediate_results=False
) it's my understanding that the step should delete the median file for this configuration. If so, this test and the step have bugs.I also noticed an issue with outlier detection noted here where it's using single=False
which produces a single combined resampled image for all input images that is then used as the input to the median calculation (which takes the median over an N=1). Changing this would change the regression tests results so I didn't update that in this PR.
The regression tests show 2 test failures (more on this below) : https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-Developers-Pull-Requests/797/
The hlp regtest failed with a minor file difference where the current truth file does not contain the asn_pool_name
or asn_table_name
from the input association. With this PR the values are carried through the pipeline so I believe they are the intended result. I do have questions on this and a recent change to ModelContainer
in https://github.com/spacetelescope/romancal/pull/1209. I don't see any tests for the feature added in that PR so I'm not sure if the changes in this PR have any regressions with respect to that feature.
test_resample_single_file
also shows a minor difference (with this PR a group_id is recorded, I'm not yet sure if we should consider this a bug or an improvement).
This also seems to cause many issues with the unit tests, test_outlier_detection.py, test_skymatch and, test_tweakreg Do we have an idea of how much work is required to correct these?
Thanks for giving this a look. Did you pip install the package before testing? I ask because the changes in this PR require a minor modification to stpipe (see the modified pyproject.toml which points to the stpipe fork). Without those changes many tests will fail as stpipe will attempt to treat the ModelLibrary
as a ModelContainer
.
The items in the library are also read only?
ml.asn['products'][0]['members'][1]['exptype'] = 'background' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'mappingproxy' object does not support item assignment
Can this be relaxed? For instance it might be convenient to add catalogs to the association for SDF processing.
The models are modifiable but the association metadata is read only. I made this read only because modifying the association after models have been loaded (or after the library is created when asn_exptypes
is used) leads to mismatches between the models and the association. Using the example, let's assume member 1 has a exptype of science. If the model is loaded from the library (or from the existing container), it will contain the 'science' exptype in it's metadata (at model.meta.asn.exptype
). If now the exptype is changed in the association there is no way to update the corresponding model metadata and the asn will report this as the updated value but the model metadata will still report 'science'. Making the asn read only helps to prevent this (although it's still possible to modify the model metadata and have it not match the asn).
Would you expand on the use case for adding catalogs to the association? Would it work to add these to the association before the library is created? I don't think I've yet added creating a library from an in memory association but that should be straightforward and might help here.
I'm not aware of any plans to add group_id to the Roman associations so I don't think that this is the correct place for it in the model library. If some steps need it we can discuss adding it but I don't see any use cases for Roman.
On 5/31/24 9:58 AM, Brett Graham wrote:
@.**** commented on this pull request.
In romancal/datamodels/library.py https://urldefense.com/v3/__https://github.com/spacetelescope/romancal/pull/1241*discussion_r1622464940__;Iw!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xFOr08rkhgQA-fSv2yEgyAp-NF5OAygnlOQsQbUwxce60OmkWw5LbLmlHYl3qqU4OUXqs_XI3qbn03QE6ZsuzS-a$:
- return tuple(obj)
- return obj
- return asdf.treeutil.walk_and_modify(self._asn, _to_read_only)
TODO we may want to not expose this as it could go out-of-sync
pretty easily with the actual models.
@property
def members(self):
return self.asn['products'][0]['members']
- @property
- def group_names(self):
- names = set()
- for member in self._members:
- names.add(member["group_id"])
Thanks for pointing this out. I missed that this is different than jwst (which allows setting group id in the association). I believe this was added in part to allow users to define grouping for tweakreg, skymatch, resample (any step that uses group id). It's more efficient to have this group id in the association instead of loading each model to compute the group id (which is why the group id is now being added to the level 3 association generation for jwst https://urldefense.com/v3/__https://github.com/spacetelescope/jwst/issues/8523__;!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xFOr08rkhgQA-fSv2yEgyAp-NF5OAygnlOQsQbUwxce60OmkWw5LbLmlHYl3qqU4OUXqs_XI3qbn03QE6d6RNE2S$).
The format here should match the format defined by the current model container https://urldefense.com/v3/__https://github.com/spacetelescope/romancal/blob/ed6187ffa790fc6a37dd96f7365ba1ef8261fa74/romancal/datamodels/container.py*L499__;Iw!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xFOr08rkhgQA-fSv2yEgyAp-NF5OAygnlOQsQbUwxce60OmkWw5LbLmlHYl3qqU4OUXqs_XI3qbn03QE6bvsS9ZD$ and used by tweakreg, skymatch and resample in romancal. I can remove the part of this code that allows the group id to be defined from the association if roman isn't planning on supporting that feature. However the library does need to define the group id for the mentioned steps to work. Is there something incorrect in the format? If so would you provide an example?
— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/spacetelescope/romancal/pull/1241*discussion_r1622464940__;Iw!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xFOr08rkhgQA-fSv2yEgyAp-NF5OAygnlOQsQbUwxce60OmkWw5LbLmlHYl3qqU4OUXqs_XI3qbn03QE6ZsuzS-a$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ALXCXWKWDT7R7NTUEDIOWTDZFB6YZAVCNFSM6AAAAABH4FWT36VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAOJQHEYTAOBWGA__;!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xFOr08rkhgQA-fSv2yEgyAp-NF5OAygnlOQsQbUwxce60OmkWw5LbLmlHYl3qqU4OUXqs_XI3qbn03QE6VN_NrfP$. You are receiving this because you commented.Message ID: @.***>
On 5/31/24 9:37 AM, Brett Graham wrote:
This also seems to cause many issues with the unit tests, test_outlier_detection.py, test_skymatch and, test_tweakreg Do we have an idea of how much work is required to correct these?
Thanks for giving this a look. Did you pip install the package before testing? I ask because the changes in this PR require a minor modification to stpipe (see the modified pyproject.toml which points to the stpipe fork). Without those changes many tests will fail as stpipe will attempt to treat the |ModelLibrary| as a |ModelContainer|.
No, I just checked out your branch. I'll give that a try for the next tests.
The items in the library are also read only? |ml.asn['products'][0]['members'][1]['exptype'] = 'background' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'mappingproxy' object does not support item assignment | Can this be relaxed? For instance it might be convenient to add catalogs to the association for SDF processing.
The models are modifiable but the association metadata is read only. I made this read only because modifying the association after models have been loaded (or after the library is created when |asn_exptypes| is used) leads to mismatches between the models and the association. Using the example, let's assume member 1 has a exptype of science. If the model is loaded from the library (or from the existing container), it will contain the 'science' exptype in it's metadata (at |model.meta.asn.exptype|). If now the exptype is changed in the association there is no way to update the corresponding model metadata and the asn will report this as the updated value but the model metadata will still report 'science'. Making the asn read only helps to prevent this (although it's still possible to modify the
Here I am thinking ahead that we'll need difference images (color maps) and may need to designate a science exposure as a background to be subtracted from the science image. We don't have any algorithms yet but just trying to be flexible.
model metadata and have it not match the asn).
Would you expand on the use case for adding catalogs to the association? Would it work to add these to the association before the library is created? I don't think I've yet added creating a library from an in memory association but that should be straightforward and might help here.
I think for the production products we can create the catalog names that are associated with an image file, thus spending hours discussing file names. For custom catalogs, strun romancal.step.TweakRegStep --catfile my_favorite_cat.asdf it would be good to be able to add / replace the catalog file in the asn. We can probably ignore this for now and focus on the production products.
— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/spacetelescope/romancal/pull/1241*issuecomment-2142183132__;Iw!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xwKb-nXLse0EPytCkC2ZgQiLbGPkgbLBW9_tdALSU8Tzu9UMF7zu5_0j0L2Q7GPU-bjneOC5yX5-a0Ypsg0kJ541$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ALXCXWNQANUVCOIYZGC4HH3ZFB4IXAVCNFSM6AAAAABH4FWT36VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBSGE4DGMJTGI__;!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xwKb-nXLse0EPytCkC2ZgQiLbGPkgbLBW9_tdALSU8Tzu9UMF7zu5_0j0L2Q7GPU-bjneOC5yX5-a0YpsrN5-LsI$. You are receiving this because you commented.Message ID: @.***>
I'm not aware of any plans to add group_id to the Roman associations so I don't think that this is the correct place for it in the model library. If some steps need it we can discuss adding it but I don't see any use cases for Roman.
Thanks for the reply. I think this is something that should be considered as it can provide dramatic performance improvements for grouping (one of the primary features of the container). What's the right venue for discussing this?
I think conceptually "group" = "exposure" for Roman. i.e., for tweakreg, we want to group all of the exposures together and compute a common astrometric shift for all SCAs. I'm not actually aware of what "group" is controlling for sky matching or outlier detection; since multiple detectors in the same exposure cannot overlap, I can't figure out really anything that grouping can do in outlier detection. For sky matching, I can imagine it being used as a fast proxy for determining that it isn't necessary to check to see if WFI01 and WFI02 in the same exposure overlap. I could also imagine some mode that fits a common sky offset for groups rather than SCAs; I believe that we fit one offset per SCA. We don't have more complicated groups than that. Likewise for resampling I'm missing something about what changing the grouping can do. For the ELP (i.e., tweakreg), we always have exactly one group; all the SCAs get tweakreg'd together.
Can you point me to some of the JWST docs for this? We should probably bring it up a Thursday romancal tag-up meeting.
On 5/31/24 1:49 PM, Brett Graham wrote:
I'm not aware of any plans to add group_id to the Roman associations so I don't think that this is the correct place for it in the model library. If some steps need it we can discuss adding it but I don't see any use cases for Roman.
Thanks for the reply. I think this is something that should be considered as it can provide dramatic performance improvements for grouping (one of the primary features of the container). What's the right venue for discussing this?
— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/spacetelescope/romancal/pull/1241*issuecomment-2142721351__;Iw!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!yg35DrINkGkmxJL3bXDUnx6ShWQYRq89OXu_B_NDFC_CzMnknHAo1LoizpQ3_uV_B8xxoJ2aQDcqaqkvCQmhPlYo$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ALXCXWMHYQFQMIJN63HBCITZFCZ2NAVCNFSM6AAAAABH4FWT36VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBSG4ZDCMZVGE__;!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!yg35DrINkGkmxJL3bXDUnx6ShWQYRq89OXu_B_NDFC_CzMnknHAo1LoizpQ3_uV_B8xxoJ2aQDcqaqkvCW2QmL7M$. You are receiving this because you commented.Message ID: @.***>
There are some things about the grouping in romancal that are confusing. Perhaps we pull this out into a separate issue https://github.com/spacetelescope/romancal/issues/1259
Hopefully that issue can be resolved (or at least a consensus reached) before this PR is brought out of draft.
To recap some of the discussion from tag-up today.
The winner is...
with lib:
model = lib.borrow(0)
...
lib.shelve(model)
The replacement for discard
is:
lib.shelve(model, modify=False) # modify defaults to True
and for uses where a model is replaced (with a new model)
lib.shelve(model, index)
index
will default to None
(or some other "unset" value) and the library will maintain a model to index mapping. This mapping will mean that between a borrow
and a shelve
the library will retain a reference to the "borrowed" model preventing it from being garbage collected (even if the step removes all references). So for the following example:
with lib:
model = lib.borrow(0)
new_model = model.copy()
del model # because of the internal model to index mapping this isn't sufficient to free the memory for model
lib.shelve(new_model, 0)
This seems acceptable given the reduced API complexity (shelve
does not need to always receive a model and an index) and if it becomes a performance bottleneck (which seems unlikely) the internal mapping could be updated to use weakref
(at the expense of increased complexity).
ModelLibrary
(adding a wing?)There is currently no way to "append" a model to the library. Appending is possible with ModelContainer
however when examining the usage in jwst and romancal in (almost) all cases using a list of models is equivalent and simpler. The one exception is outlier detections use of resample where the drizzled_models
are accumulated in a ModelContainer
. However in this instance a list of filenames can work (see https://github.com/spacetelescope/romancal/pull/1260 for some more context on this code in romancal).
It is possible to add a "append" method to the ModelLibrary
. However this would complicate bookkeeping for:
As the current uses can be replaced with lists, I suggest we don't implement "append" and instead think of the ModelLibrary
as a way to read models from an association and pass them through the pipeline. Some items we discussed today was a possibility of initializing a ModelLibrary
with "blank" models (to be filled by the pipeline) and using the ModelLibrary
as a way to organize members in an association (assigning them different exptypes
and/or adding new models). These operations are currently possible only by turning the library into a list of models then back into a library (although neither the library nor ModelContainer
allow saving the association). Some thought would have to go into how to implement these features in the library (or if they're more suited to other structures either existing or to-be-made).
The filepaths
in ModelContainer
are used to track which input files are used to generate an output file. This is an issue that has also been raised for jwst and some thought could be given to how ModelLibrary
might play a role in this tracking. It may make sense (or be required) to have steps update this information but I have not given the issue much thought at this point.
ModelLibrary
in stpipeI am currently considering stpipe
as the "home" for ModelLibrary
. This is a dependency for both romancal and jwst which could implement sub-classes that:
group_id
determinationfilepaths
support (for roman)Given the ease in porting the core library from jwst to romancal (few internal library changes were needed, it was mostly updating cal code) I expect the main difficulty with moving the library to stpipe
will be:
Todo:
__init__
from in memory associationmap_function
method)__setitem__
__getitem__
discard
__setitem__
changes)ModelLibrary
docsResolves RCAL-nnnn
Closes #
This PR addresses ...
Checklist
CHANGES.rst
under the corresponding subsection