Open braingram opened 1 month ago
group_id
is cheaper to compute from the pool file and add to the association than to compute it when it's needed in the L3 pipeline. All the information needed to compute it should be in the pool file. Adding it will remove a lot of opening and closing of files in the Level 3 pipeline.
The fact that its use hasn't been formalized is a problem in both pipelines. JWST is moving towards formalizing its use and an example jwst association is here
https://github.com/spacetelescope/jwst/blob/master/jwst/datamodels/container.py#L102-L129
Is there a reason not to take advantage of this?
@stscijgbot-rstdms
This issue is tracked on JIRA as RCAL-852.
Thanks Brett. Broadly, I suspect that we should use this, with exposures defining the group; it would sound like that would mean adding group_id = L2 file name minus the suffix and the SCA name.
Some more specific comments:
Thanks! To check that I'm following the discussion so far:
Since there are 2 uses for grouping (outlier detection and resample) and the code currently expects the grouping to be done in the container it sound like it makes sense to keep the grouping API on the container. This also has the benefit that if another step wants to used the groupings it's already part of the common code (as it is now with ModelContainer
).
One additional question is do we want to support having the group id in the association? As @nden mentioned this is much more efficient (currently models_grouped
will open every model as that's the only way to determine the grouping). Allowing this would also make it easier to make the container part of the common jwst/romancal code (perhaps in stpipe or stcal) as jwst allows setting the group id in the association. Allowing group id to be defined in the association would mean that a user could modify the group id by changing the association (although the user could currently modify the group id by changing the files). I don't see this last point as a negative and given the other benefits do you think it's worth allowing the group id to be defined in the association?
Looking at the group id calculation the current ModelContainer
generates a group id that doesn't quite match the l2 filenames (ignoring the SCA name and suffix).
in l2 filename | in group id |
---|---|
program id | ✔️ |
execution plan | ❌ |
pass | ❌ |
segment | ❌ |
visit number | ✔️ |
visit group | ✔️ |
sequence id | ✔️ |
activity | ✔️ |
exposure | ✔️ |
Looking at the RAD schemas, would using meta.observation.obs_id
make sense:
https://github.com/spacetelescope/rad/blob/5ccb720777ed90faf13897ef92adf738693d8024/src/rad/resources/schemas/observation-1.0.0.yaml#L9
Is this used and kept up-to-date? If so it looks to be the same as the l2 filename minus the SCA and suffix.
- outlier detection could likely use grouping based on a pattern similar to the L2 file name
- grouping is not needed in skymatch as the skys for SCAs will be matched separately. As it appears the group ids are already being ignored here I think this matches what is currently being done.
- resample might be the other reason (aside from outlier detection) to keep the grouping API in the container (as it's currently used in this step)
Yes. Re sky-matching, we're not intending to fit a single offset per exposure at present, but there is a lot of uncertainty about what level of flexibility the sky matching will end up needing to have. So if it's very easy to have an option to either group by exposure or do individual SCAs, that would be nice. But it sounds like we're following our intended baseline.
One additional question is do we want to support having the group id in the association? As @nden mentioned this is much more efficient (currently
models_grouped
will open every model as that's the only way to determine the grouping). Allowing this would also make it easier to make the container part of the common jwst/romancal code (perhaps in stpipe or stcal) as jwst allows setting the group id in the association. Allowing group id to be defined in the association would mean that a user could modify the group id by changing the association (although the user could currently modify the group id by changing the files). I don't see this last point as a negative and given the other benefits do you think it's worth allowing the group id to be defined in the association?
That's fine; I don't feel strongly here. The only vague awkwardness about defining it in the file is that I really don't see many cases where we want the group to be something other than an exposure. So there's a little redundancy, but if that redundancy leads us to match Webb it's probably worth it.
I can confirm that recently all L1 files produced by romanisim started including obs_id, and that it looks to me to include all we need to identify an exposure. A potential implementation concern is that it would be easy to try to make some fake exposures by copying real exposures and changing the file names, believing that one was generating exposures that would be treated as separate exposures. But if we fail to update obs_id you could be surprised when they're all part of the same exposure. Or, relatedly, we keep the same information in the actual file name, in meta.filename, and in obsid, and it can be hard to keep these in sync. So I might prefer a default group based on the first N characters of the file name. But I don't feel strongly here.
Keeping things in sync is definitely a concern. Perhaps this is off topic but it looks like obs_id
is not used in romancal:
https://github.com/search?q=repo%3Aspacetelescope%2Fromancal+obs_id&type=code
Since this is a combination of other metadata is there a reason it's a separate metadata entry (and not a dynamic property of the datamodel)? The same is true for visit_id and in some sense meta.filename
could also be dynamic (perhaps a filename
property of the datamodel). This might help to avoid issues like https://github.com/spacetelescope/romancal/issues/818
Right, obs_id is set in romanisim during the creation of the L1 files and then not changed by romancal. Yes, I would be sympathetic to removing meta.filename, but I'm told stpipe cares about it and so that would be a bigger lift. I do think there's a use to having an overall exposure id (obs_id presently?) in the downstream archive, etc., so I'm not really militating for getting rid of obs_id.
Thanks! Good point about stpipe and meta.filename
. Let's leave any changes to that to a separate issue. It's certainly confusing to me.
For some operations (tweakreg, skymatch, resample) models are grouped based on a
group_id
.This is currently defined in
ModelContainer.models_grouped
: https://github.com/spacetelescope/romancal/blob/ed6187ffa790fc6a37dd96f7365ba1ef8261fa74/romancal/datamodels/container.py#L498-L504 which assigns a computed value tomodel.meta.group_id
models_grouped
is used in:Currently group id is always defined in
models_grouped
ignoring anygroup_id
in the model metadata (and ignoring anygroup_id
in the association, which is used for jwst).Most of the above steps are part of the mosaic pipeline with the only exception being tweakreg which is part of the exposure level pipeline.
There is no test for the exposure level pipeline that doesn't skip tweakreg so it is difficult to guess what should be done here.
For the mosaic pipeline it is possible the mosaic will contain models from different groups which will result in:
ResampleData
so this is likely always the case). Duringmany_to_one
each model is drizzled separately however the grouping is used for getting the number of pointings and for updating the exposure times.Some questions I have: