allow non-integer ngroups for mask ref file

braingram commented 8 months ago

https://github.com/spacetelescope/stdatamodels/pull/249 followed the normal pattern of adding new keywords/attributes to reference file schemas. This involves adding a $ref to the existing keyword_<...> schema for the new keyword used as a selector in crds.

However, in this case (perhaps the first of its kind in JWST) the selector type does not match the keyword type where the selector is a string BETWEEN 0 50 and the keyword NGROUPS expects an integer. See: https://jwst-crds.stsci.edu/browse/jwst_miri_mask_0055.fits and: https://github.com/spacetelescope/stdatamodels/blob/b7c7bcd83b7a32d908e3f63c5dece313b3611f7a/src/stdatamodels/jwst/datamodels/schemas/keyword_ngroups.schema.yaml#L13-L16

This PR fixes the issue by largely duplicating the keyword_ngroups schema (leaving of the type restriction) in the mask schema.

This likely highlights a more widespread issue where crds appears to support more for selector types than the current datamodels and schemas allow. This means that a crds update may result in failures to validate models.

Additionally this was uncaught by the current tests. Adding tests for this case seems prudent.

Regression tests: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/1189/pipeline/193

show 3 errors which appear to be unrelated:

test_nirspec_missing_msa_fail
test_nirspec_missing_msa_nofail (these two often fail)
test_duplicate_names

They all failed due to missing test in caplog.text suggesting these are due to the known issue with log messaging getting lost during regression test runs.

Checklist

[x] added entry in CHANGES.rst (either in Bug Fixes or Changes to API)
[x] updated relevant tests
[ ] updated relevant documentation
[ ] updated relevant milestone(s)
[x] added relevant label(s)

codecov[bot] commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (92abfa4) 64.84% compared to head (dcd54e0) 64.84%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #256 +/- ## ======================================= Coverage 64.84% 64.84% ======================================= Files 103 103 Lines 5694 5694 ======================================= Hits 3692 3692 Misses 2002 2002 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

braingram commented 8 months ago

So I'm curious to know what happens to the value of model.meta.exposure.ngroups now when the reference file uses a "BETWEEN" specification? Does it come out as a string containing the entire "between" specification (e.g. "BETWEEN 0 50")? And what happens in the normal case where NGROUPS in the ref file is just a single number, e.g. 2. Does it end up getting stored as an integer in the datamodel or a string?

Thanks for the review and for the questions.

For jwst_miri_mask_0055.fits (where NGROUPS=BETWEEN 0 50) ngroups is a the entire string:

> m.meta.exposure.ngroups
'BETWEEN 0 50'

I don't seen any miri mask reference files on CRDS that use an integer NGROUPS but if we make one (by changing NGROUPS=1 for the above 0055 file) ngroups will now be an integer

> m2.meta.exposure.ngroups
1

The above non-strict type behavior comes from the lack of a type restriction in the ngroups portion of the mask schema. Some other options would be:

make the type string. This would mean that assignments to properties would need to convert values to string (like in m.meta.exposure.ngroups = str(ngroups)
make the schema accept multiple types (using a schema combiner like anyOf). Given the non-standard uses of these schemas in stdatamodels (for things like type casting) I'm not confident that the use of a schema combiner might not cause other issues.

hbushouse commented 8 months ago

@braingram Thanks for the explanation. I think the current behavior of having the data type match the type of value given (string or integer) is probably fine and not worth trying to finesse any further (because it's not used in any code anywhere - we're just trying to avoid schema validation errors when reading those ref files).

hbushouse commented 8 months ago

So I'm happy with this as-is and would be OK with merging. @tapastro any comments or suggestions before we merge? I only ask because you're in the stdatamodes-maintainers list.

tapastro commented 8 months ago

I think merging is reasonable - I did ask Jonathan to take a look at it when he gets back from vacation, but I don't know that there will be much more we can do than this. I also don't expect widespread usage of BETWEEN, so we probably don't need to worry about changing every keyword instance in a reference file model quite yet.

The only time this would cause friction for other datamodels (besides the failing test) would be if an INS team member generates a new ref file and attempts to assign a BETWEEN value to a keyword that we haven't "prepped".

braingram commented 8 months ago

I can't seem to request @stscieisenhamer as a reviewer for this so I'm using this comment to tag him. Thanks all!

stscieisenhamer commented 8 months ago

Agree with the commentary above. LGTM

braingram commented 8 months ago

CI failures after rebase are due to a CRDS update addressed in: https://github.com/spacetelescope/stdatamodels/pull/260

spacetelescope / stdatamodels

allow non-integer ngroups for mask ref file #256

Codecov Report