Open stscijgbot opened 6 years ago
So there is a possiblity that this could be done via a custom validator in jwst.stpipe.config_parser
. We already have custom validators for two file-like types:
if validator is None:
validator = Validator()
validator.functions['input_file'] = _get_input_file_check(root_dir)
validator.functions['output_file'] = _get_output_file_check(root_dir)
It would be straightforward to add a datamodel
type that would be either a path to a file or an open datamodel in memory. One would just need to write an appropriate validator.
That would mean that
override_mask = string(default=None)
configspec would become
override_mask = datamodel(default=None)
and datamodel
would be something that could be passed to datamodels.open()
.
Comment by Todd Miller: [~cracraft] we're starting to look at supporting this better but it may revolve around a larger stpipe refactoring. Nevertheless, can you share the code you have to write now?
Comment by Misty Cracraft: At this point, I've re-written the unit tests that I was working on when I filed the ticket. We got around the problem by creating reference file models and feeding them into the individual portions of the code rather than the full step (dq_initialization rather than DQInitStep). I can write out a sample of what I was trying to do (as it will be useful for future tests), but I don't currently have any remaining examples of code that didn't work.
Comment by Todd Miller: OK, thanks. If an example is easy and you have time, please do. Otherwise I think SCSB needs to come up with a recommended approach. It's better though to know exactly what you were trying to do to make sure we address it or come up with something better.
Here's a simple use case that currently fails. Does this represent roughly what you were trying to do?
In [1]: from jwst import datamodels
In [8]: from jwst.dq_init import DQInitStep
In [9]: input = datamodels.Level1bModel((1, 1, 2048, 2048))
In [10]: mask = datamodels.MaskModel((2048, 2048))
In [11]: mask
Out[11]: <MaskModel(2048, 2048)>
In [12]: result = DQInitStep.call(input, override_mask=mask)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-12-62cacea1646b> in <module>
----> 1 result = DQInitStep.call(input, override_mask=mask)
~/dev/jwst/jwst/stpipe/step.py in call(cls, *args, **kwargs)
529 config, name=name, config_file=config_file)
530 else:
--> 531 instance = cls(**kwargs)
532 return instance.run(*args)
533
~/dev/jwst/jwst/stpipe/step.py in __init__(self, name, parent, config_file, _validate_kwds, **kws)
286 spec = self.load_spec_file()
287 kws = config_parser.config_from_dict(
--> 288 kws, spec, root_dir=dirname(config_file or ''))
289
290 if name is None:
~/dev/jwst/jwst/stpipe/config_parser.py in config_from_dict(d, spec, root_dir)
194
195 if spec:
--> 196 validate(config, spec, root_dir=root_dir)
197 else:
198 config.walk(string_to_python_type)
~/dev/jwst/jwst/stpipe/config_parser.py in validate(config, spec, section, validator, root_dir)
255
256 if len(messages):
--> 257 raise ValidationError('\n'.join(messages))
258 finally:
259 config.main.configspec = orig_configspec
ValidationError: Config parameter 'override_mask': the value "<MaskModel(2048, 2048)>" is of the wrong type.
Comment by Todd Miller: [~cracraft]
An example use case from James, let me know if that's close enough.
In [8]: from jwst.dq_init import DQInitStep
In [9]: input = datamodels.Level1bModel((1, 1, 2048, 2048))
In [10]: mask = datamodels.MaskModel((2048, 2048))
In [11]: mask Out[11]: <MaskModel(2048, 2048)>
ValidationError Traceback (most recent call last)
Comment by Misty Cracraft: Ok, here's a sample of my dq_init tests. This works if I use the modified line > outfile = do_dqinit(dm_ramp, ref_data) instead of > outfile = DQInitStep(dm_ramp, override_mask = ref_data). But this tests just the substep and not the full step that sets up the files.
And I just saw the test case from James. I get the same type of error when I try to run the code below with the DQInitStep being called.
{color:#000080}from {color}jwst.dq_init {color:#000080}import {color}DQInitStep {color:#000080}from {color}jwst.dq_init.dq_initialization {color:#000080}import {color}do_dqinit
def test_dqim(): # Check that PIXELDQ is initialized with the information from the reference file. # test that a flagged value in the reference file flags the PIXELDQ array # read in reference mask file_
# size of integration __ nints = 1 ngroups = 5 xsize = 1032 ysize = 1024
# create raw input data for step __ dm_ramp = make_rawramp(nints, ngroups, ysize, xsize)
# create a MaskModel for the dq input mask
__ dq, dq_def = make_maskmodel(ysize, xsize)
# edit reference file with known bad pixel values
dq[100, 100] = 2 # Dead pixel dq[200, 100] = 4 # Hot pixel _ dq[300, 100] = 8 # Unreliableslope dq[400, 100] = 16 # RC dq[500, 100] = 1 _# Do_notuse dq[100, 200] = 3 # Dead pixel + do not use dq[200, 200] = 5 # Hot pixel + do not use dq[300, 200] = 9 # Unreliable slope + do not use dq[400, 200] = 17 # RC + do not use
# write mask model __ ref_data = MaskModel(dq=dq, dq_def=dq_def) ref_data.meta.instrument.name = 'MIRI' ** ref_data.meta.subarray.xstart = 1 ref_data.meta.subarray.xsize = xsize ref_data.meta.subarray.ystart = 1 ref_data.meta.subarray.ysize = ysize
_# run dodqinit
_#outfile = do_dqinit(dm_ramp, refdata) _outfile = DQInitStep(dm_ramp, override_mask = ref_data)_ print(outfile.err.shape) dqdata = outfile.pixeldq
# assert that the pixels read back in match the mapping from ref data to science data __ assert(dqdata[100, 100] == dqflags.pixel['DEAD']) assert(dqdata[200, 100] == dqflags.pixel['HOT']) assert(dqdata[300, 100] == dqflags.pixel['UNRELIABLE_SLOPE']) assert(dqdata[400, 100] == dqflags.pixel['RC']) assert(dqdata[500, 100] == dqflags.pixel['DO_NOT_USE']) assert(dqdata[100, 200] == 1025) assert(dqdata[200, 200] == 2049) assert(dqdata[300, 200] == 16777217) assert(dqdata[400, 200] == 16385)
def makemaskmodel(ysize, xsize): # create a mask model for the dqinit step _ csize = (ysize, xsize) dq = np.zeros(csize, dtype=int) # how do we define a dqdef extension? __ mask = MaskModel()
dqdef = [(0, 1, 'DO_NOT_USE', 'Bad Pixel do not use'), (1, 2, 'DEAD', 'Dead Pixel'), (2, 4, 'HOT', 'Hot pixel'), (3, 8, 'UNRELIABLE_SLOPE', 'Large slope variance'), (4, 16, 'RC', 'RC pixel'), (5, 32, 'REFERENCE_PIXEL', 'Reference Pixel')]
dq_def = np.array((dqdef), dtype=mask.dq_def.dtype)
__ return dq, dq_def
def makerawramp(nints, ngroups, ysize, xsize): # create the data and groupdq arrays_ __ csize = (nints, ngroups, ysize, xsize) data = np.full(csize, 1.0)
# create a JWST datamodel for MIRI data __ dm_ramp = MIRIRampModel(data=data) dm_ramp.meta.subarray.xstart = 1 dm_ramp.meta.subarray.xsize = xsize dm_ramp.meta.subarray.ystart = 1 dm_ramp.meta.subarray.ysize = ysize
return dm_ramp
Comment by Todd Miller: Perfect. Addressing this may take some time, but now we know (on this ticket) what we should ideally support. Thanks!
Comment by Todd Miller: A possible approach for minimizing "collateral damage" occurred to me.
The basic issue for collateral damage is that the Step.get_reference_file() method returns a filepath, not a model, a key interface with almost every _step.py module. Switching to returning an open model would therefore potentially affect every _step.py and can't be done easily for reftypes with no model.
One possible work around would be to invent a "file handle string" to use in place of a filename for model-based overrides. The override model would be associated with the handle string in datamodels and when a constructor is run on the handle string, datamodels returns an open shallow copy(?) of the original model, much like when a constructor is called on an open model.
A handle might be something like: memory://FlatModel-9c4d0201-3a6e-466a-82ca-c30d706eb27a.
With this implementation, no temporary file or other file system footprint is nominally used/needed/supported.
James' original idea on github of using a config validator function for a filename or model for overrides would then both validate and create the datamodels association for the handle. get_reference_file() would return the handle.
Comment by Todd Miller: [discussed but rejected]
A possible approach for minimizing "collateral damage" occurred to me.
The basic issue for collateral damage is that the Step.get_reference_file() method returns a filepath, not a model, a key interface with almost every _step.py module. Switching to returning an open model would therefore potentially affect every _step.py and can't be done easily for reftypes with no model.
One possible work around would be to invent a "file handle string" to use in place of a filename for model-based overrides. The override model would be associated with the handle string in datamodels and when a constructor is run on the handle string, datamodels returns an open shallow copy of the original model, much like when a constructor is called on an open model.
A handle might be something like: memory://FlatModel-9c4d0201-3a6e-466a-82ca-c30d706eb27a.
With this implementation, no temporary file or other file system footprint is nominally used/needed/supported.
James' original idea on github of using a config validator function for a filename or model for overrides would then both validate and create the datamodels association for the handle. get_reference_file() would return the handle.
{color:#FF0000}After discussing several things became clear:{color}
Considering all these things, it's very possible that limited changes to stpipe and/or the crds_client module can enable overrides with low risk for existing use patterns based on filepath strings.
Initially resolved by #3514
But there's still a bug, as when Step.prefetch_references = True
, which is the default, one gets:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/miniconda3/envs/jwst-dev/lib/python3.11/site-packages/stdatamodels/properties.py:350, in ObjectNode.__getattr__(self, attr)
349 try:
--> 350 val = self._instance[attr]
351 except KeyError:
KeyError: 'strip'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
Cell In[15], line 1
----> 1 pipe("jw01345002001_14201_00003_nrcalong_uncal.fits")
File ~/dev/stpipe/src/stpipe/step.py:500, in Step.run(self, *args)
498 else:
499 if self.prefetch_references:
--> 500 self.prefetch(*args)
501 try:
502 step_result = self.process(*args)
File ~/dev/stpipe/src/stpipe/step.py:616, in Step.prefetch(self, *args)
614 # prefetch truly occurs at the Pipeline (or subclass) level.
615 if len(args) and len(self.reference_file_types) and not self.skip:
--> 616 self._precache_references(args[0])
File ~/dev/stpipe/src/stpipe/pipeline.py:269, in Pipeline._precache_references(self, input_file)
265 try:
266 with self.open_model(
267 input_file, asn_n_members=1, asn_exptypes=["science"]
268 ) as model:
--> 269 self._precache_references_opened(model)
270 except (ValueError, TypeError, OSError):
271 self.log.info("First argument %s does not appear to be a model", input_file)
File ~/dev/stpipe/src/stpipe/pipeline.py:290, in Pipeline._precache_references_opened(self, model_or_container)
287 self._precache_references_opened(contained_model)
288 else:
289 # precache a single model object
--> 290 self._precache_references_impl(model_or_container)
File ~/dev/stpipe/src/stpipe/pipeline.py:328, in Pipeline._precache_references_impl(self, model)
324 how = "Override" if reftype in ovr_refs else "Prefetch"
325 self.log.info(
326 "%s for %s reference file is '%s'.", how, reftype.upper(), refpath
327 )
--> 328 crds_client.check_reference_open(refpath)
File ~/dev/stpipe/src/stpipe/crds_client.py:80, in check_reference_open(refpath)
75 def check_reference_open(refpath):
76 """Verify that `refpath` exists and is readable for the current user.
77
78 Ignore reference path values of "N/A" or "" for checking.
79 """
---> 80 if refpath != "N/A" and refpath.strip() != "":
81 with open(refpath, "rb"):
82 pass
File ~/miniconda3/envs/jwst-dev/lib/python3.11/site-packages/stdatamodels/properties.py:353, in ObjectNode.__getattr__(self, attr)
351 except KeyError:
352 if schema == {}:
--> 353 raise AttributeError("No attribute '{0}'".format(attr))
355 val = _make_default(attr, schema, self._ctx)
356 if val is not None:
AttributeError: No attribute 'strip'
Clearly prefetching for overrides that are open datamodels doesn't make sense. Prefetching any overrides perhaps doesn't make sense.
It is likely this was original fixed but has since been broken again, as the test that tests all of this calls the process()
method instead of run()
. That skips prefetch.
Of course this code now lives in stpipe
, and will have to be fixed there.
Issue JP-345 was created by Misty Cracraft:
While writing a pytest for dq_init to use as an example for my team, I tried to call the pipeline step by passing in a science data model and a reference file data model as an override. I got the error below:
E jwst.stpipe.config_parser.ValidationError: Config parameter 'override_mask': the value "<MaskModel(1024, 1032) from jwst_miri_mask_0021.fits>" is of the wrong type.
After conferring with James Davies, we discovered that the override portion of the code is expecting a string that points to the reference file (path and filename) and it crashed when I tried to pass it a data model. So I had to read in my reference file, make modifications to it, write it back out as a file, and then read the modified file in as the file override. This seems like a lot of unnecessary work, and it would make it easier for the testers and future users if they could read in a data model for the reference file override as well.