Speed of CalSpec3 and efficiency in using cube_build

jemorrison commented 6 years ago

In the current calspec3 pipeline, mrs_imatch calls cube_build (with option Single=True). When single=true a set of IFUcubes all with the same size ( a size which encompasses all the data mapped to the sky) are created but only a single exposure is mapped in each IFUCube. The background to be subtracted is determined from this data and written to the asdf extension of each input models. The single IFU cubes are tossed away. The next step in the pipeline - outlier detection also calls cube_build with Single = True (using the exact same data as mrs_imatch used) and creating the same single IFUCubes. The outlier detection does not subtract the background information determined from mrs_imatch but it used (somehow) in flagging outliers.

So in summary current calls in calspec3:

result = self.mrs_imatch (result) from mrs_imatch cbs.single = 'True' cube_models = cos.process(models) where, cube_models are the set of SINGLE IFU Cubes this routine updates the asdf extension of input model (contained in result) later in calspec3 outlier detection is called:
result = self.outlier_detection(result)

then again this routine calls cube_build the same way that outlier detection did.

So I am purposing that: mrs_imatch returns both the input models updated with background poly in asdf AND the set of Single IFU Cubes.

something along these lines: To make it robust we first need to set SingleIFUCube = 0 or empty or whatever. then something like result = self.mrs_imatch (result, SingleIFUCube)

then pass this information on to outlier rejection result = self.outlier_detection (result, SingleIFUCube). if SingleIFUCube = 0 because maybe mrs_imatch was skipped then the outlier detection would call cube_build and create the single IFU cubes. It is not 0 then it uses the IFUCubes instead of calling IFUCube with Single = True. The outlier_detection would still need to blot the data back - but it would not need the first call to cube_build.

Does that make sense to everyone ? I am happy to test this out - but I first wanted your comments. @hbushouse @mcara @jdavies-st

jemorrison commented 6 years ago

@stscieisenhamer

mcara commented 6 years ago

I have said the same thing [that computing single cubes twice in sky match and outlier detection was wasteful]. However, at that time, using single input and single output in steps was preferred.

In addition, there was a theory that outlier rejection step may need different kinds of "single" cubes (combining different channels or whatever) than what mrs_imatch would need. I do not know what is the current approach.

You also should CC @stsci-hack

stscieisenhamer commented 6 years ago

Strawman proposal if the following are true:

When "cube build" is mentioned, is it actually cube_build that is being called
That the cube sets are the same between mrs_imatch and outlier_detection

If so, then the following, psuedo-code for process:

result = <input science model>
if not self.mrs_imatch.skip or not self.outlier_detection.skip:
    single_cubes = cube_build(result, single_flag=True)

self.mrs_imatch.cubes = single_cubes
result = self.mrs_imatch(result)

self.outlier_detection.cubes = single_cubes
result = self.outlier_detection(result, cubes=single_cubes)

Where both steps have a configuration cubes where, if defined, are the ModelContainer of cubes to use. If not defined, the steps themselves would do the cube creation.

This would avoid the multiple return issues (though that really isn't an issue), make it clear the cubes are common, continue consistency with the whole Step architecture, if necessary allow configuration of the initial cube_build step.

There probably must be some check whether cubes are actually necessary depending on input data to?

jemorrison commented 6 years ago

As I understand how mrs_imatch and outlier detection works I think this strawman proposal seems like it would work.

mcara commented 6 years ago

@stscieisenhamer @jemorrison I actually somewhat disagree with the outlined workflow (if I understood it correctly). In particular, the first part of the pseudo-code makes little sense:

result = <input science model>
if not self.mrs_imatch.skip or not self.outlier_detection:
    single_cubes = cube_build(result, single_flag=True)

Yes, it could be inserted in the pipeline. However, it would complicate things if users would like to run a single individual step such as mrs_imatch and/or outlier_detection.

Instead, I think each of the two steps should be able to run "single_sci_cube_build" on demand when they do not get single cubes as input:

class MRSIMatchStep(Step):
    def process(input, single_cubes=None):
        if single_cubes is None:
            single_cubes = build_single_sci_cubes(input)
    ...........
    return (input, single_cubes)

# Similar processing as above should be done for the outlier detection
# Then:
result, single_cubes = self.mrs_imatch(cubes, single_cubes=None)
result = self.outlier_detection(result, single_cubes=single_cubes)

That is, each of the two steps of interest can be run stand-alone without previously computing single_cubes.

hbushouse commented 6 years ago

Would code like:

  self.mrs_imatch.cubes = single_cubes
  self.outlier_detection.cubes = single_cubes

end up with a couple of copies of the same data or are they both just pointers to a single instance of single_cubes?

stscieisenhamer commented 6 years ago

intent is that it is the same data. would have to see what can be gotten away with in the config system, though there is no doubt something could be worked up.

stscieisenhamer commented 6 years ago

@mcara Agreed that the two steps themselves should build their cubes if no cubes where given. Suggested code looks good.

hbushouse commented 1 year ago

There are discussions underway to completely rewrite the way outlier detection is done for IFU data, which does not involved building all the single cubes, hence if/when that is implemented, there will no longer be duplicate cubes constructed in outlier_detection and mrs_imatch and hence this issue becomes moot. So I propose closing this ticket. @jemorrison @mcara @stscieisenhamer what say you?

jemorrison commented 1 year ago

I say close it. I don't think we will be making the single cube to flag outliers. That method does not work well for IFU under sampled data. Jane

From: Howard Bushouse @.> Sent: Tuesday, April 25, 2023 11:00 AM To: spacetelescope/jwst @.> Cc: Jane Morrison @.>; Mention @.> Subject: [EXT]Re: [spacetelescope/jwst] Speed of CalSpec3 and efficiency in using cube_build (#1688)

External Email

There are discussions underway to completely rewrite the way outlier detection is done for IFU data, which does not involved building all the single cubes, hence if/when that is implemented, there will no longer be duplicate cubes constructed in outlier_detection and mrs_imatch and hence this issue becomes moot. So I propose closing this ticket. @jemorrisonhttps://github.com/jemorrison @mcarahttps://github.com/mcara @stscieisenhamerhttps://github.com/stscieisenhamer what say you?

— Reply to this email directly, view it on GitHubhttps://github.com/spacetelescope/jwst/issues/1688#issuecomment-1522194592, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AENCOWIXVXAEG464IBMRUDDXDAGK5ANCNFSM4EN2UXIA. You are receiving this because you were mentioned.Message ID: @.***>

spacetelescope / jwst

Speed of CalSpec3 and efficiency in using cube_build #1688