spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

Efficiency of Cube Build #2442

Closed: stscijgbot closed this issue 3 years ago

stscijgbot commented 6 years ago

Issue JP-364 was created by Katherine Murray:

The cube_build step of the pipeline takes a significant amount of time to build large dithered IFU cubes.

stscijgbot commented 6 years ago

Comment by Rosa Diaz: [~kmurray], will IFU cubes of this size be something common? If not, then this is low priority, but if it is going to be common then we need to make it high priority. Please let us know.

SCSB would also like to look at the difference it makes between running it on a personal computer and on the Cal Pipeline cluster.

stscijgbot commented 4 years ago

Comment by Jane Morrison: Katie [~kmurray], can you re-run your MIRI IFU cube-building test to see if the improvements we have made since you first filed this ticket have increased the efficiency? If it is slower, that would also be helpful information. We added DQ flagging in cube_build, which might make the code slower.

stscijgbot commented 3 years ago

Comment by Jane Morrison: [~bushouse] [~jdavies] [~dencheva] [~dlaw]

Cube build is a slow step, and it would be nice to speed it up. I have reviewed the code a few times looking for efficiency improvements, and there are a few things we could maybe do to improve the speed in calspec3. In calspec2, the WCS information is first used to map every pixel to the sky to find what size the cube needs to be. Once this is done, the IFU cube is set up. Then each pixel is mapped again during the IFU cube creation.

In calspec3 the same thing is done. First the pixels are mapped to the sky to figure out how big the IFU cube needs to be. Then the pixels are mapped to the sky again to build the cube. This mapping from pixel to sky is slowest for NIRSpec. I have always wondered if we could just store the size of the exposure on the sky after calspec2; then we could use this information in calspec3 and not have to map every pixel twice.
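
For illustration, a rough sketch of what storing that footprint could look like; `footprint_from_wcs` is a hypothetical helper, and calling the model's top-level WCS directly on every pixel is a simplification (NIRSpec IFU really needs per-slice transforms):

```python
# Illustrative only: compute an exposure's bounding box on the sky once,
# so calspec3 could reuse it instead of re-mapping every pixel.
import numpy as np

def footprint_from_wcs(model):
    ysize, xsize = model.data.shape[-2:]
    y, x = np.mgrid[:ysize, :xsize]
    ra, dec, lam = model.meta.wcs(x, y)      # pixel -> (RA, Dec, wavelength)
    good = np.isfinite(ra) & np.isfinite(dec) & np.isfinite(lam)
    return (ra[good].min(), ra[good].max(),
            dec[good].min(), dec[good].max(),
            lam[good].min(), lam[good].max())

# After calspec2 this tuple could be written to the product (e.g. as header
# keywords); calspec3 would then size the output cube from the stored
# footprints of all exposures without a second full pixel-to-sky pass.
```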

But a bigger gain could come from doing some of the computations in C, like drizzle does, for the building of the IFU cube. I have a MIRI cube-build pipeline in C++ and it is FAST: just a few seconds to create IFU cubes. But that is only for single-exposure files.

Suggestions? Maybe someone skilled in Python performance could look at cube_build and suggest some areas I could speed up.


stscijgbot commented 3 years ago

Comment by Jane Morrison: Coming back to this issue. I will work on improving the Python code, determining which modules in cube_build take the longest, and deciding whether it would make sense to pull the most computationally intensive ones out and convert them to C routines.
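
For reference, a minimal sketch of getting those per-module timings with the standard-library profiler; the association file name below is made up:

```python
# Profile one cube_build run and list the slowest call paths.
import cProfile
import pstats

from jwst.cube_build import CubeBuildStep

cProfile.run('CubeBuildStep.call("my_ifu_asn.json")', "cube_build.prof")

stats = pstats.Stats("cube_build.prof")
stats.sort_stats("cumulative").print_stats(20)   # top 20 by cumulative time
```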


stscijgbot commented 3 years ago

Comment by David Law: I'd forgotten about this ticket until you bumped it just now! Linking https://jira.stsci.edu/browse/JP-2015 as a relevant related ticket. In my testing, at least, the vast majority of the runtime was spent in the for loop starting at line 102 of https://github.com/spacetelescope/jwst/blob/master/jwst/cube_build/cube_cloud.py

stscijgbot commented 3 years ago

Comment by Jane Morrison: [~sargent] Since the NIRSpec IFU cube build takes the longest, I wanted to run some timing tests using a NIRSpec IFU dithered data set. I have a data set I think I must have gotten from you. The files have names like:

ditherunity_CLEAR_PRISM_M1_m1_noiseless_NRS1_modified_updatedHDR_hdrfix_exptypefix_cal.fits,

where the M1 varies from M1 to M4 and the m1 from m1 to m3, for a total of 12 files. Do you recognize this data set?

stscijgbot commented 3 years ago

Comment by Jane Morrison: After a review of the code using some timing and profiling tools, the majority of the time is spent finding which detector pixels fall within the ROI of a spaxel center. This is because each pixel on the detector is looped over to find which ROI spaxels it falls in. One idea is to try a KD-tree. I tried this in the past but it did not help; however, I only tried it with a cube from one file, so it might help when making cubes from several files (calspec3). This routine might also be a good one to pull out into C-type code (I believe that is how resample handles this issue for imager-type data). If anyone ([~dencheva] or [~jdavies]) has ideas on how to speed up the Python code, let me know.
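
For what it's worth, a toy sketch of that KD-tree idea using scipy; the arrays here are fake, and a single scalar ROI glosses over the separate spatial and spectral ROIs cube_build actually uses:

```python
# Build a tree over the spaxel centers once, then ask which spaxels fall
# within the ROI of each mapped detector point, instead of looping over
# every pixel/spaxel pair in Python.
import numpy as np
from scipy.spatial import cKDTree

def spaxels_in_roi(spaxel_centers, detector_points, roi):
    """For each detector point, return indices of spaxel centers within roi."""
    tree = cKDTree(spaxel_centers)                    # (nspaxel, 3) centers
    return tree.query_ball_point(detector_points, r=roi)

# Fake data just to show the call pattern.
rng = np.random.default_rng(0)
spaxel_centers = rng.uniform(0.0, 10.0, size=(100_000, 3))
detector_points = rng.uniform(0.0, 10.0, size=(10_000, 3))
matches = spaxels_in_roi(spaxel_centers, detector_points, roi=0.05)
```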

The second routine that takes the most time is determining the size of the cube. With the recent changes #5968 and #5969, this time should become insignificant. I am making the changes to do this now.

The third routine that takes time is setting the DQ flag. Mostly this is setting the HOLE DQ flag.

I will look at ways to speed up that code. [~dlaw] [~sargent] I was thinking we could skip this routine in calspec2. The extract_1d routine does not use it and it is not really all that useful for calspec2 IFU cubes. I believe the only case where calspec2 IFU cubes are important is when they are used in making a master background.

stscijgbot commented 3 years ago

Comment by Jane Morrison: [~dlaw] [~sargent] Are you both ok with skipping setting the DQ flag for calspec2-type data?


stscijgbot commented 3 years ago

Comment by David Law: [~morrison] I think skipping the hole-finding routine in spec2 is probably reasonable; those are really intermediate cubes built with default parameters.  As such, there's no need for the hole routine to warn a user when they've set cube building to a bad set of parameters that will create holes in their data.  If we can make it easy to turn on/off in the code then we can always try it and see if we run into any problems.
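
For illustration, a minimal sketch of what such an on/off switch could look like using the stpipe spec syntax the step classes already use; the class and the skip_dqflagging parameter name here are hypothetical, not the actual cube_build code:

```python
# Hypothetical step showing a boolean option to bypass the hole/DQ flagging.
from jwst.stpipe import Step

class CubeBuildSketchStep(Step):
    spec = """
    skip_dqflagging = boolean(default=False)  # bypass the HOLE DQ-flag pass
    """

    def process(self, input_data):
        # ... map pixels and build the IFU cube here ...
        if not self.skip_dqflagging:
            self.log.info("Setting DQ flags (HOLE detection) on the IFU cube")
            # set_dq_flags(cube)   # placeholder for the existing routine
        return input_data
```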

stscijgbot commented 3 years ago

Comment by Beth Sargent: Jane, I don't see any issues with skipping hole-finding in spec2.

Also, in response to your question from April 13, I do recognize the data set you mention.  I think I shared that data set with you last year.

stscijgbot commented 3 years ago

Comment by Jane Morrison: [~dlaw] For setting the DQ flags I use a polygon-overlap algorithm (of the kind shapely provides) to find the overlap of two polygons. I converted this code from the original C++ to Python. I just tried importing shapely and using its Polygon method, as I had done in the past, and I wanted to confirm the result: the routine in cube_build is at least 4 times faster. We can use this method in the drizzle implementation too.
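
For context, the shapely version of that overlap check looks roughly like this (illustrative squares; the real corner coordinates come from the mapped pixel and spaxel footprints):

```python
# Area of overlap between two quadrilaterals with shapely.
from shapely.geometry import Polygon

pixel = Polygon([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
spaxel = Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)])

overlap_area = pixel.intersection(spaxel).area   # 0.25 for these squares
```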


stscijgbot commented 3 years ago

Comment by David Law: I'm not sure I followed which part of that was a statement and which was a question. You mean the pipeline is already using shapely? I didn't think that was installed?

stscijgbot commented 3 years ago

Comment by Jane Morrison: [~dlaw] Shapely is not installed. Long ago (for the ground-test pipeline) I wrote the algorithm that determines the overlap between polygons in C++, and that was how the overlap was determined. I converted this code when I ported the pipeline to Python; all of that code is in cube_overlap.py. I retested this code against the shapely Python import and the code in cube_build is faster (for some reason). The algorithm in both cases is very similar. I use it to set up the DQ plane for MIRI and to create internal IFU cubes.

I would also like to remove the weighting = miripsf option. I don't think we are going to use it, and if we do, we should test it. I am not even sure the reference file it used to use still exists. It is cluttering up the code and I would like to remove it. Yes/no?


stscijgbot commented 3 years ago

Comment by David Law: [~morrison] Ah, now I follow you. It would indeed be interesting to explore the drizzle implementation using the code in cube_overlap.py then. Is this solve_intersection?

Yes, let's remove the PSF weighting.  We've kept it around in the hope of being able to explore it for a while, but there are higher priorities and having an untested method only adds confusion.

stscijgbot commented 3 years ago

Comment by Howard Bushouse: At least partially fixed by https://github.com/spacetelescope/jwst/pull/5991

stscijgbot commented 3 years ago

Comment by David Law: A few quick runtime comparisons for MIRI MRS:

Before --> After update:

Spec2 pipeline on a single exposure: 8:38 --> 6:30

Single-channel cube build on a 4-pt dither: 6:51 --> 5:12

I.e., roughly a 25% runtime improvement.  Helpful, but I don't think it obviates the need for C-type speedups and/or parallelization.

jemorrison commented 3 years ago

Just to be clear, you have to make sure assign_wcs has been re-run on the data to pick up spectral_region. If it is not populated, then cube_build will fall back to figuring it out itself (which takes time).
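
For example, a minimal sketch of that re-run using the standard .call() interface; the file name below is made up:

```python
# Re-run assign_wcs so spectral_region is populated before timing cube_build.
from jwst.assign_wcs import AssignWcsStep

AssignWcsStep.call("jw_example_nrs1_rate.fits", save_results=True)
```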

stscijgbot commented 3 years ago

Comment by David Law: I was using the new assign_wcs for the spec2 pipeline, but not for the single-channel dithered cube build. Reprocessing those too gives me a runtime of 5:28, which is slightly longer than the above for some reason (maybe just a quirk of CPU availability?). It looks like about 3 min of that 5-min runtime is spent in the `for ipt in range(0, nn-1):` loop in cube_cloud.py, and 2 min in map_fov_to_dqplane. Does that sound about right?

stscijgbot commented 3 years ago

Comment by Jane Morrison: Yes, that sounds about right. For MIRI, the next thing to do to reduce the cube_build time is to look at how to make cube_cloud.py faster. That includes looking to see if we should move it to C++.

For NIRSpec there is still a slow part of cube_build related to transforming x, y to RA, Dec, but that is being looked at with another group.


stscijgbot commented 3 years ago

Comment by David Law: Just a quick comment that I've had pretty good results with the multiprocessing code embedded in ramp fitting (via the multiprocessing pool approach). That may be one possible avenue to explore for further gains here as well, since some parts of the cube-build loop lend themselves well to parallelization: they repeat the same calculation for N independent inputs.
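
For illustration, a toy sketch of the pool pattern being suggested; map_one_exposure is a stand-in, not a real cube_build function:

```python
# Farm the same per-exposure calculation out to a pool of workers.
import multiprocessing as mp

def map_one_exposure(exposure_index):
    # stand-in for mapping one exposure's pixels onto the output cube grid
    return exposure_index * exposure_index

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:
        partial_results = pool.map(map_one_exposure, range(12))
    # the partial results would then be co-added into the final IFU cube
    print(partial_results)
```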

stscijgbot commented 3 years ago

Comment by David Law: This ticket is very closely related to https://jira.stsci.edu/browse/JP-2096 and can similarly be closed. The new cube-building code using C bindings has been tested for performance as part of https://github.com/spacetelescope/jwst/pull/6093. Results look correct, and the runtime for a cube-building test case drops from about 28 minutes to about 1.5 minutes, a gain of roughly 20x in speed. The ticket can be closed. Extremely large mosaics may eventually pose unique issues of their own from a memory-management standpoint, but cube-building speed is no longer a limiting factor.