spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
570 stars 167 forks source link

Improve runtime of NSClean step #8548

Closed stscijgbot-jp closed 5 months ago

stscijgbot-jp commented 5 months ago

Issue JP-3653 was created on JIRA by David Law:

Filing this ticket based on an email thread with Timothy Brandt, and pasting below an excerpt of that conversation:

As noted during the meeting, NSClean takes around 1 minute to run on a rate file on a good laptop.  Of this, about 35 seconds (on my laptop) are spent constructing the mask, and about 20 seconds are spent fitting the background.  The mask construction only needs to happen once if we are applying this correction at the group level (we would just need to add a mask for jumps at the group level, but we would already know where those are).  Then, for the background fitting, it is possible to make three tiny changes that are fully mathematically equivalent but that together gain a factor of about 15 in speedup over the current implementation.  They are:

https://github.com/spacetelescope/jwst/blob/4fcb915efa12b42ad00c6ad89cbc4ed3928195e5/jwst/nsclean/lib.py#L137

Change

p = np.diag(self.P[y][self.mask[y]])  # Weights

to

p = self.P[y][self.mask[y]]  # Weights

(i.e. don’t construct a huge matrix with almost all zeros)

 

https://github.com/spacetelescope/jwst/blob/4fcb915efa12b42ad00c6ad89cbc4ed3928195e5/jwst/nsclean/lib.py#L167

Change

A = np.matmul(p, B)

to

A = B*p[:, np.newaxis]

(i.e. don’t do the full multiplication by a huge diagonal matrix)

 

https://github.com/spacetelescope/jwst/blob/4fcb915efa12b42ad00c6ad89cbc4ed3928195e5/jwst/nsclean/lib.py#L175

Change

rfft[:k.shape[1]] = np.matmul(np.matmul(pinv_PB, p), d)

to

rfft[:k.shape[1]] = np.matmul(pinv_PB, p*d)

(use the associativity of matrix multiplication to multiply two vectors rather than two matrices)

 

This is because p is diagonal, so there is no reason to perform a full multiplication by the giant matrix, or even to create such a giant matrix in the first place.

 

With these changes, the computed background is identical but the computational cost for the background fit is just over 1 second on my laptop.  At that point I think it is no longer any problem to apply NSClean at the group level.  (DRL editorial note- this last is a potential future change).

stscijgbot-jp commented 5 months ago

Comment by David Law on JIRA:

Addressed by #8547

stscijgbot-jp commented 5 months ago

Comment by David Law on JIRA:

Adding a note for posterity that Bernie Rauscher (original author of the algorithm) also looked at these changes and gave them a thumbs up.

stscijgbot-jp commented 5 months ago

Comment by Howard Bushouse on JIRA:

Fixed by #8547

stscijgbot-jp commented 5 months ago

Comment by David Law on JIRA:

Confirmed runtime gain and no impact on results after code merge, closing ticket.