Stage 3 stalls for NIRSpec G395H

lucasastro commented 1 week ago

Instrument

NIRSpec (Stages 1-3)

What happened?

Stage 3 keeps stalling. Apparently it is encountering a zero-size array. I ran the Stage 1 and Stage 2 reduction with NIRSpec-G395H specific settings. The Stage 2 output FITS files (*calints.fits) look like normal 2d frames. It seems that it crashes during source_pos.py routine when finding the source location. What could be the issue here? Some NaN-clipping settings I should adapt in the .ecf file(s), or could it be rooted somewhere in Stage 1 or 2? I have already experimented by changing some settings in these .ecf files following the documentation, but to no effect.

Error traceback output

Starting Stage 3 Reduction

Input directory: /Users/lucas/Documents/ScienceHD/REVEAL_HD/JWSTdata/K2-18b/Stage2/S2_2024-06-26_K2-18_run4/
Output directory: /Users/lucas/Documents/ScienceHD/REVEAL_HD/JWSTdata/K2-18b/Stage3/S3_2024-06-27_K2-18_run1/ap6_bg7/
Using ap=6, bg=7, expand=1
Copying S3 control file

Found 4 data file(s) ending in calints.fits
Starting file 1 of 4
  Reading file 1...
  Masking NaNs/infs in data arrays...
  FLUX has 39912 NaNs/infs, which is 0.46% of all pixels.
  ERR has 39912 NaNs/infs, which is 0.46% of all pixels.
  V0 has 39912 NaNs/infs, which is 0.46% of all pixels.
  Locating source position...
    Source position on detector is row 21.
  Automatically getting reference files to convert units to electrons...
  Converting from data numbers per second (DN/s) to electrons...
  Performing full frame outlier rejection...
    Flagged 0.301180% of pixels as bad.
  Computing clean median frame...
  Correcting curvature and bringing the trace to the center of the detector...
  !!! Ensure that you are using meddata for the optimal extraction profile !!!
    Correcting the wavelength solution...
    Correcting the curvature over all integrations...
    Updating src_ypos to new center, row 12...
  Performing CxC background subtraction...
100%|███████████████████████████████████████████████████████████████████████████████████████| 950/950 [00:16<00:00, 57.18it/s]
  Creating figures for background subtraction...
100%|███████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.61it/s]
  Recording y position and width for all integrations...
  0%|                                                                                                 | 0/950 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/lucas/Documents/ScienceHD/REVEAL_HD/JWSTdata/K2-18b/run_eureka.py", line 26, in <module>
    s3_spec, s3_meta = s3.reduce(eventlabel, ecf_path=ecf_path)
  File "/opt/miniconda3/envs/eureka/lib/python3.9/site-packages/eureka/S3_data_reduction/s3_reduce.py", line 447, in reduce
    source_pos.source_pos_wrapper(data, meta, log,
  File "/opt/miniconda3/envs/eureka/lib/python3.9/site-packages/eureka/S3_data_reduction/source_pos.py", line 73, in source_pos_wrapper
    writePos(source_pos(flux[n], meta, data.attrs['shdr'],
  File "/opt/miniconda3/envs/eureka/lib/python3.9/site-packages/eureka/S3_data_reduction/source_pos.py", line 162, in source_pos
    src_ypos, src_ywidth = source_pos_gauss(flux, meta, m, n, plot)
  File "/opt/miniconda3/envs/eureka/lib/python3.9/site-packages/eureka/S3_data_reduction/source_pos.py", line 402, in source_pos_gauss
    p0 = [np.ma.max(med_row), pos_max, sigma0, np.ma.median(med_row)]
  File "/opt/miniconda3/envs/eureka/lib/python3.9/site-packages/numpy/ma/core.py", line 6764, in max
    return obj.max(axis=axis, fill_value=fill_value, out=out, **kwargs)
  File "/opt/miniconda3/envs/eureka/lib/python3.9/site-packages/numpy/ma/core.py", line 5885, in max
    result = self.filled(fill_value).max(
  File "/opt/miniconda3/envs/eureka/lib/python3.9/site-packages/numpy/core/_methods.py", line 40, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity

What operating system are you using?

Mac OSX Sonoma 14.4.1

What version of Python are you running?

Python 3.9.7

What Python packages do you have installed?

conda list.txt

Code of Conduct

[X] I agree to follow this project's Code of Conduct

taylorbell57 commented 1 week ago

Can you copy-paste the contents of your Stage 3 ECF so that I have more to troubleshoot with?

kevin218 commented 1 week ago

Have you tried adjusting the size of xwindow?

lucasastro commented 1 week ago

No I have not adjusted the size of xwindow - is there a standard it should take for G395H?

This is the S3 ecf. It may have something to do with the S1 and S2 settings, which I have included for good measure. I tried getting the NIRSpec G395H settings from the documentation but perhaps it is still off - I don't know where to begin troubleshooting this. Tomorrow I will start afresh from S1, but if something obvious that needs changing stands out, this would be good to know.

S1_k2-18.ecf.txt S2_k2-18.ecf.txt S3_k2-18.ecf.txt

taylorbell57 commented 1 week ago

I agree with Kevin that this is likely not a bug with Eureka but more likely an issue with how you have setup you ECF. I recommend you look carefully at all of your plots and ECF settings to try to troubleshoot the issue yourself. I recommend you set isplots_S3 to 5 and nplots to 5 so that you get several useful troubleshooting plots. And to help you understand the ECF settings, please take a look at our documentation at https://eurekadocs.readthedocs.io/en/latest/ecf.html. For example, your Figure 3301 figures (see an example at https://eurekadocs.readthedocs.io/en/latest/outputs.html#stage-3-outputs) will partially help you identify whether there are any columns where there is no star (which would cause the centroiding to fail). You can also use a tool like DS9 to view the FITS files to help you choose reasonable parameters.

It isn't clear to me whether you're working with NRS1 or NRS2 data from NIRSpec, but you do need to work separately with each of those detectors since they will need separate ECF values. For example, I usually set ywindow to [0,32] for both detectors but set xwindow to [520,2047] for NRS1 and [5,2047] for NRS2.

lucasastro commented 1 week ago

OK this is helpful. Thanks, I will provide feedback on this issue on the progress.

lucasastro commented 5 days ago

This has worked out for me so far, thanks. Stage 1 is still a problem; keeps taking too much time and aborting, suspect it has something to do with my ECF settings, for now I will work with JWST pipeline S1 outputs and return to this issue / start a new one if really needed.

taylorbell57 commented 5 days ago

Glad to hear it! My strong suspicion is that your computer has insufficient RAM to run Stage 1 on the NIRSpec data. With my old computer, I would encounter similar issues, but now that I have 64 GB of RAM in my new computer I almost never encounter this issue. One thing you could try is allocating more swap memory which uses some of your hard drive storage like a kind of very slow virtual RAM when you've overloaded you physical RAM. I recently had to do that myself, but the process to do that is very platform dependent and not something I can support you in doing, nor can I guarantee it'd solve your problem; but it is a free option which might be worth pursuing. If you do confirm that the issue is with excessive RAM usage (using a tool like top, htop, or Activity Monitor), there's nothing that we can do on our end since the "issue" is with the jwst pipeline and not ours. You could open an issue (or add your voice to an already existing issue) on their repository here if you wanted.

Since this particular issue seems to be resolved, I'm going to close this issue. Feel free to reopen the issue if your issue with Stage 3 re-emerges, or feel free to open a separate issue if something unrelated occurs that you're unable to resolve after first trying some troubleshooting yourself

lucasastro commented 5 days ago

Ah thanks, seems reasonable to suspect this RAM overload is the issue - I will let you know if anything useful comes out of my troubleshooting.

lucasastro commented 3 days ago

One more question regarding S3 reduction:

The arrays delivered look good judging by the quality control figure outputs. However I have trouble using the .h5 files, presumably because of a wavelength array containing (too many) NaN values.

The S3 pipeline warned me about this:

WARNING: Your wavelength array has 614 NaNs, which are outside of the wavelength solution. You should consider removing indices (array([ 913,  914,  915,  916,  917,  918,  919,  920,  921,  922,  923, ..., 1526]
etc.

Also, I should note that at the end of S3 reduction a flurry of extractions take place (I have switched off 1d extraction) that give this feedback:

2024-07-04 13:32:41,058 - stpipe - WARNING - /opt/miniconda3/envs/eureka/lib/python3.9/site-packages/eureka/S3_data_reduction/optspex.py:807: RuntimeWarning: invalid value encountered in divide
  stdevs = np.abs(subdata - expected)*submask/np.sqrt(variance)

...which could be a page of the same book that caused the wavelength-NaN issue.

I thought this could be mitigated by playing with the xwindow values, but I have selected the region that includes the object spectrum (similar values as suggested by you in https://github.com/kevin218/Eureka/issues/667#issuecomment-2198375045), and checked on DS9 that the spectrum is actually there in the *calints.fits file.

So what causes the (seemingly random) indices of the wavelength array turning to NaN? Any influence to be had on this by tuning parameters in the S3 ecf file?

I have found that the message is triggered by line 386 in s3_reduce.py:

                # Compute 1D wavelength solution
                if 'wave_2d' in data:
                    data['wave_1d'] = (['x'],
                                       data.wave_2d[meta.src_ypos].values)
                    data['wave_1d'].attrs['wave_units'] = \
                        data.wave_2d.attrs['wave_units']

                # Check for bad wavelength pixels (beyond wavelength solution)
                util.check_nans(data.wave_1d.values, np.ones(meta.subnx), log,
                                name='wavelength')

And 'data' is (in my case) created from standard JWST S1 output contained in the package downloaded from MAST. Could it be that I need to do Stage 1 myself after all...?

Perhaps at the root of this is the S2 reduction, where the ecf includes:

#Modify the existing file to change the dispersion extraction - FIX: DOES NOT WORK CURRENTLY
waverange_start         6e-08   # Use None to rely on the default parameters
waverange_end           6e-06   # Use None to rely on the default parameters

But if I use None here, I get calibrated images that seem cropped / degraded in resolution (xsize=1305, before it was xsize=2047) and the same problem regarding NaN values in the wavelength array persists.

I will keep trying other things but if you have more wisdom to add here I'd very much appreciate it.

PS I had wanted to reopen this issue, but it seems I do not have permission to do this.

kevin218 commented 3 days ago

What values are you using for xwindow? Is this PRISM data?

lucasastro commented 3 days ago

xwindow is [520,2047] for NRS1 and [5,2047] for NRS2. It is not PRISM data, but G395H BOTS data with F290LP filter...

taylorbell57 commented 2 days ago

Hmm, no you shouldn't need to run Stage 1 yourself just to get the wavelength solution. And your waverange_start and waverange_end values should be left at those values of 6e-08 and 6e-06 which you got from the template.

Would you mind sending your latest Stage 2 and 3 ECFs so that I can take a look at that? It is super strange that the NaN wavelengths are in the middle of the xwindow and not at the edge. Based on the plots output during Stage 3, can you confirm that there is an obvious spectrum in all columns of Fig 3301, Fig 3303, Fig 3304, and Fig 3308?

lucasastro commented 2 days ago

Since the last interaction, I have adapted S2 and S3 settings, they now match very closely the NIRSpec settings provided in the templates in the tests folder. As a result, while the warning about the NaN indices still appears, in the resulting .h5 files the wavelength arrays do not include NaN values. This is also the case in the figure outputs.

For my current science objective, I am not so interested in products of S4 reduction, so I have not yet tested if that crashes (it did previously).

These are the latest .ecf files (that produced workable .h5 files): S2_k2-18_nrs1.ecf.txt S2_k2-18_nrs2.ecf.txt S3_k2-18_nrs1.ecf.txt S3_k2-18_nrs2.ecf.txt

The full output of the NRS1 and NRS2 reduction is here: S2_nrs1_output.txt S3_nrs1_output.txt S2_nrs2_output.txt S3_nrs2_output.txt

As you can see, the NaN-indices have shifted compared to the previous run, but like you saw before it is in the middle of the xwindow. Also, note that I have not quality checked the data, but the perpetual warning when optimal extraction (?) takes place:

2024-07-05 11:18:29,093 - stpipe - WARNING - /opt/miniconda3/envs/eureka/lib/python3.9/site-packages/eureka/S3_data_reduction/optspex.py:850: RuntimeWarning: invalid value encountered in divide
  spectrum = np.nansum(profile*submask*subdata/variance,

does not bode well.

So all in all, I am glad that the output is better than before (i.e. there is output to begin with) but the warnings and NaN issue do not give me enough confidence yet to trust the result. Hopefully I will succeed in resolving this mystery. Your help is appreciated.

Finally, to answer your question "Based on the plots output during Stage 3, can you confirm that there is an obvious spectrum in all columns of Fig 3301, Fig 3303, Fig 3304, and Fig 3308?" There is a spectrum in all columns (rows, as the dispersion direction is horizontal...?), although in all but the 3308 images the optimal extraction seems a bit zig-zaggy (perhaps reflected by the 'invalid value' warnings in the output?).

It looks less smooth and lower resolution than the examples provided in https://eurekadocs.readthedocs.io/en/latest/outputs.html#s3-out...

Here are representative examples of the (NRS1) S3 figures mentioned by you: fig3301_file0_int000_CxC_ImageAndBackground fig3303_file0_int000_Profile fig3304_file0_ResidualBG fig3308_file0_MedianFrame

And these are the ones for NRS2: fig3301_file0_int000_CxC_ImageAndBackground fig3303_file0_int000_Profile fig3304_file0_ResidualBG fig3308_file0_MedianFrame

taylorbell57 commented 2 days ago

First, can you uncomment the slit_y_low and slit_y_high rows in both of your Stage 2 ECFs? Additionally, you should set skip_bkg_subtract, skip_pathloss, and skip_resample to True in your Stage 2 ECFs (you should only have set these values to False if you know that you need to run them and that they're not going to break things for you). Finally, as I'd mentioned above I recommend you set ywindow to [0,32] for both of your Stage 3 ECFs.

Some combination of those changes is very likely to resolve your issue

lucasastro commented 1 day ago

This seems to have done the trick regarding the NaN-values in wavelength arrays. However, this message keeps appearing during optimal spectral extraction in Stage 3:

2024-07-06 23:48:34,632 - stpipe - WARNING - /opt/miniconda3/envs/eureka/lib/python3.9/site-packages/eureka/S3_data_reduction/optspex.py:850: RuntimeWarning: invalid value encountered in divide
  spectrum = np.nansum(profile*submask*subdata/variance,

and every other 10-15 times, it does seem to extract a spectrum (curiously, only the odd ones seem to be extracted):

 99%|██████████████████████████████████████████████████████████████████████████████████████▍| 671/675 [00:52<00:00, 16.82it/s]

Is this normal behavior?

Note: I did not adapt ywindow to [0,32], at an earlier stage I did this but got error messages. Following your earlier advice (https://github.com/kevin218/Eureka/issues/667#issuecomment-2198375045), I adapted it to the values where there actually is an image in the calints.fits file, namely [1,31] for nrs1 and [1,27] for nrs2 (for this particular object). I will examine the science outputs on Monday and, for the record, will give feedback in this issue.

kevin218 / Eureka