npirzkal / NIRCAM_Gsim


OSError: [Errno 12] Cannot allocate memory #6

Open eas342 opened 3 years ago

eas342 commented 3 years ago

I tried running the latest NIRCAM_Gsim today with @bhilbert4's latest mirage and ran into a memory problem. I then did a fresh install of mirage and its dependencies in a new Python 3.7 environment, with NIRCAM_Gsim installed both from PyPI and from the latest development version. The memory problem recurs in both of my environments. It got up to 86 GB of memory for a fairly simple exposure (2048x64, 19 groups, 5 ints) that has succeeded before. Have you seen a memory explosion or leak like this before?

Sorry, this is a very long file. I can try making a minimal working example; I think we might need a simpler Mirage TSO script for creating minimal working examples.
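One simple way to watch the resident memory of a run like this from the outside, assuming psutil is installed (the PID below is a placeholder for the Python process running the simulation), is a small polling loop:

```python
import time

import psutil  # third-party package, assumed installed

PID = 12345  # placeholder: PID of the Python process running the Mirage/NIRCAM_Gsim simulation
proc = psutil.Process(PID)

# Print the resident set size every 10 seconds until the process exits
while proc.is_running():
    rss_gb = proc.memory_info().rss / 1e9
    print(f"{time.strftime('%H:%M:%S')}  RSS = {rss_gb:.1f} GB")
    time.sleep(10)
```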

NIRCam_ERS_simple_5integrations.py.txt

```
2021-03-17 22:28:43,047 - stpipe - INFO - Dispersing TSO source
2021-03-17 22:28:43,048 - mirage.grism_tso_simulator - INFO - Retrieving grism-related config files from: /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/
2021-03-17 22:28:43,048 - stpipe - INFO - Retrieving grism-related config files from: /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/
2021-03-17 22:28:43,048 - stpipe - INFO - Retrieving grism-related config files from: /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/
Using POM mask /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/NIRCAM_LW_POM_ModA_trans.fits
Loading extra optical element transmission curves from POM file
Warning: SED Extrapolation turned on.
We have 1 Objects
dir image: /fenrirdata1/es_tso/sim_data/mirage_tso_notebook/sim_data_try_011_attribute_error/jw00042001001_01101_00002_nrca5_uncal_tso_grism_sources_F322W2_GRISMR_final_seed_image.fits
Using POM mask /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/NIRCAM_LW_POM_ModA_trans.fits
Loading extra optical element transmission curves from POM file
Warning: SED Extrapolation turned on.
We have 1 Objects
dir image: /fenrirdata1/es_tso/sim_data/mirage_tso_notebook/sim_data_try_011_attribute_error/jw00042001001_01101_00002_nrca5_uncal_tso_grism_sources_F322W2_GRISMR_final_seed_image.fits
Object caching ON
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [13:00<00:00, 780.27s/it]
Object caching ON
  0%|          | 0/1 [00:04
    gr_f322w2.create()
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/site-packages/mirage/grism_tso_simulator.py", line 381, in create
    add_background=False, cache=True, finalize=True)
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/site-packages/mirage/grism_tso_simulator.py", line 950, in run_disperser
    disp_seed.this_one[order].disperse_all(cache=True)
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/site-packages/NIRCAM_Gsim/observations/observations.py", line 297, in disperse_all
    this_object = self.disperse_chunk(i)
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/site-packages/NIRCAM_Gsim/observations/observations.py", line 497, in disperse_chunk
    mypool = Pool(self.max_cpu) # Create pool
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/schlawin/miniconda3/envs/miragePy3p7/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
```
bhilbert4 commented 3 years ago

I've never seen that before, although I haven't run a full TSO simulation with the latest Mirage in a while. For the GRISMC sim I hacked my changes into a slightly older version.

eas342 commented 3 years ago

OK, thanks. I re-ran the mirage TSO notebook. It did complete successfully but used up to 46 GB for an exposure with 5 groups, 70 ints and a 2048x256 stripe mode.

bhilbert4 commented 3 years ago

Nor brought up the resolution of your input spectra. He said that if your spectra are at very high resolution, it will greatly slow down processing time, and I'm wondering if it also has an effect on memory. He just recommended to someone else to use a resolution that is 2-4x that of the disperser.

I think this might argue for the addition of some kind of binning/smoothing within Mirage if the resolution is too high.
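A minimal sketch of what that binning could look like, assuming the SED is just a pair of 1-D numpy arrays (the bin count is a placeholder, not an existing Mirage parameter):

```python
import numpy as np

def rebin_spectrum(waves, fluxes, n_bins):
    """Bin-average a high-resolution spectrum onto a coarser grid of n_bins bins."""
    edges = np.linspace(waves.min(), waves.max(), n_bins + 1)
    counts, _ = np.histogram(waves, bins=edges)
    sums, _ = np.histogram(waves, bins=edges, weights=fluxes)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0  # drop bins that received no input samples
    return centers[keep], sums[keep] / counts[keep]
```

Running something like that on the SED before it reaches the disperser would cap the number of wavelength samples regardless of the input model resolution.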

eas342 commented 3 years ago

Thanks for the tip @bhilbert4. The stellar spectrum comes from sp = stsyn.grid_to_spec('ck04models', t_eff, metallicity, log_g), and I checked that the wavelength and flux arrays are only 1221 elements long. I also shrank the resolution of the transmission spectrum down to 10 elements, i.e.:

import numpy as np

waves = np.linspace(0.9, 5.5, 10)  # microns
trans = np.repeat(params.rp, 10)   # R_planet / R_star (params is defined elsewhere in the script)

just to be sure. That doesn't solve the problem, though: it still got to 89 GB of memory before I killed it.
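For reference, the resolution check amounts to something like the following (the stellar parameters here are placeholders, not the ones used in the script):

```python
import stsynphot as stsyn

# Placeholder stellar parameters; the real values come from the simulation script
t_eff, metallicity, log_g = 4400., 0.0, 4.6
sp = stsyn.grid_to_spec('ck04models', t_eff, metallicity, log_g)

# grid_to_spec returns a synphot SourceSpectrum; .waveset is its native wavelength grid
print(len(sp.waveset), len(sp(sp.waveset)))  # both ~1221 elements for ck04models
```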

bhilbert4 commented 3 years ago

Yikes. That's ~2 GB per full-frame equivalent. @npirzkal also suggested turning off caching in the call to NIRCAM_Gsim, but that will take an update to Mirage to do. And of course, it wasn't a problem before with caching turned on...

eas342 commented 3 years ago

Thanks for the idea! I hardcoded mirage.grism_tso_simulator.GrismTSO.run_disperser to call disp_seed.this_one[order].disperse_all(cache=False), and that seems to keep the memory usage to 2.0 GB or less. It's running but not finished yet.
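For reference, the hardcoded change is just the call flagged at line 950 of grism_tso_simulator.py in the traceback above:

```python
# mirage/grism_tso_simulator.py, inside GrismTSO.run_disperser (line 950 in the traceback)
disp_seed.this_one[order].disperse_all(cache=False)  # was cache=True
```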

bhilbert4 commented 3 years ago

Interesting! Let me know how it goes. Without the caching I wonder if it will be much slower.

eas342 commented 3 years ago

Yes, it looks like it will be too slow to run hour-long exposures. It took 30 minutes to disperse one object, and I don't know how many times it will have to repeat that.

```
2021-03-18 14:10:06,002 - stpipe - INFO - Dispersing TSO source
2021-03-18 14:10:06,002 - mirage.grism_tso_simulator - INFO - Retrieving grism-related config files from: /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/
2021-03-18 14:10:06,002 - stpipe - INFO - Retrieving grism-related config files from: /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/
2021-03-18 14:10:06,002 - stpipe - INFO - Retrieving grism-related config files from: /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/
Using POM mask /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/NIRCAM_LW_POM_ModA_trans.fits
Loading extra optical element transmission curves from POM file
Warning: SED Extrapolation turned on.
We have 1 Objects
dir image: /fenrirdata1/es_tso/sim_data/mirage_tso_notebook/files_to_test_mirage/sim_data_try_011_attribute_error/jw00042001001_01101_00002_nrca5_uncal_tso_grism_sources_F322W2_GRISMR_final_seed_image.fits
Using POM mask /fenrirdata1/program_data/mirage_data/nircam/GRISM_NIRCAM/current/NIRCAM_LW_POM_ModA_trans.fits
Loading extra optical element transmission curves from POM file
Warning: SED Extrapolation turned on.
We have 1 Objects
dir image: /fenrirdata1/es_tso/sim_data/mirage_tso_notebook/files_to_test_mirage/sim_data_try_011_attribute_error/jw00042001001_01101_00002_nrca5_uncal_tso_grism_sources_F322W2_GRISMR_final_seed_image.fits
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [30:06<00:00, 1806.23s/it
```
eas342 commented 3 years ago

It did finish OK, but it took over an hour for this 30-second exposure with 5 ints and 19 groups in RAPID mode on a 2048x64 subarray.

bhilbert4 commented 3 years ago

Ouch. We'll definitely have to keep looking into this. We found one bug this afternoon that may have been slowing things down, but it only affected NIRISS.

eas342 commented 3 years ago

I made a self-contained example for debugging this, which will hopefully also help with https://github.com/spacetelescope/mirage/issues/648:

from NIRCAM_Gsim.grism_seed_disperser import Grism_seed
import os

direct_file = 'jw88888001001_01101_00002_nrca5_uncal_tso_grism_sources_F444W_GRISMR_ptsrc_seed_image.fits'
SED_file = 'test_grism_tso_sed_file_wasp43.hdf5'

instrument = 'NIRCAM'
crossing_filter = 'F444W'

orders = ["+1"]  # just first order to save time
dmode = 'modA_R'
data_loc = os.path.join(os.environ['MIRAGE_DATA'], 'nircam', 'GRISM_NIRCAM', 'current')

disp_seed = Grism_seed([direct_file], crossing_filter, dmode,
                       config_path=data_loc, instrument=instrument,
                       SED_file=SED_file)
disp_seed.observation(orders=orders)
tmp = disp_seed.disperse()

I uploaded the input files here: https://arizona.box.com/s/4gljqnoekq78gjloffx1b0zl69ae1r2i