spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

calwebb_image3 killed despite having sufficient available RAM #7603


bangzhengsun1997 commented 1 year ago

Hello!

I encountered a problem when running a large-area drizzle: the program kept getting killed, most likely by the Linux out-of-memory (OOM) killer.

I'm on Ubuntu 22.04 LTS, and my server has 1 TB of physical RAM. When making large mosaics where the RAM is not sufficient for in-memory outlier detection, I have outlier_detection write the resampled images to disk instead of keeping them in memory.
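For reference, this is roughly how I invoke the pipeline (a sketch: `my_asn.json` stands in for my actual association file, and I rely on the `in_memory` parameter of `outlier_detection` to spill the resampled images to disk):

```python
from jwst.pipeline import Image3Pipeline

# Run the stage-3 imaging pipeline with outlier detection configured to
# write intermediate resampled images to disk rather than holding them
# in RAM. "my_asn.json" is a placeholder for the real association file.
Image3Pipeline.call(
    "my_asn.json",
    steps={"outlier_detection": {"in_memory": False}},
    save_results=True,
)
```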

In this mode, calwebb_image3 appears to rely heavily on the page cache. Right now my calwebb_image3 process uses ~350 GB of physical RAM (reported as "used"), so ~650 GB should be available; however, ~640 GB of that is page cache, leaving only ~10 GB of truly free RAM.

Depending on luck, once the free RAM hits zero the process gets killed, even though a huge amount of RAM is still available in the cache. If I drop the page cache regularly, outlier_detection never finishes, which is why I suspect it depends on the cached data somehow.
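To monitor this, I watch the memory split with `psutil` (a sketch, assuming `psutil` is installed; on Linux, `virtual_memory()` reports the page cache and the truly free memory as separate fields):

```python
import psutil

# Snapshot of system memory. On Linux, "available" includes reclaimable
# page cache, while "free" counts only RAM that is completely unused.
vm = psutil.virtual_memory()
for field in ("total", "used", "free", "available", "cached"):
    value = getattr(vm, field, None)  # "cached" is Linux-specific
    if value is not None:
        print(f"{field:>9}: {value / 2**30:8.1f} GiB")
```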

Please let me know if anyone has encountered the same problem or has found a solution. I wonder whether it's an issue with Ubuntu's newer kernel. Thanks in advance!

braingram commented 1 year ago

Thanks for opening an issue.

I'm not all that familiar with that part of the pipeline but do have a few questions to get started.

What versions of software are you using? Specifically, the `jwst` and `stdatamodels` versions would be helpful. It might be easiest to share the entire output of `pip freeze`.
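If it's easier, the two versions can also be printed directly (assuming both packages import cleanly in your environment):

```python
import jwst
import stdatamodels

print("jwst:", jwst.__version__)
print("stdatamodels:", stdatamodels.__version__)
```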

It appears that calwebb_image3 takes an association file as input. Would it be possible to share that file? I'm hoping to get a sense of the size of the data going into the step (how many files, and how large each one is).
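For context, a minimal Level 3 association can be generated with `asn_from_list` (a sketch based on my understanding of the current API; the file names and product name below are placeholders):

```python
from jwst.associations.asn_from_list import asn_from_list

# Build a minimal Level 3 association from a list of calibrated
# exposures; file names and product name are placeholders.
asn = asn_from_list(
    ["exp1_cal.fits", "exp2_cal.fits"],
    product_name="my_mosaic",
)

# Serialize the association to JSON for use as pipeline input.
name, serialized = asn.dump(format="json")
with open("my_asn.json", "w") as f:
    f.write(serialized)
```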

EvgeniaKouts commented 1 week ago

Hello! I have 32 GB of RAM and can get through stage 2, but stage 3 kills everything. Is it impossible to run the whole pipeline with that much memory?