ldolan05 / ACID

Repository for ACID pipeline
https://acid-code.readthedocs.io/en/latest/index.html
MIT License

ACID using up all available memory #21

Open tzdwi opened 3 months ago

tzdwi commented 3 months ago

Hi there!

I saw your paper on arXiv and decided to try ACID for a project I'm working on. I have 145 echelle spectra of the same object with 71 orders each. While I was able to complete a successful ACID run on one frame/one order on my laptop (2020 MacBook Pro, Intel chip), given the volume of data, I'm attempting to migrate to my work desktop (2022 Mac Studio, M1 pro) and am running into the following issue regardless of whether I try a naive double for-loop over frames/orders, or just a loop over orders as suggested in the "Multiple Wavelength Ranges" section of the docs.

When I run my script, I get the "Initialising..." and "Fitting the Continuum..." messages, but soon after, I get 20 (the number of cores on my machine) more "Initialising..." messages before the machine crashes due to running out of available memory (64 GB). I'm guessing this has to do with the Pool object taking up too many resources -- confirmed by the Traceback when I use Ctrl-C to stop the run -- but otherwise I am unsure why this is happening on one machine and not the other. Other than the chipset, the only difference between the two is that the Mac Studio has Python 3.8 (the earliest version that'll work on the Apple chip).

Any help or guidance you can give would be much appreciated!

ldolan05 commented 3 months ago

Hi Trevor,

Sorry it's taken me a while to reply; I am only getting around to seeing your email now! It's great to hear from someone who's using ACID, so hopefully we can get it working for you! I think the issue could be coming from running ACID under a different Python version. ACID is currently only compatible with Python 3.7 due to a bug in the multiprocessing package included in newer versions of Python. I would recommend using a conda environment with Python 3.7, as this is what ACID is developed and tested in, so it should work on both systems.
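One possible explanation, not confirmed in this thread: since Python 3.8, macOS uses the "spawn" start method for multiprocessing by default instead of "fork". Under "spawn", each worker re-imports the main script, so any unguarded top-level code runs again in every worker, which would be consistent with one extra "Initialising..." message per core. Below is a minimal sketch of the usual guard, assuming the pipeline is launched from a user script. `configure_start_method` and `run_pipeline` are hypothetical names for illustration and are not part of ACID.

```python
import multiprocessing as mp


def configure_start_method(preferred="fork"):
    """Select `preferred` if this platform offers it; return the method in effect."""
    if preferred in mp.get_all_start_methods():
        # force=True allows the call even if a start method was already chosen.
        mp.set_start_method(preferred, force=True)
    return mp.get_start_method()


def run_pipeline():
    # Hypothetical placeholder: the per-frame/per-order ACID calls would go here.
    return "done"


if __name__ == "__main__":
    # Under "spawn", every worker re-imports this file. The guard keeps the
    # pipeline from being launched again inside each worker process.
    method = configure_start_method("fork")
    print("start method:", method)
    run_pipeline()
```

The Python documentation treats "fork" on macOS as potentially unsafe, so forcing it is best seen as a diagnostic step; the `if __name__ == "__main__":` guard is the part that applies under either start method.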

Hopefully this helps!

Thanks, Lucy

(Sent by email in reply to Trevor Dorn-Wallenstein's message of Mon, Mar 25, 2024, 7:31 PM.)

tzdwi commented 3 months ago

Hi Lucy,

Thanks for getting back to me! Unfortunately I recently tried a longer run on my laptop in a Python 3.7 environment. No multiprocessing bugs, but I still ran out of available memory. Do you have any intuition on what reasonable limits might be for number of wavelength points per chunk/number of frames/number of velocity bins?

ldolan05 commented 2 months ago

Hi Trevor,

We haven't done enough testing yet to have a great idea of the limits, but our test set had 72 orders with around 5000 wavelength pixels in each. We ran this for 114 frames with around 100 velocity bins (this was subject to change, but I think this was one of the largest numbers of velocity pixels we used). Hopefully that gives you something to compare your data to, to see what's causing the issue!

Thanks, Lucy

(Sent by email in reply to Trevor Dorn-Wallenstein's message of Fri, 12 Apr 2024, 18:44.)

tzdwi commented 2 months ago

Oh really? That's essentially the order of magnitude of what I'm working with; maybe twice the number of velocity bins, but otherwise similar numbers of orders and frames.

wangxianyu7 commented 1 week ago

ACID requires a substantial amount of memory, as it creates large-scale matrices during LSD operations. Therefore, I recommend using it on an HPC system with more than 100 GB of memory.
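As a rough guide to how quickly dense matrices grow, the sketch below estimates the memory of a single dense array from its shape. The shapes are assumptions for illustration only: the thread does not state which matrices ACID builds or their dimensions, and `dense_matrix_gib` is a hypothetical helper, not an ACID function. The pixel and velocity-bin counts are the figures quoted earlier in this thread.

```python
def dense_matrix_gib(n_rows, n_cols, bytes_per_element=8):
    """Memory in GiB of one dense n_rows x n_cols array (8 bytes = float64)."""
    return n_rows * n_cols * bytes_per_element / 1024**3


if __name__ == "__main__":
    n_pixels = 5000      # wavelength pixels per order (quoted in this thread)
    n_velocities = 100   # velocity bins (quoted in this thread)

    # Hypothetical shapes, chosen only to show linear vs. quadratic scaling.
    print(f"pixels x velocities: {dense_matrix_gib(n_pixels, n_velocities):.4f} GiB")
    print(f"pixels x pixels:     {dense_matrix_gib(n_pixels, n_pixels):.4f} GiB")
    print(f"2*pixels squared:    {dense_matrix_gib(2 * n_pixels, 2 * n_pixels):.4f} GiB")
```

Any matrix whose two dimensions both scale with the number of pixels grows quadratically, so doubling the chunk length quadruples its footprint, and temporary copies made during linear algebra can multiply that further.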

tzdwi commented 1 week ago

Hi Xian-Yu,

That's good to know. Can that recommendation be found easily in the docs? That would have been handy for folks like me who like to run things without thinking it through :)

This project has been moved to the back burner, but I'll report back once I get a chance to run on my local HPC. Feel free to close the issue if y'all would like, or leave it open just in case.

wangxianyu7 commented 1 week ago

Hi Trevor,

The documentation doesn't include this recommendation. I tried running the code on my laptop and encountered the same issue with it using up all available memory. I traced the problem to this line here, where a large matrix (~5 GB) is being handled. I then realized that this code likely can't run on my laptop. When I monitored the memory usage on an HPC, it sometimes reached around 35 GB. So I suggest running it on a large-memory HPC.
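For anyone who wants to check memory use on their own data before moving to an HPC, here is a minimal sketch using only the Python standard library on macOS or Linux. `peak_rss_gib` is a hypothetical helper name, and the `bytearray` allocation is just a stand-in for a real workload. It reports the calling process only, so memory held by separate worker processes in a multiprocessing Pool is not included.

```python
import resource
import sys


def peak_rss_gib():
    """Peak resident set size of the current process, in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return peak * bytes_per_unit / 1024**3


if __name__ == "__main__":
    before = peak_rss_gib()
    buffer = bytearray(50 * 1024**2)  # stand-in workload: allocate about 50 MiB
    after = peak_rss_gib()
    print(f"peak RSS before: {before:.3f} GiB, after: {after:.3f} GiB")
```

Calling the helper after each order or frame gives a coarse picture of which step drives the peak.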

The code is running well on my side. If you encounter any problems, feel free to let me know. :)