cbeck22 opened this issue 2 months ago
Yup, everything else being equal, ab-initio reconstruction uses more memory than reconstruction with fixed poses. Can you also try `export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128` to see if that helps?
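For reference, a minimal sketch of setting this in the same shell session before relaunching (paths and flags copied from your earlier command):

```bash
# Set the allocator options in the shell that launches the job,
# so the cryodrgn process inherits them.
export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128

# Then re-run the same ab-initio command as before.
cryodrgn abinit_het particles_128_recentered/particles.128.txt \
    --ctf ctf.pkl --zdim 8 --ind ind200k.pkl --multigpu \
    -o abinitio/ > abinitio/abinito.log
```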
However, with a relatively small amount of memory (~10GB per GPU) you may be stuck running smaller models. The default for abinit_het is 256x3, so can you try something like 128x2? I would also be curious to see the output of the `nvidia-smi` command on one or both of your workstations, as well as whether you have any way of profiling the memory usage of the processes running on them.
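A sketch of what a 128x2 encoder/decoder might look like on the command line (these are the layer/dimension flags I'd expect; double-check against `cryodrgn abinit_het --help`):

```bash
# Request a 2-layer, 128-dim encoder and decoder instead of the defaults
cryodrgn abinit_het particles_128_recentered/particles.128.txt \
    --ctf ctf.pkl --zdim 8 --ind ind200k.pkl \
    --enc-layers 2 --enc-dim 128 --dec-layers 2 --dec-dim 128 \
    --multigpu -o abinitio/ > abinitio/abinito.log
```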
Best, Michal
Hi Michal,
Thank you for your suggestions!
I tried using your `export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128` environment variable, but it doesn't seem to have helped - the job encounters the same out-of-memory error at the same pretrain iteration of 10000 (see log file below). The same happens when I use the 128x2 models for both the decoder and encoder: `cryodrgn abinit_het particles_128_recentered/particles.128.txt --ctf ctf.pkl --zdim 8 --ind ind200k.pkl --enc-layers 2 --enc-dim 128 --dec-layers 2 --dec-dim 128 --multigpu -o abinitio/ > abinitio/abinito.log`
Actually, now that I'm looking at the log file with fresh eyes, I noticed that the job runs into the out-of-memory error in what appears to be the final pretrain iteration. Is this the most memory-intensive part of the job? In the meantime, I'm looking into using my institute's supercomputer cluster to run these ab initio jobs - their GPUs are much better than ours.
To answer your last question, I've been monitoring GPU memory usage by printing the results of `nvidia-smi` to a log file and comparing its timestamps with the timestamps in the ab initio log file. The output of `nvidia-smi` shows the memory usage of each of the two GPUs on this workstation (GeForce RTX 2080 Ti, 11264 MiB memory each). It's not a very sophisticated method, so if you know of a better one, I'm all ears!
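For reference, a finer-grained per-process query might look something like the sketch below (field names should be verified with `nvidia-smi --help-query-compute-apps`):

```bash
# Log per-process GPU memory once per second, rather than per-GPU totals
nvidia-smi --query-compute-apps=timestamp,pid,process_name,used_gpu_memory \
    --format=csv -l 1 > gpu_process_memory_log.csv
```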
Output from the log file for this ab initio command: cryodrgn abinit_het particles_128_recentered/particles.128.txt --ctf ctf.pkl --zdim 8 --ind ind200k.pkl --multigpu -o abinitio/ > abinitio/abinito.log
(INFO) (abinit_het.py) (19-Sep-24 10:40:45) Using random poses for 10000 iterations
(INFO) (abinit_het.py) (19-Sep-24 10:41:15) [Pretrain Iteration 2000] loss=0.591484
(INFO) (abinit_het.py) (19-Sep-24 10:41:43) [Pretrain Iteration 4000] loss=0.578094
(INFO) (abinit_het.py) (19-Sep-24 10:42:10) [Pretrain Iteration 6000] loss=0.592761
(INFO) (abinit_het.py) (19-Sep-24 10:42:37) [Pretrain Iteration 8000] loss=0.590331
(INFO) (abinit_het.py) (19-Sep-24 10:43:03) [Pretrain Iteration 10000] loss=0.587424
tail: abinito.log: file truncated
At this point, I get the following error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.35 GiB. GPU
Output from nvidia-smi --query-gpu=timestamp,memory.used --format=csv,nounits -l 1 > gpu_memory_log.csv
2024/09/19 10:42:50.045, 3251
2024/09/19 10:42:50.045, 18
2024/09/19 10:42:51.045, 3251
2024/09/19 10:42:51.045, 18
2024/09/19 10:42:52.045, 3251
2024/09/19 10:42:52.045, 18
2024/09/19 10:42:53.045, 3219
2024/09/19 10:42:53.046, 18
2024/09/19 10:42:54.046, 3251
2024/09/19 10:42:54.046, 18
2024/09/19 10:42:55.046, 3251
2024/09/19 10:42:55.046, 18
2024/09/19 10:42:56.046, 3219
2024/09/19 10:42:56.046, 18
2024/09/19 10:42:57.046, 3219
2024/09/19 10:42:57.046, 18
2024/09/19 10:42:58.047, 3219
2024/09/19 10:42:58.047, 18
2024/09/19 10:42:59.047, 3219
2024/09/19 10:42:59.047, 18
2024/09/19 10:43:00.047, 3219
2024/09/19 10:43:00.047, 18
2024/09/19 10:43:01.047, 3219
2024/09/19 10:43:01.047, 18
2024/09/19 10:43:02.047, 3219
2024/09/19 10:43:02.048, 18
2024/09/19 10:43:03.048, 3219
2024/09/19 10:43:03.048, 18
2024/09/19 10:43:04.048, 3219
2024/09/19 10:43:04.048, 18
2024/09/19 10:43:05.048, 9005
2024/09/19 10:43:05.048, 9306
2024/09/19 10:43:06.048, 9235
2024/09/19 10:43:06.049, 10418
2024/09/19 10:43:07.049, 827
2024/09/19 10:43:07.049, 15
2024/09/19 10:43:08.049, 827
2024/09/19 10:43:08.049, 15
It appears that the final pretrain iteration tries to allocate an additional 2.35 GiB beyond the 11264 MiB that the GPU actually has, and it requests the same 2.35 GiB of additional memory regardless of whether I use your `PYTORCH_CUDA_ALLOC_CONF` environment variable or the smaller model architecture.
I forgot to ask this in my first post, but is it expected that, despite using the `--multigpu` flag, cryoDRGN seems to be using the memory of only one of the two GPUs?
Cheers, cbeck
While I look into the other details mentioned above, can you also try using the minimal batch size with `--multigpu`, that is, appending `-b 1` to your commands? This may also help resolve memory issues on a smaller workstation. I'd also like to see the output of just vanilla `nvidia-smi` on your workstation, that is, without the query and output flags!
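For reference, the full command with the minimal batch size might look something like this (same inputs and flags as your earlier runs):

```bash
# Append -b 1 to use the minimal batch size
cryodrgn abinit_het particles_128_recentered/particles.128.txt \
    --ctf ctf.pkl --zdim 8 --ind ind200k.pkl \
    --enc-layers 2 --enc-dim 128 --dec-layers 2 --dec-dim 128 \
    --multigpu -b 1 -o abinitio/ > abinitio/abinito.log
```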
-Mike
I appended `-b 1` to the command, and it's running now! It finished the final pretrain iteration and is currently training epoch 1. I'll update you if the job finishes successfully. However, I'm getting the following warning incessantly:
/programs/x86_64-linux/cryodrgn/3.3.3/miniconda/lib/python3.9/site-packages/torch/nn/functional.py:4343: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn(
Additionally, could I ask you to explain what the batch size means and its relationship to GPU memory?
And here's the output of plain `nvidia-smi` - sorry for the misunderstanding!
Fri Sep 20 10:25:39 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti Off | 00000000:09:00.0 On | N/A |
| 0% 48C P2 79W / 250W | 1099MiB / 11264MiB | 25% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 2080 Ti Off | 00000000:0A:00.0 Off | N/A |
| 0% 38C P8 2W / 250W | 18MiB / 11264MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2110 G /usr/lib/xorg/Xorg 102MiB |
| 0 N/A N/A 2627 G /usr/lib/xorg/Xorg 105MiB |
| 0 N/A N/A 6983 G /usr/lib/xorg/Xorg 175MiB |
| 0 N/A N/A 7115 G /usr/bin/gnome-shell 128MiB |
| 0 N/A N/A 8023 C ...cryodrgn/3.3.3/miniconda/bin/python 552MiB |
| 1 N/A N/A 2110 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 2627 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 6983 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
On a different note, I saw that you're one of the authors on the new DRGN-AI bioRxiv preprint that was released a couple of months ago - would you recommend DRGN-AI over cryoDRGN's ab initio? Does DRGN-AI have the same GPU memory requirements?
Thank you! cbeck
Hello @cbeck22! If you're running into memory problems, I would make sure you're using a smaller decoder (e.g. 256x3) and decrease the batch size to `-b 4` or `-b 1`.
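A sketch of what that might look like with the command used above (decoder size and batch size flags as suggested; adjust the paths to your own data):

```bash
# 256x3 decoder with a reduced batch size of 4
cryodrgn abinit_het particles_128_recentered/particles.128.txt \
    --ctf ctf.pkl --zdim 8 --ind ind200k.pkl \
    --dec-layers 3 --dec-dim 256 -b 4 \
    --multigpu -o abinitio/ > abinitio/abinito.log
```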
DRGN-AI is our latest version of ab initio reconstruction. It should be much better (and faster), so I would give it a shot. (FYI, we benchmarked both in CryoBench.) Let us know if you run into any problems. We're working on incorporating DRGN-AI into the next major version of cryoDRGN, but for now it's a standalone piece of software.
Hi!
I'm trying to run `cryodrgn abinit_het` on 200K particles downsampled to 128, but the process quickly terminates with an error: `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.71 GiB. GPU`
I get similar errors when I tried the following commands:
cryodrgn abinit_het particles_128_recentered/particles.128.txt --ctf ctf.pkl --zdim 8 --ind ind200k.pkl --lazy -o abinitio/ > abinitio/abinito.log
cryodrgn abinit_het particles_128_recentered/particles.128.txt --ctf ctf.pkl --zdim 8 --ind ind200k.pkl -o abinitio/ > abinitio/abinito.log
cryodrgn abinit_het particles_128_recentered/particles.128.txt --ctf ctf.pkl --zdim 8 --ind ind200k.pkl --multigpu -o abinitio/ > abinitio/abinito.log
I get the same errors when running these commands on a workstation with 4 GPUs (NVIDIA GeForce GTX 1080, 8192 MiB memory each) and one with 2 GPUs (NVIDIA GeForce RTX 2080 Ti, 11264 MiB each).
I've also tried `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to help PyTorch avoid fragmentation, and tried using `torch.cuda.set_per_process_memory_fraction(0.9)` in a Python shell to limit how much of the available memory torch could use. However, neither approach worked. I was able to successfully use train_vae for an even larger number of particles. Does the ab initio job in particular require a significant amount of GPU memory? Is there any way around this?
Cheers, Curtis