ml-struct-bio / cryodrgn

Neural networks for cryo-EM reconstruction
http://cryodrgn.cs.princeton.edu
GNU General Public License v3.0

analyze_landscape_full runs much slower than expected #405

Closed: cbeck22 closed this issue 1 week ago

cbeck22 commented 1 month ago

Hi!

I've run analyze_landscape and discovered some interesting compositional heterogeneity that cryoSPARC couldn't detect! I want to explore this further with analyze_landscape_full. The documentation says that mapping a dataset of ~100,000 particles should take ~4 hours on a single GPU; however, the job is running much slower than expected. It is currently in the "Generating 10000 volume embeddings" step, and each batch of 100 volume embeddings is taking ~20-30 minutes. Extrapolating to the full 10,000 volume embeddings that need to be generated, this step alone will take more than 2 days to finish. Is there anything I should be doing to make the job run faster?
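For a rough sense of that extrapolation (taking 25 minutes as an assumed midpoint of the observed 20-30 minute batch times):

```python
# Back-of-the-envelope extrapolation from the observed per-batch times.
total_volumes = 10_000        # volume embeddings to generate (from the log)
batch_size = 100              # volumes per logged batch
minutes_per_batch = 25        # observed ~20-30 min; 25 is an assumed midpoint

n_batches = total_volumes // batch_size
total_hours = n_batches * minutes_per_batch / 60
print(f"{n_batches} batches -> ~{total_hours:.0f} h (~{total_hours / 24:.1f} days)")
# 100 batches -> ~42 h (~1.7 days); at 30 min per batch this rises to ~50 h, i.e. >2 days
```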

Cheers, cbeck

I've included the outputs of the log, htop and nvidia-smi below:

Command: cryodrgn analyze_landscape_full . 99 > landscape_full.log
This was run in the results directory containing the weights and z-value .pkl files, as well as the analyze.99 directory.

Log file

(INFO) (analyze_landscape_full.py) (23-Sep-24 12:37:28) Generating 10000 volume embeddings
(INFO) (analyze_landscape_full.py) (23-Sep-24 12:37:28) 0
(INFO) (analyze_landscape_full.py) (23-Sep-24 13:10:06) 100
(INFO) (analyze_landscape_full.py) (23-Sep-24 13:37:24) 200
(INFO) (analyze_landscape_full.py) (23-Sep-24 14:01:57) 300
(INFO) (analyze_landscape_full.py) (23-Sep-24 14:26:05) 400
(INFO) (analyze_landscape_full.py) (23-Sep-24 14:54:45) 500
(INFO) (analyze_landscape_full.py) (23-Sep-24 15:24:43) 600
(INFO) (analyze_landscape_full.py) (23-Sep-24 15:55:08) 700
(INFO) (analyze_landscape_full.py) (23-Sep-24 16:27:13) 800
(INFO) (analyze_landscape_full.py) (23-Sep-24 16:58:30) 900
(INFO) (analyze_landscape_full.py) (23-Sep-24 17:29:22) 1000
(INFO) (analyze_landscape_full.py) (23-Sep-24 18:00:04) 1100

Output of htop (screenshot attached)

Output of nvidia-smi

Mon Sep 23 12:53:11 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5000               Off | 00000000:17:00.0 Off |                  Off |
| 30%   24C    P8               6W / 230W |    829MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A5000               Off | 00000000:31:00.0 Off |                  Off |
| 30%   24C    P8               7W / 230W |      4MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A5000               Off | 00000000:B1:00.0 Off |                  Off |
| 30%   24C    P8               8W / 230W |      4MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A5000               Off | 00000000:CA:00.0 Off |                  Off |
| 30%   23C    P8               6W / 230W |      4MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1427111      C   ...cryodrgn/3.3.3/miniconda/bin/python      822MiB |
+---------------------------------------------------------------------------------------+
michal-g commented 1 month ago

Hi, the simplest thing to do would be to reduce the size of the model used in this analysis; --dim and --layers are set to 512 and 3 respectively by default, but you can try a set of values like (256, 2).
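For example, reusing your invocation from above, something along these lines should work (I'm assuming the flags are passed exactly as named here):

cryodrgn analyze_landscape_full . 99 --dim 256 --layers 2 > landscape_full.log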

There are some other things that could be happening on your side, such as differences in the hardware and drivers used, which could make things slower, but at first glance I don't see anything above that would suggest such a drastic slowdown. I will try some fresh benchmarking runs on my side and get back to you!

cbeck22 commented 1 month ago

Thank you, Michal! I'll give this a try and report back. I used a 2048x3 encoder and decoder to train cryoDRGN on these particles - does this have any effect on the recommended number of dimensions and layers to use for analyze_landscape_full?

cbeck22 commented 1 month ago

Quick update - reducing the size of the model to 256x2 hasn't had an effect on the speed of the volume generation. It still takes 30 minutes per batch of 100 volumes.

michal-g commented 1 month ago

Hey, sorry for the oversight: the runtime in the documentation refers to running with the same number of volumes as the number used for sketching in analyze_landscape, that is, with cryodrgn analyze_landscape_full -N 1000 .... We've updated the docs to clear this up!
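For instance, with the same invocation as above (assuming the same working directory), that would be:

cryodrgn analyze_landscape_full . 99 -N 1000 > landscape_full.log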

zhonge commented 1 month ago

Hello @cbeck22, thanks for reporting this, and super cool to hear that you found interesting structures! It turns out we do have a bug where cryodrgn analyze_landscape_full is generating volumes on the CPU instead of the GPU. We'll be fixing this ASAP!

For what it's worth, cryodrgn analyze_landscape generates 500 volumes for analysis, while cryodrgn analyze_landscape_full generates 10k volumes (on the fly, not saved to disk). Volume generation is the slow step, so analyze_landscape_full should take roughly 20x as long as cryodrgn analyze_landscape.

From my vague recollection, analyze_landscape takes about 20 minutes, so you could expect analyze_landscape_full to take around 6 hours.

michal-g commented 1 week ago

In #419 we have ensured that landscape analysis generates volumes on the GPU when available: https://github.com/ml-struct-bio/cryodrgn/blob/196365d3e396688c2aa03d8e1b68cd9de6e517ec/cryodrgn/commands/analyze_landscape_full.py#L219-L220
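The change boils down to the standard PyTorch device-selection pattern; here is a minimal sketch of the idea (simplified, not the exact lines linked above, and the decoder below is just a stand-in rather than the real cryoDRGN model):

```python
import torch
import torch.nn as nn

# Use the GPU when one is visible to the process; otherwise fall back to the CPU.
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Stand-in for the trained volume decoder (illustrative only).
decoder = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 32 ** 3))
decoder.to(device)

# Each latent coordinate fed to the decoder must live on the same device as the model.
z = torch.randn(1, 8, device=device)
with torch.no_grad():
    vol = decoder(z)
print(device, vol.shape)  # e.g. "cuda torch.Size([1, 32768])" when a GPU is found
```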

Testing analyze_landscape_full confirms that the default of 10k volumes now takes 2.5 hours on a Tesla V100 GPU. I've updated the docs accordingly as well!