Open rrydbirk opened 1 year ago
Hello, I tested VeloVAE on CPU (Intel Xeon Gold 6154, 4 nodes, 32 cores per node), spgpu (Nvidia A40), and gpu (Nvidia V100). Using a GPU should give you a 3-5x speedup. For example, on the pancreas dataset shown in the example notebook, CPU training took about 23 minutes, while training on either spgpu or gpu took about 5-6 minutes. The difference is quite clear even without a time profiler.
It seems you might have a CUDA issue. Could you provide more details?
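As a starting point, the output of a short check like the one below would help narrow things down. This is only a minimal sketch: it assumes VeloVAE is running on a PyTorch backend in your environment, and `cuda_report` is a hypothetical helper name, not part of VeloVAE.

```python
import importlib.util
import platform


def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup as a plain dict.

    Assumption: the model runs on PyTorch. If PyTorch is not importable,
    only the Python version is reported.
    """
    report = {"python": platform.python_version(), "torch_installed": False}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not importable in this environment

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # None here means a CPU-only PyTorch build was installed.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `built_with_cuda` prints `None` or `cuda_available` prints `False` on the GPU node, the training is most likely falling back to the CPU.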
@g-yichen I'd be happy to provide more details, I'm just not sure what to provide :-)
You have my full pip freeze above and my notebook snippets. There's no warning about "GPU not found", which occurs on a non-GPU node. Using nvidia-smi, I can see GPU usage bounce up and down, but nothing overwhelming.
I'm running the same data side-by-side on a 32-CPU node and a 12-CPU / 1 A100 GPU node. It seems the GPU node is ~1 s/it slower than the CPU node. Could you advise me on what I'm doing wrong?
For the GPU node:
I'm running: