google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

Optimizing runtime on HPC #75

Closed: sadams2013 closed this issue 9 months ago

sadams2013 commented 9 months ago

Hello - thank you for your work on DeepConsensus. I'm looking for a bit of guidance on how best to proceed with runtime optimization on an HPC cluster with GPUs.

As expected from the documentation, I see a dramatic improvement in CPU runtime when I run the analysis in parallel across 500 chunks. However, I'm struggling to find the best way to utilize GPUs for this analysis. I have a fixed number of GPUs (8) available.

Is the best approach in this case to use 8 bigger chunks with one chunk per GPU, or is there a benefit to using something like MPS (https://docs.nvidia.com/deploy/mps/index.html) to submit multiple smaller chunks to a single GPU?

Thank you!

danielecook commented 9 months ago

@sadams2013 see our runtime metrics. Unfortunately, GPUs don't provide much of an advantage in terms of runtime. I think you are better off sticking with CPUs in this case.

There is also a flag, --skip_windows_above, that can be used to speed things up by skipping windows whose CCS base qualities are already above a given threshold. See the yield metrics documentation for details on the quality trade-off.
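For reference, the chunked CPU workflow discussed in this thread might look roughly like the sketch below. This is only an illustration: the file names, checkpoint path, chunk count, and threshold value are placeholders, and the exact `ccs`/`actc` preprocessing flags should be taken from the current DeepConsensus quickstart rather than from here.

```shell
#!/usr/bin/env bash
# Hypothetical chunked DeepConsensus run (one chunk shown per loop
# iteration). On an HPC cluster each iteration would typically be
# submitted as a separate job rather than run in a sequential loop.
set -euo pipefail

n=500  # total number of chunks, as in the 500-chunk CPU run above

for i in $(seq 1 "$n"); do
  # Generate CCS reads for chunk i of n (flags are placeholders;
  # consult the DeepConsensus docs for the recommended ccs options).
  ccs subreads.bam "ccs.${i}.bam" --chunk="${i}/${n}"

  # Align the subreads back to this chunk's CCS reads with actc.
  actc subreads.bam "ccs.${i}.bam" "subreads_to_ccs.${i}.bam"

  # Run DeepConsensus on the chunk; --skip_windows_above skips
  # windows whose CCS quality already exceeds the threshold.
  deepconsensus run \
    --subreads_to_ccs="subreads_to_ccs.${i}.bam" \
    --ccs_bam="ccs.${i}.bam" \
    --checkpoint=model/checkpoint \
    --output="output.${i}.fastq" \
    --skip_windows_above=0.99
done
```

The per-chunk FASTQ outputs would then be concatenated into the final corrected read set.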

sadams2013 commented 9 months ago

Thank you for getting back to me - that is very helpful.