google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

Optimizing runtime on HPC #75

Closed: sadams2013 closed this issue 9 months ago

sadams2013 commented 9 months ago

Hello - thank you for your work on DeepConsensus. I'm looking for a bit of guidance on how best to proceed with runtime optimization on an HPC cluster with GPUs.

As expected from the documentation, I see a dramatic improvement in CPU runtime when I run the analysis in parallel across 500 chunks. However, I'm struggling to find the best way to utilize GPUs for this analysis. I have a fixed number of GPUs (8) available.

Is the best approach in this case to use 8 bigger chunks with one chunk per GPU, or is there a benefit to using something like MPS (https://docs.nvidia.com/deploy/mps/index.html) to submit multiple smaller chunks to a single GPU?

Thank you!

danielecook commented 9 months ago

@sadams2013 see our runtime metrics. Unfortunately, GPUs don't provide much of an advantage in terms of runtime. I think you are better off sticking with CPUs in this case.

There is also a flag, --skip_windows_above, that can be used to speed things up by skipping windows whose CCS base qualities are already above a given threshold. See the yield metrics documentation for details on the quality trade-off.
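For reference, the chunked CPU workflow discussed in this thread might look roughly like the sketch below. This is only an illustration: the file names, checkpoint path, chunk count, and threshold value are placeholders, and the exact `ccs`/`actc` preprocessing flags should be taken from the current DeepConsensus quickstart rather than from here.

```shell
#!/usr/bin/env bash
# Hypothetical chunked DeepConsensus run (one chunk shown per loop
# iteration). On an HPC cluster each iteration would typically be
# submitted as a separate job rather than run in a sequential loop.
set -euo pipefail

n=500  # total number of chunks, as in the 500-chunk CPU run above

for i in $(seq 1 "$n"); do
  # Generate CCS reads for chunk i of n (flags are placeholders;
  # consult the DeepConsensus docs for the recommended ccs options).
  ccs subreads.bam "ccs.${i}.bam" --chunk="${i}/${n}"

  # Align the subreads back to this chunk's CCS reads with actc.
  actc subreads.bam "ccs.${i}.bam" "subreads_to_ccs.${i}.bam"

  # Run DeepConsensus on the chunk; --skip_windows_above skips
  # windows whose CCS quality already exceeds the threshold.
  deepconsensus run \
    --subreads_to_ccs="subreads_to_ccs.${i}.bam" \
    --ccs_bam="ccs.${i}.bam" \
    --checkpoint=model/checkpoint \
    --output="output.${i}.fastq" \
    --skip_windows_above=0.99
done
```

The per-chunk FASTQ outputs would then be concatenated into the final corrected read set.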

sadams2013 commented 9 months ago

Thank you for getting back to me - that is very helpful.