Closed: chenxf611 closed this issue 2 years ago.
Hi @chenxf611
In our benchmarks on a 16-core CPU instance with a P100 GPU, we observe a rate of 2.93 ZMW/second on 11 kb HiFi reads. This isn't far from your 1.622 ZMW/second, and the difference can plausibly be explained by the different GPU and/or different read lengths.
So it does not look like you are doing anything out of the ordinary that is making your run slower. When we run DeepConsensus ourselves, we scale horizontally across multiple machines.
I understand that the amount of compute time required is the largest pain point for users running DeepConsensus. We are working on further improvements that we hope will substantially reduce compute in future versions. From DeepConsensus v0.1 to v0.2 there was a ~10x speed improvement, and we are targeting improvements of a similar magnitude for future releases.
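To make the horizontal-scaling point concrete, here is a rough back-of-the-envelope model (not an official benchmark): wall time shrinks roughly linearly with machine count, ignoring per-shard startup overhead. The 8M-ZMW cell size and the 2.93 ZMW/second rate are the figures mentioned in this thread.

```python
# Rough model of horizontal scaling: total wall time shrinks roughly
# linearly with the number of machines (overheads ignored).
def wall_time_days(total_zmws, zmws_per_second, machines):
    """Estimate wall-clock days to process total_zmws split evenly across machines."""
    seconds = total_zmws / (zmws_per_second * machines)
    return seconds / 86400

# Figures from this thread: ~8M ZMWs per SMRT Cell, ~2.93 ZMW/s per machine.
print(f"1 machine:   ~{wall_time_days(8_000_000, 2.93, 1):.1f} days")
print(f"16 machines: ~{wall_time_days(8_000_000, 2.93, 16):.1f} days")
```

This is why sharding one SMRT Cell across many machines is the practical way to get results in days rather than months.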
Hi @chenxf611 ,
Please see this documentation, available with the v0.3 release, which answers your questions about batch size vs. performance: https://github.com/google/deepconsensus/blob/r0.3/docs/yield_metrics.md
With v0.3 you can also adjust the --skip_windows_above parameter for faster processing, and pre-processing with --min-rq 0.88 halves the number of ZMWs that need to be processed. Hope this helps.
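The impact of that pre-filtering step is easy to estimate with simple arithmetic (a sketch, not a benchmark; the 1.622 ZMW/second rate and 8M-ZMW cell size are the figures from this thread, and the exact fraction filtered out will vary by dataset):

```python
# If --min-rq 0.88 filters out roughly half of the ZMWs up front,
# the inference step has about half as much work to do.
rate_zmw_per_s = 1.622   # throughput reported in this thread
cell_zmws = 8_000_000    # approximate SMRT Cell size from the question

def days(zmws, rate):
    """Wall-clock days to process `zmws` at `rate` ZMW/second on one machine."""
    return zmws / rate / 86400

print(f"all ZMWs:            ~{days(cell_zmws, rate_zmw_per_s):.0f} days")
print(f"after --min-rq 0.88: ~{days(cell_zmws * 0.5, rate_zmw_per_s):.0f} days")
```

Combined with sharding across machines, this filtering compounds with the other speed-ups rather than replacing them.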
Hi @chenxf611 , I'll close this issue now. But feel free to reopen (or open a new one) if you have more questions.
Hi,
This is a great tool and we really want to put it into production. However, after a small-scale trial run, the main issue is how time-consuming the deepconsensus step itself (the last step) is: it took 616 seconds to process only 1000 ZMWs. I used one V100 GPU with 32 GB of memory and 16 CPUs on our HPC system. I don't know how much time is needed to finish one regular SMRT Cell and generate HiFi reads, but at this rate 8M ZMWs would take about 2 months.
I would like to get your input on possible ways to speed up the last step: more GPUs? More CPUs? Adjusting the batch size?
Thanks
Jack
I0216 11:21:42.449043 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 58.32757616043091 seconds
I0216 11:21:42.460514 140621518309184 quick_inference.py:570] Processed 100 ZMWs in 62.885846 seconds
I0216 11:22:43.300720 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 56.282196283340454 seconds
I0216 11:22:43.311169 140621518309184 quick_inference.py:570] Processed 200 ZMWs in 123.736485 seconds
I0216 11:23:44.079205 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 56.101356506347656 seconds
I0216 11:23:44.090897 140621518309184 quick_inference.py:570] Processed 300 ZMWs in 184.516218 seconds
I0216 11:24:41.855663 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 53.60238575935364 seconds
I0216 11:24:41.864543 140621518309184 quick_inference.py:570] Processed 400 ZMWs in 242.289871 seconds
I0216 11:25:45.063106 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 58.29405975341797 seconds
I0216 11:25:45.074118 140621518309184 quick_inference.py:570] Processed 500 ZMWs in 305.499436 seconds
I0216 11:26:47.627478 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 57.61990475654602 seconds
I0216 11:26:47.638337 140621518309184 quick_inference.py:570] Processed 600 ZMWs in 368.063655 seconds
I0216 11:27:50.719364 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 58.07153558731079 seconds
I0216 11:27:50.729454 140621518309184 quick_inference.py:570] Processed 700 ZMWs in 431.154773 seconds
I0216 11:28:52.719674 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 57.31240630149841 seconds
I0216 11:28:52.730889 140621518309184 quick_inference.py:570] Processed 800 ZMWs in 493.156202 seconds
I0216 11:29:55.011658 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 57.378705739974976 seconds
I0216 11:29:55.021590 140621518309184 quick_inference.py:570] Processed 900 ZMWs in 555.446918 seconds
I0216 11:30:55.729195 140621518309184 quick_inference.py:492] Processed a batch of 100 ZMWs in 56.30019426345825 seconds
I0216 11:30:55.740186 140621518309184 quick_inference.py:570] Processed 1000 ZMWs in 616.165509 seconds
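The throughput implied by these logs can be extracted programmatically. A small sketch that parses the cumulative "Processed N ZMWs in T seconds" messages (only two representative log lines are inlined here for brevity):

```python
import re

# Two representative cumulative-progress lines from the log above.
log = """\
I0216 11:21:42.460514 140621518309184 quick_inference.py:570] Processed 100 ZMWs in 62.885846 seconds
I0216 11:30:55.740186 140621518309184 quick_inference.py:570] Processed 1000 ZMWs in 616.165509 seconds
"""

# Match the cumulative "Processed N ZMWs in T seconds" progress messages.
pattern = re.compile(r"Processed (\d+) ZMWs in ([\d.]+) seconds")

# Take the entry with the largest ZMW count, i.e. the final cumulative total.
zmws, seconds = max(
    ((int(n), float(t)) for n, t in pattern.findall(log)),
    key=lambda pair: pair[0],
)
print(f"{zmws / seconds:.3f} ZMW/s")  # ~1.623 ZMW/s, the rate discussed above
```

This matches the 1.622 ZMW/second figure quoted earlier in the thread (the small difference is rounding).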