simondrue opened this issue 2 months ago (Open)
Hi @simondrue - your benchmark showing that 5mCG_5hmCG@v1 + 6mA@v2 is faster than 6mA@v2 only is a bit surprising - could you repeat this a few times and verify that your benchmarks are not noisy?
I created a small dataset of four pod5 files (about 2 GB each) and ran it on 4x A100 with the v4.3.0 sup model:

- 5mCG_5hmCG only: 3m49.244s
- 5mC_5hmC only: 6m21.693s
- 6mA only: 9m33.237s
- 5mCG_5hmCG + 6mA: 9m49.589s
- 5mC_5hmC + 6mA: 11m13.113s
My times seem quite normal. Is this within expectation?
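To make the comparison above easier to read, here is a small illustrative Python sketch that parses the `time`-style strings reported in this thread and expresses each run as a multiple of the fastest configuration (the `to_seconds` helper and the baseline choice are mine, not part of Dorado):

```python
# Sketch: convert the reported `time` outputs into seconds and compare.
# The timings are the ones posted above; the parsing/baseline is illustrative.
import re

def to_seconds(t: str) -> float:
    """Parse a `time`-style string like '3m49.244s' into seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    minutes, seconds = m.groups()
    return int(minutes) * 60 + float(seconds)

timings = {
    "5mCG_5hmCG only": "3m49.244s",
    "5mC_5hmC only": "6m21.693s",
    "6mA only": "9m33.237s",
    "5mCG_5hmCG + 6mA": "9m49.589s",
    "5mC_5hmC + 6mA": "11m13.113s",
}

baseline = to_seconds(timings["5mCG_5hmCG only"])
for name, t in timings.items():
    print(f"{name}: {to_seconds(t) / baseline:.2f}x baseline")
```

By this measure the 6mA-only run is roughly 2.5x the 5mCG_5hmCG-only run on this system.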
Hi,
I expanded my benchmark and used Dorado v0.7.0 with the new v5 models for both HAC and SUP, all available modifications (one at a time, no combinations), and 5 replicates with `--max-reads 150000`. The system is the same as stated above and the data is from a cfDNA sample.
I still see a significant slowdown for the 6mA model, even compared to the other all-context models. Just to verify that the sample is not enriched for A, its base composition is:
The data behind the plots: speed_data.csv
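For reference, a base-composition check like the one described above can be done with a few lines of Python; the `base_composition` helper and the example reads here are hypothetical, not taken from the actual sample:

```python
# Sketch: compute per-base fractions over a set of read sequences
# to check for A enrichment. Reads below are hypothetical examples.
from collections import Counter

def base_composition(seqs):
    """Return the fraction of A/C/G/T across all sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s.upper())
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}

reads = ["ACGTACGT", "AATTCCGG"]  # hypothetical example reads
print(base_composition(reads))
```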
Sorry for the late reply
/Simon
Hi Nanopore team,
I noticed a significant (~10x) slowdown of the Dorado basecaller when adding the 6mA modification model. This is notable since adding the 5mCG_5hmCG modification model has almost no impact on basecalling speed.
Is this expected behavior? If so, are there any plans to optimize the speed of the 6mA model?
Results from a small benchmark of basecalling speeds:
Thanks for a great tool. Looking forward to seeing where the project is going 🚀
Run environment:
Logs
5mCG_5hmCG@v1 + 6mA@v2
5mCG_5hmCG@v1 only
6mA@v2 only
6mA@v1 only
No mods