Mon3trK closed this issue 6 months ago.
Please attach the following files from the assembly directory:
AssemblySummary.html
stdout.log
Binned-ReadLengthHistogram.csv
LowHashBucketHistogram.csv
DisjointSetsHistogram.csv
Assembly-BothStrands-NoSequence.gfa
You can omit the last one if it is too big to upload practically. With these files I should be able to get some idea of what may be going on.
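If you are collecting these files by hand, a small script can bundle them into one archive and drop the GFA when it is too large. This is a minimal sketch, not part of Shasta. The function name `bundle_shasta_files` and the 100 MiB cutoff for the GFA are hypothetical choices; the file names are the ones requested above.

```python
import tarfile
from pathlib import Path

# Files requested in this thread. The GFA is kept separate because it may be omitted.
REQUESTED_FILES = [
    "AssemblySummary.html",
    "stdout.log",
    "Binned-ReadLengthHistogram.csv",
    "LowHashBucketHistogram.csv",
    "DisjointSetsHistogram.csv",
]
OPTIONAL_GFA = "Assembly-BothStrands-NoSequence.gfa"


def bundle_shasta_files(assembly_dir, archive_path, max_gfa_bytes=100 * 1024 * 1024):
    """Pack the requested files into a .tar.gz archive.

    The GFA is added only if it exists and is no larger than max_gfa_bytes
    (a hypothetical cutoff). Returns the names of the files actually added.
    """
    assembly_dir = Path(assembly_dir)
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in REQUESTED_FILES:
            path = assembly_dir / name
            if path.is_file():
                tar.add(path, arcname=name)
                added.append(name)
            else:
                print(f"warning: {name} not found in {assembly_dir}")
        gfa = assembly_dir / OPTIONAL_GFA
        if gfa.is_file() and gfa.stat().st_size <= max_gfa_bytes:
            tar.add(gfa, arcname=OPTIONAL_GFA)
            added.append(OPTIONAL_GFA)
    return added
```

For example, `bundle_shasta_files("ShastaRun", "shasta-files.tar.gz")` assumes the assembly directory is named `ShastaRun`. Adjust the path to wherever your run wrote its output.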
@paoloshasta Thanks for the quick response. I would prefer to keep this private. Is there an email address to which I can send the requested files?
Sure, you can use the e-mail address in my GitHub profile.
Hi @paoloshasta, I have sent the files by email. If you have any further questions, please let me know.
I received your files and I am looking at them. However, when I try to reply to your e-mail, I get an unusual error message back from your Outlook server. Please contact me again using a different e-mail address so I can reply. You don't need to resend the files, since I already have them.
@paoloshasta Thank you for your help. I have sent you an email from a different address.
Thanks to @paoloshasta, I solved the problem. For ONT R10 data, if you use --config Nanopore-R10-Fast-Nov2022, you should basecall with Guppy 5/6 in sup mode to get the best results. Basecalling in hac mode, or with Dorado, is suboptimal for this configuration.
Hi @Mon3trK, for what it's worth, that is not generally true for our datasets, and we have assembled more than 20 plant genomes.
We use Dorado with sup basecalling as standard, and --config Nanopore-R10-Fast-Nov2022 generally yields the best results. If we see poor results, which is rare these days, we go back to the Nanopore-May2022 config.
I believe Guppy has been deprecated in favour of Dorado, so it is not a good choice going forward.
Hi @colindaven, thank you for the heads-up.
Dear @paoloshasta, I am currently assembling a human genome sequenced on an R10.4.1 flow cell and basecalled with Dorado (HAC mode). The read N50 is a reasonable ~32.0 kb, and the sequencing depth is also decent (~40-50X). I used the command
to run the de novo assembly. However, I only got an NG50 of ~3 Mb and a total length of 2.7 Gb. I think this performance is unusual for Shasta on this data. Do you have any ideas on how to improve the assembly? Many thanks
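For reference, here is how the metrics in this question are usually computed. N50 is the length L such that sequences of length at least L together make up at least half of the total sequence length. NG50 uses the same rule, but measures half of an estimated genome size instead of half of the assembly's own length. As a result, an assembly that is shorter than the genome gets a lower NG50 than N50. Mean depth is the total number of read bases divided by the genome size. This is a minimal sketch. The function names are hypothetical, and any genome size you pass in is your own estimate.

```python
def nx50(lengths, reference_total=None):
    """Return L such that sequences of length >= L cover at least half of reference_total.

    With reference_total=None, half of the summed lengths is used (N50).
    Passing an estimated genome size instead gives NG50.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths) if reference_total is None else reference_total
    half = total / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    # The sequences cover less than half of reference_total, so NG50 is undefined.
    return 0


def mean_depth(read_lengths, genome_size):
    """Average coverage: total read bases divided by the (estimated) genome size."""
    return sum(read_lengths) / genome_size
```

For contigs of lengths `[2, 3, 4, 5, 6]` (total 20), `nx50` returns 5 as the N50. With a hypothetical genome size of 30, it returns 4 as the NG50, because the same contigs now have to cover 15 bases rather than 10.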