Open asylvz opened 2 months ago
Hi, if you can share a few example input datasets, I think I may be able to give you some suggestions in terms of parameters.
Actually, this is not for a specific scenario; I'll use it in my algorithm and am currently testing it with ONT data from some samples (reads can be retrieved from the CRAMs here: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/hg38/).
Basically it should be fast enough for 20-30K long ONT reads. I'm currently using wtdbg2 for this.
I'm also sending a sample cluster of reads. This is one of the large clusters (25 reads), so not all of them are that large. H2-s218243_1350.fasta.zip
I am not sure which scenario you are referring to. Since you mentioned wtdbg2: if you need a consensus sequence after the assembly step, I think wtdbg2 has its own POA consensus-calling module. abPOA generally takes reads with unified boundaries, performs end-to-end global alignment, and then generates a consensus sequence based on the alignment result.
I actually want to generate a consensus, but since POA algorithms are slower, I had to use wtdbg2. Your algorithm seems to be much faster, so I wanted to test it. For ONT reads of 20-30K, which w, k, min-w, etc. would you suggest?
For your data H2-s218243_1350.fasta.zip, I see that the read lengths vary a lot and the reads are not all from the same strand.
Since I don't know how you obtained this cluster of reads (based on mapping position?), I can only suggest that you run
abpoa -Ss in.fasta > cons.fa
and see whether the consensus sequence meets your expectations.
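For many clusters, the same command can be wrapped in a small driver script. The following is only a minimal sketch: it assumes the abpoa binary is on PATH, that each cluster is stored as its own .fasta file in one directory, and that writing the consensus next to the input as <name>.cons.fa is acceptable. The function names and the output naming scheme are hypothetical; only the -Ss flags come from the suggestion above.

```python
import subprocess
import sys
from pathlib import Path


def build_abpoa_cmd(fasta_path):
    """Build the abpoa command for one cluster FASTA (flags as suggested above)."""
    return ["abpoa", "-Ss", str(fasta_path)]


def consensus_path(fasta_path):
    """Hypothetical naming scheme: cluster.fasta -> cluster.cons.fa, same directory."""
    return Path(fasta_path).with_suffix(".cons.fa")


def run_clusters(cluster_dir):
    """Run abpoa once per *.fasta file in cluster_dir; return the inputs that failed."""
    failed = []
    for fasta in sorted(Path(cluster_dir).glob("*.fasta")):
        # abpoa writes the consensus to stdout, so redirect it to the output file.
        with open(consensus_path(fasta), "w") as out:
            result = subprocess.run(build_abpoa_cmd(fasta), stdout=out)
        if result.returncode != 0:
            failed.append(fasta)
    return failed


if __name__ == "__main__":
    # Assumed usage: python run_abpoa.py <directory with cluster FASTA files>
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in run_clusters(directory):
        print(f"abpoa failed on {path}", file=sys.stderr)
```

Running one process per cluster keeps the clusters independent, so a failure on one cluster does not affect the others, and the loop could later be parallelized if speed becomes the bottleneck.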
Hi,
I'm trying to use the library to generate consensus sequences of ONT reads for multiple clusters of reads. Each cluster has around 10-30 reads. However, I'm not sure which parameters to use for minimizer-based seeding and partitioning in order to balance accuracy and speed.
I'd be grateful if you could suggest a set of parameters that balances speed, memory, and accuracy.
Thank you, Arda