Open MycoMap opened 2 years ago
Yeah, there is some basic processing in the codebase already, so demultiplexing the reads is easy and already done. The most difficult part will be dealing with the error rates and de novo clustering: with ONT simplex reads at, say, 96% accuracy, you have a hard time splitting the raw data into appropriate bins to create OTUs. I tried several things a while ago and wasn't too happy with the results, although I had old ONT data, so that is also part of the issue. Newer data, i.e. from the LSK112/R10.4 setup, has much better single-read accuracy. I also tried some clustering with isONClust, which works sort of okay (it's built for clustering transcript data), but it's still quite difficult to sort through the noisy reads and get reliable clusters/OTUs. Perhaps duplex reads would have high enough accuracy that you could just cluster with something like uclust/vsearch.
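To put a rough number on why 96% reads are hard to bin, here is a back-of-the-envelope sketch. It assumes errors are independent between reads and that two reads almost never share the same error at the same position, so real data will deviate from it. The 97% cutoff is only an assumed, commonly used OTU threshold, not something set in this thread, and `expected_pairwise_identity` is a hypothetical helper name.

```python
OTU_THRESHOLD = 0.97  # assumed clustering cutoff, for illustration only


def expected_pairwise_identity(read_accuracy: float) -> float:
    """Rough expected identity between two reads of the same template.

    Under the independence assumption, a position matches only when both
    reads are correct there, which happens with probability accuracy**2.
    """
    return read_accuracy ** 2


# 0.96 is the simplex figure mentioned above; the higher values are
# illustrative stand-ins for more accurate reads.
for accuracy in (0.96, 0.99, 0.999):
    identity = expected_pairwise_identity(accuracy)
    verdict = "above" if identity >= OTU_THRESHOLD else "below"
    print(f"read accuracy {accuracy:.3f}: read-to-read identity ~{identity:.3f} "
          f"({verdict} the {OTU_THRESHOLD:.2f} cutoff)")
```

Under these assumptions, two 96% reads of the very same molecule are only about 92% identical to each other, so a 97% cutoff would split them, while reads at 99% or better stay above it.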
What sequencing kits did you run these with? If you know what should be in these samples (i.e. high-quality Sanger data for every specimen), then yes, that would help immensely in trying to figure out a de novo approach.
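As one possible way to use known specimen identities when comparing de novo approaches, here is a minimal sketch that scores a clustering by purity. It assumes each read already has a cluster ID from whatever clustering tool is being tested and a true specimen label (for example from matching reads to the Sanger references). The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(assignments: dict, truth: dict) -> float:
    """Fraction of reads that carry the majority label of their cluster.

    assignments: read ID -> cluster ID (from the method under test)
    truth:       read ID -> known specimen label
    """
    labels_by_cluster = defaultdict(list)
    for read_id, cluster_id in assignments.items():
        labels_by_cluster[cluster_id].append(truth[read_id])

    total = sum(len(labels) for labels in labels_by_cluster.values())
    if total == 0:
        return 0.0

    # Count, per cluster, how many reads agree with the most common label.
    majority = sum(
        Counter(labels).most_common(1)[0][1]
        for labels in labels_by_cluster.values()
    )
    return majority / total


# Toy example: cluster c1 mixes two specimens, cluster c2 is clean.
assignments = {"r1": "c1", "r2": "c1", "r3": "c1", "r4": "c2"}
truth = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
print(cluster_purity(assignments, truth))  # 0.75
```

Purity alone rewards over-splitting (one read per cluster scores 1.0), so it is worth reading it alongside the number of clusters per specimen.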
Peter Kennedy showed those early ONT results that you had worked on at MSA22, @nextgenusfs. They were definitely of limited value for the older amplicons, but we also discussed this with others doing PacBio on amplicons who are getting really good results. I think newer data is definitely worth a look.
Also, Ryan Wick's Twitter post on doing short-read assembly with ONT demonstrates how accuracy is improved in R10.4: https://twitter.com/rrwick/status/1548926644085108738
It might be worth looking at a different clustering approach, since the error model in usearch/vsearch might not be able to model ONT errors very well.
I have some sets of dual-indexed fungal ITS amplicon pools from specimens. One is 288 specimens and the other is 480 specimens, if they would be helpful at all in developing methods.
Could also discuss my current workflow that seems to work reasonably well. steve@hoosiermushrooms.org