cpavloud opened 3 years ago
OK, I had a look at those two and I think we should go for metaMATE.
From its repo:
As mentioned here: "The ideal dataset for metaMATE is a set of ASVs arising from a multi-sample metabarcoding dataset accompanied by a solid set of reference sequences that are expected to be present in the dataset. Optionally, metaMATE can also utilise data assigning each ASV to a taxonomic group."
In addition: "metaMATE currently cannot process more than 65,536 input ASVs if performing clade binning due to the exponential complexity of the UPGMA algorithm."
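Given that limit, it may be worth checking the ASV count of a run before enabling clade binning. A minimal sketch, assuming the ASVs sit in a plain FASTA file (one `>` header per sequence); the function names and the file name `asvs.fasta` are hypothetical and not part of metaMATE itself. It also shows how many distinct ASV pairs a full pairwise distance matrix would hold, as a rough sense of scale.

```python
# Limit quoted from the metaMATE repository for clade binning.
MAX_ASVS_FOR_CLADE_BINNING = 65_536


def count_fasta_records(lines):
    """Count sequences in FASTA text by counting header lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def clade_binning_feasible(n_asvs, limit=MAX_ASVS_FOR_CLADE_BINNING):
    """True if the ASV count does not exceed the quoted limit."""
    return n_asvs <= limit


def pairwise_comparisons(n):
    """Number of distinct ASV pairs in a full pairwise distance matrix."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Hypothetical input file; replace with the project's ASV FASTA.
    with open("asvs.fasta") as handle:
        n = count_fasta_records(handle)
    print(n, "ASVs; clade binning feasible:", clade_binning_feasible(n))
```

At the limit itself, `pairwise_comparisons(65_536)` is 2,147,450,880 pairs, so large ARMS runs may need to be split or filtered before this step.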
@cpavloud, do you think we need to integrate this into the ARMS project?
We could, but I don't know how much time it would take.
@hariszaf Explain why it's useful.
This could be very informative, especially for COI samples (and, more generally, for protein-coding markers). One tool that could be added is metaMATE (https://github.com/tjcreedy/metamate). In addition, this publication (https://doi.org/10.1186/s12859-021-04180-x) discusses how to remove putative pseudogenes, and its method (based on the NCBI ORFfinder program) is implemented in MetaWorks (https://github.com/terrimporter/MetaWorks).
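To illustrate the idea behind pseudogene screening for COI, here is a deliberately simplified stand-in: flag an ASV if every reading frame contains an in-frame stop codon. This is not the ORFfinder-based method from the publication or MetaWorks; it assumes forward-strand ASVs and the invertebrate mitochondrial genetic code (NCBI table 5, stops TAA and TAG). Other taxa need other tables; the vertebrate mitochondrial code (table 2), for example, also uses AGA and AGG as stops. All function names here are hypothetical.

```python
# Stop codons of NCBI genetic code table 5 (invertebrate mitochondrial).
# Note: TGA is NOT a stop here (it codes for Trp).
STOP_CODONS_TABLE5 = {"TAA", "TAG"}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement; ambiguous bases become 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def has_open_frame(seq, stops=STOP_CODONS_TABLE5, both_strands=False):
    """True if at least one reading frame has no in-frame stop codon."""
    seq = seq.upper()
    strands = [seq, reverse_complement(seq)] if both_strands else [seq]
    for strand in strands:
        for frame in range(3):
            # Complete codons only; a trailing partial codon is ignored.
            codons = (strand[i:i + 3] for i in range(frame, len(strand) - 2, 3))
            if not any(codon in stops for codon in codons):
                return True
    return False


def flag_putative_pseudogenes(asvs, **kwargs):
    """Return IDs of ASVs with a stop codon in every checked frame."""
    return [asv_id for asv_id, seq in asvs.items()
            if not has_open_frame(seq, **kwargs)]


if __name__ == "__main__":
    toy_asvs = {"asv1": "ATGAAACCC", "asv2": "TAACTAACTAA"}
    print(flag_putative_pseudogenes(toy_asvs))
```

A real screen would also consider frame consistency across ASVs and expected amplicon length, which is closer to what the ORFfinder-based approach does.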