maruiqi0710 closed this issue 1 year ago.
Vecscreen removes primers and obvious contamination (e.g., phiX) and trims or splits contigs using the UniVec database.
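To make the trim-versus-split distinction concrete, here is a minimal sketch. It is not vecscreen's or AAFTF's actual code. It assumes the vector/adaptor hit coordinates (0-based, end-exclusive) have already been obtained, e.g. from a search against UniVec. The function name `trim_or_split` and the 200 bp `min_len` cutoff are hypothetical. A hit at a contig end trims the contig, while an internal hit splits it into two pieces.

```python
def trim_or_split(seq, hits, min_len=200):
    """Cut contaminant intervals out of a contig (hypothetical helper).

    seq     -- contig sequence
    hits    -- (start, end) contaminant hits, 0-based, end-exclusive
    min_len -- hypothetical cutoff: shorter leftover pieces are discarded
    A hit at either end trims the contig; an internal hit splits it.
    """
    # Merge overlapping or touching hits so each region is cut once.
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Keep the sequence between (and around) the merged hits.
    pieces = []
    prev = 0
    for start, end in merged:
        if start > prev:
            pieces.append(seq[prev:start])
        prev = end
    if prev < len(seq):
        pieces.append(seq[prev:])
    return [p for p in pieces if len(p) >= min_len]


if __name__ == "__main__":
    contig = "A" * 300 + "N" * 50 + "C" * 300  # 650 bp toy contig
    print([len(p) for p in trim_or_split(contig, [(300, 350)])])  # split: [300, 300]
    print([len(p) for p in trim_or_split(contig, [(0, 100)])])    # trim: [550]
```

Merging the hits first means overlapping matches (common when several UniVec entries hit the same region) produce one cut rather than several.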
Sourpurge screens scaffolds for contamination by looking for matches to groups outside the target organism's phylum, and removes scaffolds that likely come from a contamination source.
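A minimal sketch of the kind of decision rule described for sourpurge, under stated assumptions: each scaffold's database matches have already been labelled with a phylum (for example from a sourmash-based search), and a scaffold is flagged when more than half of its matches fall outside the target phylum. The function name, the input format and the 0.5 threshold are hypothetical; sourpurge's real criteria may differ.

```python
def flag_contaminants(scaffold_hits, target_phylum, max_off_target=0.5):
    """Return names of scaffolds whose matches mostly fall outside target_phylum.

    scaffold_hits  -- dict: scaffold name -> list of phylum labels, one per match
                      (hypothetical input format)
    target_phylum  -- phylum of the organism being assembled
    max_off_target -- hypothetical threshold on the off-target share of matches
    """
    flagged = []
    for name, phyla in scaffold_hits.items():
        if not phyla:
            continue  # no matches at all: no evidence of contamination, keep it
        off_target = sum(1 for p in phyla if p != target_phylum)
        if off_target / len(phyla) > max_off_target:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    hits = {
        "scf1": ["Ascomycota"] * 9 + ["Proteobacteria"],      # 10% off-target
        "scf2": ["Proteobacteria"] * 8 + ["Ascomycota"] * 2,  # 80% off-target
        "scf3": [],                                           # no matches
    }
    print(flag_contaminants(hits, "Ascomycota"))  # ['scf2']
```

Scaffolds with no matches are kept, since absence of a hit is not evidence that the sequence belongs to another organism.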
I have also just added support for the NCBI fcs-gx tool, which does a similar thing to the sourmash/sourpurge step.
In addition to vecscreen and sourpurge, the filter step is also involved in removing sequences from non-target species. It removes reads before the assemble step, including mitochondrial reads identified by the mito step. So in AAFTF there are three steps that remove non-target sequences: filter (on reads, before assembly), and vecscreen and sourpurge (on the assembled contigs and scaffolds).
Did I summarize it correctly?
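For the read-level filter step in the summary above, here is a minimal sketch of the general idea, not AAFTF's implementation. It assumes the IDs of reads that aligned to contaminant or mitochondrial references have already been collected from an aligner's output, and it simply drops those records from a FASTQ stream. The function name `filter_fastq` and the handling of `/1` and `/2` mate suffixes are assumptions.

```python
from itertools import islice


def filter_fastq(lines, contaminant_ids):
    """Yield FASTQ lines, skipping records whose read ID is in contaminant_ids.

    lines           -- iterable of FASTQ lines (e.g. an open file), 4 lines per record
    contaminant_ids -- set of read IDs that aligned to contaminant/mito references
                       (assumed to be collected beforehand; hypothetical input)
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return  # end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        # Header looks like "@<id> <optional description>"; take the ID only.
        read_id = record[0][1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]  # treat both mates of a pair alike
        if read_id not in contaminant_ids:
            yield from record


if __name__ == "__main__":
    reads = ["@read1\n", "ACGT\n", "+\n", "IIII\n",
             "@read2/1 mito\n", "TTTT\n", "+\n", "IIII\n"]
    print("".join(filter_fastq(reads, {"read2"})), end="")  # only read1 remains
```

Because it is a generator, the same function can stream a real file without loading it into memory, e.g. `fout.writelines(filter_fastq(fin, ids))`.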
yes
Someday soon I'll finish writing up a manuscript on the tool that describes this in more detail.
The NCBI fcs-adaptor and fcs-gx tools are being tested, but gx is a little unwieldy because it needs a lot of memory and a large database, while sourpurge has a pretty small footprint. However, the version of the database I had been using was removed from OSF.io, so the default install may not work right now until we see whether the larger replacement database can still work efficiently.
fcs-gx support is now in AAFTF: it is available in the live code and will be part of the v0.5.0 release.
v0.5.0 has these features. Closing this question since it has been answered above.
Do they all aim to remove contaminants from contigs (i.e., eliminate contigs that do not belong to the target species)?