stajichlab / AAFTF

Automatic Assembly For The Fungi
MIT License

Do vecscreen and sourpurge have similar functions? #22

Closed: maruiqi0710 closed this issue 1 year ago

maruiqi0710 commented 1 year ago

Do they both aim to remove contaminants from contigs (i.e., eliminate contigs that do not belong to the target species)?

hyphaltip commented 1 year ago

Vecscreen removes primer sequences and obvious contamination (phiX), and trims or splits contigs using the UniVec database.
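The trim-or-split behavior described for vecscreen can be illustrated with a minimal Python sketch. It assumes the UniVec hits for a contig have already been found (the sketch does no searching itself) and are given as 0-based, half-open `(start, end)` intervals. The names `merge_hits` and `trim_or_split` and the `min_len` cutoff are hypothetical and not AAFTF's code. A hit at a contig end trims it; an internal hit splits it into separate pieces.

```python
from typing import List, Tuple

# Hypothetical representation: 0-based, half-open [start, end) hit intervals.
Interval = Tuple[int, int]


def merge_hits(hits: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching hit intervals into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def trim_or_split(seq: str, hits: List[Interval], min_len: int = 1) -> List[str]:
    """Cut every hit region out of seq.

    Hits touching either end simply trim the contig; internal hits split it.
    Pieces shorter than min_len (a hypothetical cutoff) are discarded.
    """
    pieces: List[str] = []
    pos = 0
    for start, end in merge_hits(hits):
        if start > pos:
            pieces.append(seq[pos:start])  # clean sequence before this hit
        pos = max(pos, end)
    if pos < len(seq):
        pieces.append(seq[pos:])  # clean tail after the last hit
    return [p for p in pieces if len(p) >= min_len]


if __name__ == "__main__":
    contig = "NNNNACGTACGTXXXXGGGGCCCC"
    # A terminal hit (0-4) trims; an internal hit (12-16) splits.
    print(trim_or_split(contig, [(0, 4), (12, 16)]))
```

Merging hits first keeps overlapping matches from producing empty or duplicated pieces.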

Sourpurge screens scaffolds for contamination by looking for matches to taxonomic groups outside the target phylum. It removes scaffolds that likely come from a contamination source.
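As a rough illustration of the sourpurge idea, the sketch below flags scaffolds whose matches fall mostly outside the target phylum. It assumes per-scaffold match counts per phylum have already been computed (for example, counts of matching sourmash k-mer hashes). The input layout, the function `scaffolds_to_remove`, and the `min_fraction` threshold are hypothetical and are not sourpurge's actual decision rule.

```python
from typing import Dict, List


def scaffolds_to_remove(
    matches: Dict[str, Dict[str, int]],
    target_phylum: str,
    min_fraction: float = 0.5,
) -> List[str]:
    """Return sorted IDs of scaffolds that look like contamination.

    matches maps scaffold ID -> {phylum: number of matching hashes}
    (a hypothetical layout). A scaffold is flagged when hashes assigned
    outside target_phylum make up at least min_fraction of its assigned
    hashes. Scaffolds with no assigned hashes are kept.
    """
    flagged: List[str] = []
    for scaffold, counts in matches.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no evidence either way: keep the scaffold
        outside = total - counts.get(target_phylum, 0)
        if outside / total >= min_fraction:
            flagged.append(scaffold)
    return sorted(flagged)


if __name__ == "__main__":
    example = {
        "scf1": {"Ascomycota": 40, "Basidiomycota": 2},
        "scf2": {"Proteobacteria": 30, "Ascomycota": 5},
        "scf3": {},
    }
    print(scaffolds_to_remove(example, "Ascomycota"))
```

Keeping unmatched scaffolds is a deliberately conservative choice in this sketch: a scaffold is only removed when there is positive evidence that it belongs elsewhere.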

I have also just added support for the NCBI fcs-gx tool, which does a similar thing to the sourmash/sourpurge step.

maruiqi0710 commented 1 year ago

In addition to vecscreen and sourpurge, the filter step is also involved in removing sequences from non-target species. The filter step removes reads before the assemble step, and it also removes reads matching the mitochondrial sequences produced by the mito step. Therefore, AAFTF has three steps involved in removing sequences from non-target species.
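The stage ordering summarized above can be written down as a small table. This is a hypothetical model for illustration only: the `Stage` class, `PIPELINE` list, and helper functions are not part of AAFTF, and only the stages mentioned in this thread are included.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    operates_on: str          # "reads" or "contigs"
    removes_non_target: bool  # does this stage itself drop sequence?


# Order and roles as described in this thread; other AAFTF stages omitted.
PIPELINE: List[Stage] = [
    Stage("mito", "reads", False),       # provides mito sequences used by filter
    Stage("filter", "reads", True),      # drops contaminant and mito reads
    Stage("assemble", "reads", False),
    Stage("vecscreen", "contigs", True), # trims/splits using UniVec
    Stage("sourpurge", "contigs", True), # drops out-of-phylum scaffolds
]


def contamination_steps(pipeline: List[Stage]) -> List[str]:
    """Names of stages that remove sequence, in run order."""
    return [s.name for s in pipeline if s.removes_non_target]


def before_assembly(pipeline: List[Stage]) -> List[str]:
    """Stages that run before 'assemble' (they act on reads)."""
    names = [s.name for s in pipeline]
    return names[: names.index("assemble")]


if __name__ == "__main__":
    print(contamination_steps(PIPELINE))
    print(before_assembly(PIPELINE))
```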

Did I summarize it correctly?

hyphaltip commented 1 year ago

yes

hyphaltip commented 1 year ago

Someday soon I'll finish writing up a manuscript on the tool that describes this in more detail.

The NCBI fcs-adaptor and fcs-gx tools are being tested, but fcs-gx is a little unwieldy because it needs a lot of memory and a large database, while sourpurge has a pretty small footprint. (That said, the version of the sourpurge database I had been using was removed from OSF.io, so the default install may not work right now until we see whether the larger replacement database can still run efficiently.)

hyphaltip commented 1 year ago

fcs-gx support is now in AAFTF: it is available in the live code and will be part of the v0.5.0 release.

hyphaltip commented 1 year ago

v0.5.0 has these features. Closing this issue since it was answered above.