khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics
http://www.recentrifuge.org

unclassified reads in rextract #32

Closed comixwang closed 3 years ago

comixwang commented 3 years ago

Hi, thanks for the tool. I have a question regarding the usage of rextract.

I have classification results from Centrifuge and want to use rextract to remove contaminant reads (mainly bacterial) from my dataset so that I can do de novo genome assembly for my species (Eukaryota). It seems that neither option, "-i" nor "-x", includes any of the unclassified reads in the extraction. I wonder if there is a reason for this. Does it make more sense to exclude the unclassified reads for de novo assembly? Thank you.

khyox commented 3 years ago

Sorry for the late reply @comixwang, I somehow missed this issue. Originally, extracting only the unclassified reads (or also the classified reads below a score level) was not considered as useful as the opposite: extracting the reads above a score. Recently, there has been some interest in this new feature (#28), and one particular application is the one you mention. This is now at the top of my list of improvements. I am closing this as a duplicate, but you can follow further developments in #28.
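
For readers with the same use case, the selection described in this thread (keep everything except contaminant reads, including the unclassified ones) can be sketched outside of rextract. This is only an illustration, not a Recentrifuge feature or the planned implementation from #28. It assumes the standard tab-separated Centrifuge classification output with a header row containing `readID` and `taxID` columns, and that unclassified reads carry taxID 0. The function name `reads_to_keep` and the explicit set of contaminant taxids are hypothetical; in practice that set would need to cover every taxid under the contaminant clade (for example, all of Bacteria), which requires the taxonomy and is not shown here.

```python
import csv
import io


def reads_to_keep(centrifuge_tsv, contaminant_taxids):
    """Return read IDs (in first-seen order) with no contaminant assignment.

    centrifuge_tsv: open text handle on Centrifuge classification output
        (assumed tab-separated, with 'readID' and 'taxID' header columns).
    contaminant_taxids: set of integer taxids to discard (hypothetical input;
        the caller must expand a clade such as Bacteria into its taxids).
    """
    keep = {}  # readID -> True while no contaminant assignment has been seen
    for row in csv.DictReader(centrifuge_tsv, delimiter="\t"):
        read_id = row["readID"]
        keep.setdefault(read_id, True)
        # A read may appear on several lines (one per assignment);
        # a single contaminant hit is enough to drop it.
        if int(row["taxID"]) in contaminant_taxids:
            keep[read_id] = False
    # Unclassified reads (assumed taxID 0) are kept unless 0 is listed.
    return [read_id for read_id, ok in keep.items() if ok]


# Made-up example records, for illustration only.
SAMPLE = (
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n"
    "r1\tseqA\t562\t400\t0\t35\t150\t2\n"
    "r1\tseqB\t9606\t400\t0\t35\t150\t2\n"
    "r2\tunclassified\t0\t0\t0\t0\t150\t1\n"
    "r3\tseqB\t9606\t900\t0\t45\t150\t1\n"
)

kept = reads_to_keep(io.StringIO(SAMPLE), {562})
print(kept)  # ['r2', 'r3']
```

The resulting list of IDs could then be used to subset the original FASTQ file with whatever tool is at hand. Whether unclassified reads should go into a de novo assembly remains a judgment call: they may belong to the target genome, but they may also be contaminants missing from the reference database.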