khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics
http://www.recentrifuge.org

(question) produced output #43

Closed arz19893 closed 11 months ago

arz19893 commented 2 years ago

Hi @khyox,

I used the tool with Kraken files. After downloading the tool, I ran the following commands in my current directory:

$ retaxdump

$ rcf -k CRTL1.kraken -k abudl_testextract_barcode26.kraken -c 1 -o Xsamples.rcf.html -s KRAKEN -y 25 -x 9606

From the output and the instructions, I can see that the confidence level has increased (from the chart link), but I have some questions.

Does this mean that any organism present in the control will be discarded from the samples? What if a sample genuinely contains the discarded organism (that is, not from contamination)? Also, in the Excel file that is produced, is there a way to see which organisms were selected for filtering?

From the output, I also see that you can choose CSV, FULL, etc. Can I produce a Kraken report.txt file for each sample as well?

Thank you for the help

khyox commented 2 years ago

Hi @arz19893,

About your first question, in general, most of the organisms present in the negative controls are removed from the regular samples, but the final outcome depends on your particular case. The robust contamination removal method in Recentrifuge is explained in this subsection of the paper. You can find further details in the wiki section "Understanding the messages of the robust contamination removal algorithm".

About your second question, I am not sure that I completely understand it, but the documentation that I referred to above should help with this too. Recentrifuge is able to detect some kinds of cross-contamination, even from regular samples to the controls. If you have only one negative control and the organism is quite abundant both in that control and in the regular samples, it will be removed. If a native organism in the regular samples has introduced substantial contamination into the negative controls, then the experimental procedure is flawed by cross-contamination so severe that the "negative controls" can no longer be considered as such. In other cases, you can fine-tune some parameters of the filters in Recentrifuge; see, for example, this comment.

Regarding your third question: no, the tabulated output (including the Excel files) does not give detailed information about the contamination removal. You can check the logs, though, which provide all the details, with colored output to help identify the kind of contamination, as explained in the wiki section "Understanding the messages of the robust contamination removal algorithm". In #40, it was suggested that an optional, separate output (beyond the console log) devoted to the contamination removal algorithm would be a welcome addition.

Finally, about your last question, there is no option in Recentrifuge to produce Kraken's report file. While Recentrifuge supports Kraken and other classifiers, its most complete support is for the Centrifuge classifier (hence its name).
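Since Recentrifuge has no option to write a Kraken report, a rough per-sample summary can be tallied directly from the per-read Kraken output. The following is only a minimal sketch and not part of Recentrifuge: it assumes the standard tab-separated Kraken output (classification status, read ID, taxid, length, LCA mapping) with plain numeric taxids, and the function name `count_reads_per_taxid` is hypothetical. It counts reads assigned directly to each taxid; it does not compute the cumulative clade counts of a real Kraken report, which would require the taxonomy.

```python
from collections import Counter
from typing import Iterable


def count_reads_per_taxid(lines: Iterable[str]) -> Counter:
    """Count reads per taxid from Kraken per-read output lines.

    Assumes tab-separated columns: status (C/U), read ID, taxid, ...
    Unclassified reads are tallied under taxid "0".
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, taxid = fields[0], fields[2]
        # Kraken marks unclassified reads with "U"; group them under "0"
        counts[taxid if status == "C" else "0"] += 1
    return counts


# Small made-up example standing in for the lines of a .kraken file
sample = [
    "C\tread1\t562\t150\t562:116",
    "C\tread2\t562\t150\t562:100 0:16",
    "U\tread3\t0\t150\t0:116",
    "C\tread4\t9606\t150\t9606:116",
]
counts = count_reads_per_taxid(sample)
for taxid, n in counts.most_common():
    print(taxid, n)
```

For a real sample, the same function can be fed an open file handle, for example `count_reads_per_taxid(open("CRTL1.kraken"))`. If the classification can be rerun, Kraken 2 can also write its own report through its `--report` option.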