khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics
http://www.recentrifuge.org

abundance table as output #41

Closed rjsorr closed 11 months ago

rjsorr commented 2 years ago

Hi @khyox,

I am wondering which options I need to use to get an abundance table as output for downstream processing in another program (e.g. R). For example, an output similar to the one given by the kraken2 report (attached), but of course after contamination removal. Another way of asking: what is the input used to construct the graphical HTML files?

regards SL342519_REPORT.Kraken2.txt

khyox commented 2 years ago

Hi @rjsorr,

For any downstream processing, you don't want to use the HTML output but rather Recentrifuge's "extra" output or the pickled (serialized) output. These are the relevant options for you:

  -e OUTPUT_TYPE, --extra OUTPUT_TYPE
                        type of extra output to be generated, and can be one
                        of ['FULL', 'CSV', 'MULTICSV', 'TSV']
  -p, --pickle          pickle (serialize) statistics and data results in
                        pandas DataFrames (format affected by selection of
                        --extra)

With FULL you will get an Excel file with all the information (one spreadsheet for statistics, another for all the data, as explained in the paper, in the manual, and in the wiki). With CSV you will get a single CSV file, and likewise with TSV a single TSV file. With MULTICSV (that is, --extra MULTICSV or just -e MULTICSV in rcf) you will get one CSV file per sample.
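As a minimal sketch of the downstream step, assuming rcf was run with -e TSV and wrote a tab-separated file with a single header row: the helper name `load_table` and the file name `samples.rcf.data.tsv` are hypothetical placeholders, and the real column layout should be checked in the file itself. The sketch uses only the Python standard library.

```python
import csv
from pathlib import Path


def load_table(path, delimiter="\t"):
    """Read a delimited text file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


if __name__ == "__main__":
    # Hypothetical file name: replace it with the file that rcf actually wrote.
    tsv_path = Path("samples.rcf.data.tsv")
    if tsv_path.exists():
        rows = load_table(tsv_path)
        columns = list(rows[0]) if rows else []
        print(f"{len(rows)} rows; columns: {columns}")
```

For a file produced with -e CSV, the same helper should work with `delimiter=","`. All values come back as strings, so counts need an explicit conversion before any arithmetic.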

Finally, if you are processing Recentrifuge's results with custom code downstream, you may take advantage of the --pickle flag. With it, rcf will pickle (serialize) both the statistics and the data results as pandas DataFrames in a compressed pickle file. Be aware that the specific format of the DataFrames is affected by the selection of other relevant options, such as --extra.
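A minimal sketch of reading such a file back, under stated assumptions: this thread does not say which compression rcf uses, so the hypothetical helper `load_pickle` below checks for gzip, bzip2, and xz magic bytes and otherwise falls back to an uncompressed read. pandas must be installed for the DataFrames to unpickle, and pickle files should only be loaded from sources you trust.

```python
import bz2
import gzip
import lzma
import pickle

# Leading magic bytes of common compression formats, mapped to their openers.
_OPENERS = (
    (b"\x1f\x8b", gzip.open),
    (b"BZh", bz2.open),
    (b"\xfd7zXZ\x00", lzma.open),
)


def load_pickle(path):
    """Unpickle a file, handling gzip, bzip2, or xz compression if present."""
    with open(path, "rb") as raw:
        head = raw.read(6)
    opener = open  # fall back to a plain, uncompressed read
    for magic, candidate in _OPENERS:
        if head.startswith(magic):
            opener = candidate
            break
    with opener(path, "rb") as handle:
        return pickle.load(handle)
```

Usage would be along the lines of `results = load_pickle("samples.rcf.pkl")`, where the file name is a placeholder. Since the layout of the unpickled object depends on --extra, inspecting `type(results)` first is a reasonable starting point. If the file has a standard extension, `pandas.read_pickle` can also infer the compression from it.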