Closed rjsorr closed 11 months ago
Hi @rjsorr,
For any downstream processing, you don't want to use the HTML output but Recentrifuge's "extra" output or the pickled (serialized) output. These are the relevant options for you:
    -e OUTPUT_TYPE, --extra OUTPUT_TYPE
                          type of extra output to be generated, and can be one
                          of ['FULL', 'CSV', 'MULTICSV', 'TSV']
    -p, --pickle          pickle (serialize) statistics and data results in
                          pandas DataFrames (format affected by selection of
                          --extra)
With FULL you will get an Excel file with all the information (one spreadsheet for statistics, another for all the data, as explained in the paper, in the manual, and in the wiki); with CSV you will get a single CSV file, and likewise with TSV a single TSV file. In addition, you can generate one CSV file per sample by passing --extra MULTICSV (or just -e MULTICSV) to rcf.
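If you go the MULTICSV route and want a single long table for R, the per-sample files can be stacked with a few lines of Python. This is only a rough sketch using the standard library: the file names and the taxid/count columns below are made-up toy data, not the actual layout of the rcf output, and merge_sample_csvs is a hypothetical helper, so adapt it to what you find in your own files.

```python
import csv
import tempfile
from pathlib import Path


def merge_sample_csvs(paths):
    """Stack per-sample CSV files into one list of rows, tagging each row with its file stem."""
    rows = []
    for path in sorted(Path(p) for p in paths):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Assumption: the sample name can be taken from the file name.
                row["sample"] = path.stem
                rows.append(row)
    return rows


# Toy files standing in for real per-sample output; names and columns are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for name, count in [("sampleA", "120"), ("sampleB", "30")]:
        (Path(tmp) / f"{name}.csv").write_text(f"taxid,count\n562,{count}\n")
    merged = merge_sample_csvs(Path(tmp).glob("*.csv"))

print(merged)
```

Note that the helper overwrites any existing column called sample, so rename the tag if your files already have one.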
Finally, if you are processing Recentrifuge's results with custom code downstream, you may take advantage of the --pickle flag. With that, rcf will pickle (serialize) both the statistics and data results in pandas DataFrames contained in a compressed pickle file. Be aware that the specific format of the DataFrames is affected by the selection of any relevant options, such as --extra.
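As a minimal sketch of the pickle route, something like the following should get you from a compressed pickle to a TSV that R can read. It assumes pandas is installed; the toy DataFrame, its taxid/count columns, and the file names are hypothetical stand-ins, since the object stored in a real rcf pickle and its file name depend on your run and options. Inspect what you load before relying on any structure, and only unpickle files you trust.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_pickle(path):
    """Return the object stored in a (possibly compressed) pickle file."""
    # compression="infer" picks the codec from the file extension (.gz, .xz, ...)
    return pd.read_pickle(path, compression="infer")


# Toy stand-in for a real rcf pickle; file name and columns are hypothetical.
toy = pd.DataFrame({"taxid": [562, 1280], "count": [120, 30]})

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "toy.pkl.gz"
    toy.to_pickle(pickle_path)  # compression inferred from the .gz suffix

    loaded = load_pickle(pickle_path)
    print(type(loaded))  # check what you actually got before using it

    # Write a plain TSV that R can read with read.delim()
    tsv_path = Path(tmp) / "abundance.tsv"
    loaded.to_csv(tsv_path, sep="\t", index=False)
    tsv_lines = tsv_path.read_text().splitlines()
```

If the loaded object turns out to be a container of several DataFrames rather than a single one, export each of them separately in the same way.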
Hi @khyox,
I am wondering what options I need to use to get an abundance table as output if I want to process it downstream in another program (e.g. R)? For example, an output similar to that given by the kraken2 report (attached), but of course after contamination removal. Another way of asking: what is the input used to construct the graphical HTML files?
Regards
SL342519_REPORT.Kraken2.txt