pirovc / ganon

ganon2 classifies genomic sequences against large sets of references efficiently, with integrated download and update of databases (refseq/genbank), taxonomic profiling (ncbi/gtdb), binning and hierarchical classification, customized reporting and more
https://pirovc.github.io/ganon/
MIT License

Compatibility with recentrifuge? (add read length to the output) #198

Open rjsorr opened 2 years ago

rjsorr commented 2 years ago

Hi, I'm wondering about the compatibility of ganon (classify/report) output with Recentrifuge (https://github.com/khyox/recentrifuge/wiki/Running-recentrifuge-for-a-generic-classifier). I need to remove contaminant reads from my dataset, and for someone who doesn't use R (Decontam), Recentrifuge seems like the best option. So: a) is the ganon output compatible with Recentrifuge (I presume it is)? and b) which output, from either classify or report, gives a file closest to the input Recentrifuge requests?

pirovc commented 2 years ago

The closest file is either the LCA output (--output-lca) or the complete output (--output-all). Both report 3 fields: read id, target (taxid), and k-mer/minimizer count.

You could use the k-mer/minimizer count as the score for Recentrifuge, but the read length is not reported. You'd have to add it to those files after the run.
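A rough sketch of that post-processing step, in case it helps. It assumes the ganon output is tab-separated with the read id in the first column, and that the reads come from an uncompressed FASTQ file with four lines per record. The function names `read_lengths` and `add_length_column` and the sample data are made up for illustration; they are not part of ganon or Recentrifuge. The column layout Recentrifuge expects should be checked against its generic classifier wiki page.

```python
import io


def read_lengths(fastq_handle):
    """Map read id -> sequence length from a 4-line-per-record FASTQ stream."""
    lengths = {}
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line
        fastq_handle.readline()  # quality line
        # Assumption: the read id is the header up to the first whitespace,
        # without the leading '@'.
        lengths[header[1:].split()[0]] = len(seq)
    return lengths


def add_length_column(ganon_handle, lengths, out_handle):
    """Append the read length as a new last column to each ganon output line."""
    for line in ganon_handle:
        line = line.rstrip("\n")
        if not line:
            continue
        read_id = line.split("\t")[0]  # assumption: first field is the read id
        out_handle.write(f"{line}\t{lengths[read_id]}\n")


# Tiny in-memory demo with made-up reads and taxids; with real data, pass
# open file handles for the FASTQ, the ganon output, and the new file instead.
fastq = io.StringIO("@read1 extra\nACGTACGT\n+\nIIIIIIII\n@read2\nACG\n+\nIII\n")
ganon = io.StringIO("read1\t562\t12\nread2\t1280\t3\n")
out = io.StringIO()
add_length_column(ganon, read_lengths(fastq), out)
print(out.getvalue(), end="")
```

Since the length lookup is done per line, the same sketch should also work when a read appears on several lines, which is what I'd expect from --output-all.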

rjsorr commented 2 years ago

Cheers @pirovc! I'll add/match the read length column and give it a try.

pirovc commented 2 years ago

@rjsorr any success running recentrifuge with ganon? I'll leave this open as a possible enhancement

rjsorr commented 2 years ago

See this issue for a compatibility solution: https://github.com/khyox/recentrifuge/issues/35