peterjc / thapbi-pict

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool
https://thapbi-pict.readthedocs.io/
MIT License

Record any chimera status in sample-tally final column #548

Closed peterjc closed 1 year ago

peterjc commented 1 year ago

Recording the chimera status in a final column makes it slightly harder to convert the sample-tally TSV output into BIOM format, as the chimera column would have to be dropped first.
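A minimal sketch of dropping such a column before any BIOM conversion. The column name `Chimeras`, the header row, and the `ID`/`sampleA`/`sampleB` layout are assumptions for illustration only, not the tool's actual sample-tally format, and `drop_column` is a hypothetical helper rather than part of thapbi-pict:

```python
def drop_column(tsv_text, column="Chimeras"):
    """Return TSV text with the named column removed (unchanged if absent).

    Assumes the first line is a tab-separated header row; the default
    column name is a placeholder, not the tool's real header.
    """
    lines = tsv_text.splitlines()
    if not lines:
        return tsv_text
    header = lines[0].split("\t")
    if column not in header:
        return tsv_text  # nothing to drop
    idx = header.index(column)
    kept = []
    for line in lines:
        fields = line.split("\t")
        del fields[idx:idx + 1]  # slice deletion is safe for short rows
        kept.append("\t".join(fields))
    return "\n".join(kept) + "\n"


# Hypothetical input: two sequences, two samples, one flagged as a chimera.
example = (
    "ID\tsampleA\tsampleB\tChimeras\n"
    "seq1\t10\t0\t\n"
    "seq2\t3\t7\tflagged\n"
)
counts_only = drop_column(example)
```

The result is a plain counts table, which is the shape a BIOM converter would expect; any leading comment or metadata lines in the real file would need handling separately.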

The plan is next to change the classify step to take the sample-tally TSV output and append its taxid and taxonomy columns, giving a single output file as input to the summary command (which would attach any metadata and produce the pretty reports).
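Roughly what appending the classifier columns could look like, as a sketch only. The `taxid` and `taxonomy` header names, the assumption that the first field of each row identifies the sequence, and the `append_classification` helper are all hypothetical; the values in the example are placeholders, not real classifier output:

```python
def append_classification(tsv_text, classify):
    """Append taxid and taxonomy columns to each row of a TSV table.

    `classify` is any callable mapping a row's first field to a
    (taxid, taxonomy) pair of strings. Assumes line one is the header.
    """
    lines = tsv_text.splitlines()
    if not lines:
        return tsv_text
    out = [lines[0] + "\ttaxid\ttaxonomy"]
    for line in lines[1:]:
        row_id = line.split("\t", 1)[0]
        taxid, taxonomy = classify(row_id)
        out.append(f"{line}\t{taxid}\t{taxonomy}")
    return "\n".join(out) + "\n"


# Placeholder lookup standing in for a real classifier.
calls = {
    "seq1": ("1001", "Example species one"),
    "seq2": ("", ""),  # unclassified: leave both columns empty
}
tally = "ID\tsampleA\tsampleB\nseq1\t10\t0\nseq2\t3\t7\n"
classified = append_classification(tally, lambda row_id: calls[row_id])
```

Keeping the counts and the classification in one table is what would let the summary step read a single file.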

As an aside, here is an excerpt from running this on our main dataset:

$ grep -i chimera slurm-5919085_*
slurm-5919085_2.out:USEARCH flagged 4204 as chimeras
slurm-5919085_2.out:Have 145 chimeras passing the abundance thresholds.
slurm-5919085_3.out:VSEARCH flagged 4030 as chimeras
slurm-5919085_3.out:Have 124 chimeras passing the abundance thresholds.

peterjc commented 1 year ago

At this point we've dropped the <STEM>.all_reads.fasta output from the pipeline, and the classify step can take either FASTA inputs (legacy) or sample-tally TSV, but it still produces the old-style TSV output.

The next step would be to change the classifier to add columns to the sample-tally style TSV, and then make the summary command accept that single file in place of the current pair of files.