Hello @sherlyn99, sorry for the late response. This happens because when a single file is specified as input, Woltka treats it as multiplexed by default and attempts to extract sample names from each read ID. When it can't, it assigns all reads to a sample with an empty name. To resolve this, you can add `--no-demux` to the command. Here is the explanation. Hope it helps!
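The Python sketch below is a minimal illustration, not Woltka's actual code, of how per-read demultiplexing can leave every read under an empty sample name. It rests on two assumptions made only for this example. First, the sample name is the part of the read ID before the first underscore. Second, with demultiplexing turned off, the sample name comes from the file name without its extension. `split_read_id` and `group_by_sample` are hypothetical helpers, not part of Woltka's API.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def split_read_id(read_id: str, sep: str = "_") -> Tuple[str, str]:
    """Hypothetical helper: split a read ID into (sample, read) at the first separator.

    If the separator is absent, the sample name is empty and the read ID
    is kept whole.
    """
    sample, found, read = read_id.partition(sep)
    if not found:
        return "", read_id
    return sample, read


def group_by_sample(read_ids: Iterable[str], file_name: str,
                    demux: bool = True) -> Dict[str, List[str]]:
    """Group read IDs by sample name (hypothetical sketch).

    demux=True: the sample name is parsed from each read ID.
    demux=False (analogous to --no-demux): every read goes to a sample
    named after the input file minus its extension (an assumption).
    """
    groups: Dict[str, List[str]] = {}
    for read_id in read_ids:
        if demux:
            sample, read = split_read_id(read_id)
        else:
            sample, read = Path(file_name).stem, read_id
        groups.setdefault(sample, []).append(read)
    return groups


# Read IDs like these contain no underscore, so with demultiplexing on
# every read lands under the empty sample name "".
reads = ["A00123:8:H5:1:1101:1000:2000", "A00123:8:H5:1:1101:1000:2001"]
print(group_by_sample(reads, "sickpatient_1_metaG_trimmed_esbl.sam"))
print(group_by_sample(reads, "sickpatient_1_metaG_trimmed_esbl.sam", demux=False))
```

Turning demultiplexing off sidesteps the parsing step entirely. That is why, in this sketch, the sample column gets a meaningful name only in the second call.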
Hi @qiyunzhu, this makes sense. Thanks a lot!
Hi, I encountered a strange case: when I give Woltka the path to a specific SAM file, the output table does not label the column with the sample name.
For example, I ran:
```
woltka classify -i sickpatient_1_metaG_trimmed_esbl.sam --map /wol1/wol-/taxid.map --nodes /wol1/wol-20April2021/taxonomy/nodes.dmp --names /wol1/wol-20April2021/taxonomy/names.dmp --rank species --name-as-id --outmap sickpatient_1_metaG_mapdir -o species_ct.tsv
```
and here are the first 5 lines of `species_ct.tsv`:
However, when I put this file in a folder and give Woltka the path to the folder, i.e.:
```
woltka classify -i test_samfiles/ --map /wol1/wol-20April2021/taxonomy/taxid.map --nodes /wol1/wol-20April2021/taxonomy/nodes.dmp --names /wol1/wol-20April2021/taxonomy/names.dmp --rank species --name-as-id --outmap sickpatient_1_metaG_mapdir -o species_ct.tsv
```
I got a correctly labelled species table:
If this is not the intended behavior, is there any way to fix it? I am currently trying to speed up Woltka by running individual SAM files in parallel and then combining the TSV tables (my users are more comfortable with TSVs than BIOMs), hence this question.
Thank you!
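The workflow described above runs each SAM file in parallel and then combines the TSV outputs. The merge step could look like the minimal sketch below, which uses only the Python standard library. It assumes each table has a header row: the first cell labels the feature-ID column (the `#FeatureID` label in the example is an assumption), and the remaining cells are sample names. It also assumes the counts are integers. `read_profile` and `merge_profiles` are hypothetical names, not part of Woltka. Features missing from a table are filled with 0. A repeated sample name raises an error instead of silently overwriting counts. This includes two empty names from runs without `--no-demux`.

```python
import csv
import io
from typing import Dict, Iterable


def read_profile(text: str) -> Dict[str, Dict[str, int]]:
    """Parse one TSV table into {sample: {feature: count}} (assumes integer counts)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    samples = header[1:]
    profile: Dict[str, Dict[str, int]] = {s: {} for s in samples}
    for row in body:
        feature, counts = row[0], row[1:]
        for sample, count in zip(samples, counts):
            profile[sample][feature] = int(count)
    return profile


def merge_profiles(texts: Iterable[str], id_label: str = "#FeatureID") -> str:
    """Outer-join several TSV tables on feature ID, filling gaps with 0."""
    merged: Dict[str, Dict[str, int]] = {}
    for text in texts:
        for sample, counts in read_profile(text).items():
            if sample in merged:
                # e.g. two single-sample tables that both have an empty header
                raise ValueError(f"duplicate sample name: {sample!r}")
            merged[sample] = counts
    samples = list(merged)  # keep samples in input order
    features = sorted({f for counts in merged.values() for f in counts})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_label] + samples)
    for feature in features:
        writer.writerow([feature] + [merged[s].get(feature, 0) for s in samples])
    return out.getvalue()


s1 = "#FeatureID\tS1\nE. coli\t5\nK. pneumoniae\t2\n"
s2 = "#FeatureID\tS2\nE. coli\t3\nS. aureus\t4\n"
print(merge_profiles([s1, s2]), end="")
```

With real files, the input could be built as `merge_profiles(Path(p).read_text() for p in paths)` using `pathlib.Path`. Rejecting duplicate sample names is deliberate. Otherwise tables whose sample column is unlabeled would overwrite one another without warning.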