qiyunzhu / woltka

Woltka: a versatile meta'omic data classifier
BSD 3-Clause "New" or "Revised" License

missing column labels when running woltka classify on individual samfile #208

Closed (sherlyn99 closed this 1 month ago)

sherlyn99 commented 4 months ago

Hi, I encountered a strange case: when I give woltka the path to a specific samfile, the output table does not label the column with the sample name:

e.g. I ran woltka classify -i sickpatient_1_metaG_trimmed_esbl.sam --map /wol1/wol-/taxid.map --nodes /wol1/wol-20April2021/taxonomy/nodes.dmp --names /wol1/wol-20April2021/taxonomy/names.dmp --rank species --name-as-id --outmap sickpatient_1_metaG_mapdir -o species_ct.tsv

and here are the first 5 lines of species_ct.tsv

#FeatureID  

Anaerostipes caccae 709

Alistipes obesi 17469

Akkermansia sp. KLE1605 372

Fusicatenibacter saccharivorans 9507

However, when I put this file in a folder and give woltka the filepath to the folder, i.e. woltka classify -i test_samfiles/ --map /wol1/wol-20April2021/taxonomy/taxid.map --nodes /wol1/wol-20April2021/taxonomy/nodes.dmp --names /wol1/wol-20April2021/taxonomy/names.dmp --rank species --name-as-id --outmap sickpatient_1_metaG_mapdir -o species_ct.tsv

I got a correctly labelled species table:

#FeatureID  sickpatient_1_metaG_trimmed_esbl

Anaerostipes caccae 709

Alistipes obesi 17469

Akkermansia sp. KLE1605 372

Fusicatenibacter saccharivorans 9507

If this is not the intended behavior, is there any way to fix this? I am currently trying to speed up woltka by running individual samfiles in parallel and then combining the tsv tables (my users are more comfortable with tsvs than bioms), hence this question.

Thank you!

qiyunzhu commented 1 month ago

Hello @sherlyn99, sorry for the late response. When a single file is specified as input, Woltka treats it as multiplexed by default and attempts to extract a sample name from each read ID. When it can't, it assigns all reads to a sample with an empty name, which is why the column label is blank. To resolve this, you can add --no-demux to the command. Here is the explanation. Hope it helps!
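To make the behavior above concrete, here is a minimal Python sketch of demultiplexing by read ID. It is not Woltka's actual code. It assumes the sample name is whatever precedes the first underscore in the read ID (the separator is an assumption here), and the function names and read IDs are hypothetical. Reads without a separator all fall into a sample named "" (empty string), which matches the blank column label in the table.

```python
from collections import defaultdict


def split_read_id(read_id, sep="_"):
    """Split a read ID into (sample, read) at the first separator.

    Assumption for illustration: IDs look like "<sample><sep><read>".
    If the separator is absent (or nothing follows it), the sample
    name is the empty string and the whole ID is the read name.
    """
    left, _, right = read_id.partition(sep)
    if right:
        return left, right
    return "", left


def demultiplex(read_ids, sep="_"):
    """Count reads per sample name extracted from each read ID."""
    counts = defaultdict(int)
    for read_id in read_ids:
        sample, _ = split_read_id(read_id, sep)
        counts[sample] += 1
    return dict(counts)


# Hypothetical read IDs: the first list carries sample prefixes, the second does not.
multiplexed = ["S1_read1", "S1_read2", "S2_read1"]
plain = ["A00953:23:HXXX:1:1101:1000:2000", "A00953:23:HXXX:1:1101:1001:2001"]

print(demultiplex(multiplexed))  # {'S1': 2, 'S2': 1}
print(demultiplex(plain))        # {'': 2}
```

With --no-demux, this per-read extraction is skipped and the whole file is treated as one sample, so the column gets a proper label.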

sherlyn99 commented 1 month ago

Hi @qiyunzhu, this makes sense. Thanks a lot!
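For the workflow described in the original question (classifying samfiles one at a time and combining the resulting tsv tables), a minimal Python sketch of the merge step might look like the following. It is not part of Woltka. It assumes each table holds a single sample with tab-separated integer counts, as in the excerpts above. If the column label is empty, it falls back to the file name without its extension, which only makes sense if each table is named after its sample. The function names are hypothetical.

```python
from pathlib import Path


def read_table(path):
    """Read one single-sample table: '#FeatureID<TAB>sample' header,
    then 'feature<TAB>count' rows. Returns (sample, {feature: count})."""
    path = Path(path)
    with path.open() as fh:
        header = fh.readline().rstrip("\n").split("\t")
        sample = header[1].strip() if len(header) > 1 else ""
        counts = {}
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                # Assumption: counts are integers
                counts[fields[0]] = int(fields[1])
    # Fall back to the file stem when the column label is empty
    return sample or path.stem, counts


def merge_tables(paths):
    """Combine single-sample tables into one header row plus data rows,
    filling 0 where a feature is absent from a sample."""
    tables = [read_table(p) for p in paths]
    header = ["#FeatureID"] + [sample for sample, _ in tables]
    features = sorted({f for _, counts in tables for f in counts})
    rows = [[f] + [counts.get(f, 0) for _, counts in tables] for f in features]
    return header, rows


def write_table(header, rows, out_path):
    """Write the merged table as tab-separated text."""
    with open(out_path, "w") as fh:
        fh.write("\t".join(header) + "\n")
        for row in rows:
            fh.write("\t".join(str(x) for x in row) + "\n")
```

Calling merge_tables on the list of per-sample tsv paths and passing the result to write_table gives one combined table with one column per sample.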