qiime2 / q2-fragment-insertion

BSD 3-Clause "New" or "Revised" License

Classify - otus experimental had blank taxon #55

Open shaekin opened 5 years ago

shaekin commented 5 years ago

Bug Description: I ran classify-otus-experimental and got an error saying that one of the entries was a float and could not be parsed. After digging into the taxonomy file, I found that one of the entries was blank; it was being read as NaN, which broke the run. Once I deleted that line, everything ran fine.

Questions: Any chance something could be coded to avoid this issue in the future?
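
One way such a guard could look is sketched below. This is not the code used by q2-fragment-insertion; the function name `split_lineage` and the "Unassigned" placeholder are hypothetical, and the sketch assumes lineages are semicolon-delimited strings and that a blank cell reaches the parser as a float NaN, as described above.

```python
import math


def split_lineage(value, placeholder="Unassigned"):
    """Split a semicolon-delimited lineage string into a list of ranks.

    Hypothetical helper: None, float NaN, and blank strings are mapped to
    a single placeholder rank instead of raising an error.
    """
    if value is None:
        return [placeholder]
    # A blank table cell is often loaded as float NaN, which has no .split().
    if isinstance(value, float) and math.isnan(value):
        return [placeholder]
    text = str(value).strip()
    ranks = [rank.strip() for rank in text.split(";") if rank.strip()]
    # Fall back to the placeholder if nothing usable remains (e.g. "" or ";").
    return ranks or [placeholder]
```

Whether a missing lineage should be silently replaced or reported as an error is a design decision; this sketch only shows the tolerant variant.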

thermokarst commented 5 years ago

Hi @shaekin, I am moving this issue to the appropriate repository - thanks!

thermokarst commented 5 years ago

As well, can you please edit this issue to use the issue templates we have provided? Thanks!

nbokulich commented 5 years ago

This is not a q2-feature-classifier issue, either. As far as I know, classify-otus-experimental is in q2-fragment-insertion. @shaekin, can you please post issues like this on the QIIME 2 forum in the future?

sjanssen2 commented 5 years ago

Hi @shaekin, thank you very much for reporting this issue, and also for providing a workaround. Can you, by any chance, attach the faulty taxonomy file for debugging, and maybe also the failing commands? Thanks again, Stefan

sjanssen2 commented 5 years ago

@thermokarst and @nbokulich This issue might not only apply here but could have a wider impact, as @shaekin initially suggested. It is true that the bug manifests in q2-fragment-insertion; however, it could be prevented earlier, when the raw taxonomy.tsv file is imported.

Have a look at the four example files in taxonomies.zip. The following works as expected: x=working; qiime tools import --input-path $x.tsv --type "FeatureData[Taxonomy]" --output-path $x.qza

However, importing a taxonomy with a blank feature ID (missing_id.tsv), no lineage string (missing_lineage.tsv), or a blank line in the middle of the "table" (blank_line.tsv) also succeeds, without any notification to the user about those data issues.

Is this a feature (allow a quite general input) or a bug (not semantically checking the provided information)?
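
To make the three cases concrete, here is a minimal sketch of a semantic check that could run before import. It is not part of QIIME 2; the function name `find_taxonomy_problems` is hypothetical, and it assumes a tab-separated file whose first row is a header and whose first two columns hold the feature ID and the lineage string.

```python
import csv
import io


def find_taxonomy_problems(lines):
    """Return a list of (line_number, message) for suspicious rows.

    `lines` is any iterable of text lines, such as an open file handle.
    Hypothetical helper: assumes a header row followed by rows of
    feature ID <TAB> lineage string.
    """
    problems = []
    reader = csv.reader(lines, delimiter="\t")
    header_seen = False
    for row in reader:
        if not header_seen:
            header_seen = True  # skip the header row
            continue
        line_number = reader.line_num
        # csv.reader yields an empty list for a completely empty line.
        if not any(cell.strip() for cell in row):
            problems.append((line_number, "blank line"))
            continue
        feature_id = row[0].strip()
        lineage = row[1].strip() if len(row) > 1 else ""
        if not feature_id:
            problems.append((line_number, "blank feature ID"))
        if not lineage:
            problems.append((line_number, "missing lineage"))
    return problems


# Small in-memory example covering the three cases described above.
example = (
    "Feature ID\tTaxon\n"
    "f1\tk__Bacteria\n"
    "\tk__Archaea\n"
    "f3\t\n"
    "\n"
    "f5\tk__Bacteria\n"
)
for line_number, message in find_taxonomy_problems(io.StringIO(example)):
    print(f"line {line_number}: {message}")
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`. Whether such findings should be warnings or hard errors depends on the answer to the feature-or-bug question above.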