minhtrung1997 closed this issue 6 months ago
@Midnighter is this a summing to 100% issue again?
Yes, it is 😬
Freaking Maths Shakes fist
Currently, I can circumvent this with this notebook: extract_metaphlan_profile.ipynb.txt. The output is pretty much like taxpasta's. Please check it and see if it can help update taxpasta.
Thanks for the effort @minhtrung1997, but parsing the file is not really the issue. We perform a number of validations on profiles passed to taxpasta, and one of them checks whether the relative abundances per rank add up to 100%. Those checks regularly fail due to floating point arithmetic and truncated outputs when profilers write to a text file. (Typically, they only write 6 decimal places which is insufficient when we have tens of millions of reads.)
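To illustrate the rounding problem described above, here is a minimal sketch. It assumes a profiler that writes percentages with 6 decimal places; the helper name `write_and_reread` is made up for this example and is not part of taxpasta or any profiler.

```python
def write_and_reread(values, decimals=6):
    """Simulate writing numbers to a text file with fixed precision and parsing them back."""
    return [float(f"{v:.{decimals}f}") for v in values]


# Three equally abundant taxa: each holds exactly one third of the reads.
exact = [100 / 3] * 3
stored = write_and_reread(exact)

# Each stored value is 33.333333, so the sum falls just short of 100.
deviation = abs(sum(stored) - 100.0)
print(stored)
print(deviation)
```

The deviation here is tiny, but a strict equality check against 100% still fails on it.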
We could add an option to skip the validation of compositionality.
I think an option to skip the compositionality check might be a good idea, given how often that particular check comes up. Maybe something generic like --lenient?
I think I'd use lenient mode for turning off a whole bunch of validations, and for this single one maybe allow defining the acceptable absolute deviation from 100%? Something like --compositionality-threshold 0.1 for up to 10% off?
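A rough sketch of what such a threshold check could look like. This is not taxpasta's actual implementation; the function name and parameters are hypothetical. It assumes abundances are given as fractions, so that 1.0 means 100% and a threshold of 0.1 allows up to 10% off.

```python
def within_compositionality_threshold(abundances, threshold=0.1, expected_total=1.0):
    """Return True if the abundances sum to expected_total within an absolute deviation of threshold.

    Hypothetical helper for illustration only; not part of taxpasta.
    """
    return abs(sum(abundances) - expected_total) <= threshold


# A profile whose fractions were truncated to 6 decimals still passes a small tolerance.
print(within_compositionality_threshold([0.333333] * 3, threshold=1e-5))

# A profile that is 20% short fails with the suggested threshold of 0.1.
print(within_compositionality_threshold([0.5, 0.3], threshold=0.1))
```

An absolute tolerance keeps the check meaningful for badly broken profiles while forgiving rounding noise from text output.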
Sounds good to me!
Is there an existing issue for this?
Problem description
I am trying to run this command in order to get the taxpasta file.
It produces this error:
Code sample
Code run:
Traceback:
Environment
Anything else?
Error dataframe: