taxprofiler / taxpasta

TAXnomic Profile Aggregation and STAndardisation
https://taxpasta.readthedocs.io/
Apache License 2.0

Should we support the OPAL format as output? #113

Open Midnighter opened 1 year ago

Midnighter commented 1 year ago

In order to use the OPAL tool for analysis and visualization, it might be useful to convert the output of any supported profiler to that format.

mattheatley commented 7 months ago

Might this be reproducing taxonkit's profile2cami? https://bioinf.shenwei.me/taxonkit/usage/#profile2cami

jfy133 commented 7 months ago

Some of those commands look really nice, I didn't know of that tool, thanks @mattheatley! Agreed, maybe not necessary if it's supported elsewhere? We should consider whether our output is compatible as input to profile2cami, though.

mattheatley commented 7 months ago

So you're pretty much already there with the standard taxpasta output, but instead of the two columns (taxid & count) you'd need to provide taxid & abundance (i.e. percentage), which is the input taxonkit requires. To be honest, it's probably more useful to also have the counts in general, so maybe just provide the abundances as an extra output.
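
A minimal sketch of the conversion described here, going from a two-column count table to taxid and percentage. The column names `taxonomy_id` and `count` and the helper name `counts_to_percentages` are assumptions for illustration, not taken from either tool.

```python
import csv
import io


def counts_to_percentages(tsv_text):
    """Turn (taxonomy_id, count) rows into (taxonomy_id, percentage) pairs.

    Assumes a tab-separated table with a header row naming the columns
    'taxonomy_id' and 'count' (an assumption made for this sketch).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [(row["taxonomy_id"], int(row["count"])) for row in reader]
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("profile has no counts to normalise")
    return [(taxid, 100.0 * count / total) for taxid, count in rows]


# Toy profile; the counts are made up.
example = "taxonomy_id\tcount\n562\t30\n1280\t50\n1423\t20\n"
for taxid, percentage in counts_to_percentages(example):
    print(f"{taxid}\t{percentage:.4f}")
```

Keeping the original count table next to this derived one avoids losing information, since percentages alone cannot be turned back into counts.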

jfy133 commented 7 months ago

     b) Abundance (could be percentage, automatically detected or use -p/--percentage).

Raw counts are 'sequence' abundance anyway, so maybe we are already there then?

jfy133 commented 7 months ago

But it could be good to test whether the two tools are compatible; then we could update the docs to point people to taxonkit :)

mattheatley commented 7 months ago

I think they may be talking about proportion vs. percentage rather than counts, but I'm not totally sure.

Midnighter commented 7 months ago

I have been considering an option to report fractions instead of counts from taxpasta for quite some time. So it seems that small change would already make the output compatible with taxonkit's profile2cami.

mattheatley commented 7 months ago

Maybe don’t do away with counts altogether though? I actually find it more useful to have them instead because you can’t convert backwards to counts from abundances. An additional output would be great though. At the moment I convert taxpasta outputs to abundances and then to cami so this would cut out a stage. But there can be rounding issues so maybe calculate them using decimals?
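
One way to sidestep the rounding issue mentioned here is to keep proportions exact until the moment they are written out. The sketch below uses Python's standard `fractions` and `decimal` modules; the function names and the choice of ten decimal places are assumptions for illustration only.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def exact_proportions(counts):
    """Map each taxid to its exact share of the total as a Fraction."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile has no counts to normalise")
    return {taxid: Fraction(count, total) for taxid, count in counts.items()}


def to_decimal_string(fraction, places=10):
    """Render a Fraction with a fixed number of decimal places."""
    with localcontext() as ctx:
        ctx.prec = 50  # generous working precision for the division
        value = Decimal(fraction.numerator) / Decimal(fraction.denominator)
        # Quantize to 10**-places, using the context's default rounding.
        return str(value.quantize(Decimal(1).scaleb(-places)))


# Toy counts; three equal shares cannot be represented exactly as floats.
proportions = exact_proportions({"562": 1, "1280": 1, "1423": 1})
print(sum(proportions.values()) == 1)  # exact arithmetic, no drift
for taxid, share in proportions.items():
    print(taxid, to_decimal_string(share))
```

Rounding still happens when the values are printed, but only once and at a known precision, rather than accumulating through intermediate float operations.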

Midnighter commented 7 months ago

By the way, @mattheatley, I don't know if this is clear enough from the documentation: Some of the original profiler output is actually given as fractions, which we multiply with a big number in order to obtain integers. So in those cases, it would be more faithful to the original result to only report fractions.

paulzierep commented 6 days ago

One major issue at the moment is that taxonkit only supports leaf counts: https://github.com/shenwei356/taxonkit/issues/99#issuecomment-2203369757 If this is fixed, I think it would work seamlessly with taxpasta.