microbiome / mia

Microbiome analysis
https://microbiome.github.io/mia/
Artistic License 2.0
50 stars 27 forks source link

Fix importTaxpasta #619

Closed d-callan closed 3 months ago

d-callan commented 3 months ago

using taxpasta via nf-core/taxprofiler v1.1.8 and tried to use importTaxpasta to get a tse. I got an error like type field has unsupported value (see here). after some digging around i see it looks like the taxpasta biom leaves this field empty.

this pr represents a proposal to have this type default to 'OTU table'. it also proposes making addHierarchyTree optional when importing taxpasta data. my particular use case doesnt require it, and its a bit time consuming.

i also notice addHierarchyTree seems to not handle fully unclassified entities well. Im considering improving that situation out of scope for this PR unless someone objects, and am happy to file an issue if i dont find one once i have more details. but its also influenced my desire to make it optional in this context, so thought id mention it here.

antagomir commented 3 months ago

These suggestions sound good to me but it would be good to see what @TuomasBorman has to say.

TuomasBorman commented 3 months ago

I think set.ranks should also be optional (not related to the issue, but might be good to fix here also). You are free to do that also if you want, but I can also do that after you have finished finalizing your code.

Your plan about addHiearchyTree: Sounds very good! We can continue this in separate issue/PR to discuss about the solution.

d-callan commented 3 months ago

hi @TuomasBorman and thanks for the detailed feedback! I wasnt sure how people would feel about these changes and so didnt put any real effort into styling/ linting/ etc yet, but i will do that now. I like your suggestions, but wonder about addHierarchyTree -> add.tree.. funnily enough i usually prefer longer names than shorter for things. i like the explicitness of it, the clarity and readability, as i figure anyone who knows the method for addHierarchyTree will immediately guess what the similarly named parameter does and vice versa. ill make it add.tree if you like, but wonder if youd be alright w add.hierarchy.tree

TuomasBorman commented 3 months ago

hi @TuomasBorman and thanks for the detailed feedback! I wasnt sure how people would feel about these changes and so didnt put any real effort into styling/ linting/ etc yet, but i will do that now. I like your suggestions, but wonder about addHierarchyTree -> add.tree.. funnily enough i usually prefer longer names than shorter for things. i like the explicitness of it, the clarity and readability, as i figure anyone who knows the method for addHierarchyTree will immediately guess what the similarly named parameter does and vice versa. ill make it add.tree if you like, but wonder if youd be alright w add.hierarchy.tree

I understand your point about the importance of explicitness and transparency in naming. However, using add.tree aligns well with other existing parameters, such as update.tree. Additionally, the risk of misunderstanding is minimal because the documentation is intended to clearly explain the functionality. However, you pointed a problem in the documentation as it is not currently clearly explaining where the tree comes from. (Would you be able to fix this?) The documentation should describe that tree is calculated with addHierarchyTree. I guess the names are balance; they should be clear enough but not too long since they might be annoying for some users.

d-callan commented 3 months ago

ok. i believe all comments addressed now, but let me know.

TuomasBorman commented 3 months ago

@d-callan I was pushing to Bioconductor and noticed that the maximum size of file can be 5 MB. However, the biom file is over 10 MB. Can you subset it? Unfortunately, I did not remember this earlier.

d-callan commented 3 months ago

hey. sry. this week has been awful at work and this got away from me. fixing now.