waldronlab / BugSigDBcuration

For documenting issues related to BugSigDB curation.

The interplay between PCOS pathology and diet on gut microbiota in a mouse model #395

Closed Buraah closed 1 month ago

Buraah commented 1 month ago

The interplay between PCOS pathology and diet on gut microbiota in a mouse model
Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9450977/

Scholarpat commented 1 month ago

Hello @SvetlanaUP, can I work on this instead? Thank you.

Scholarpat commented 1 month ago

Thank you @SvetlanaUP

Scholarpat commented 1 month ago

Hello @SvetlanaUP, This curation is ready for review.

Thank you.

SvetlanaUP commented 1 month ago

@Buraah could you please review this curation done by @Scholarpat? Thanks!

Buraah commented 1 month ago

Hi @Scholarpat, thank you for curating this study.

This study has several experiments, but only one has differential abundance signatures, and that is the one you curated. The curation looks well done to me.

My only concern is the data transformation. The study identified differential taxa using DESeq2, which I understand usually takes raw counts, but the paper doesn't state the transformation anywhere. It does mention relative abundance several times, e.g.: "Before diversity comparisons, the operational taxonomic unit (OTU) counts were normalized by a total sum (% relative abundance) followed by square-root transformation." (first paragraph of the "Analysis of microbial communities" section)

So for data transformation, I'm leaning toward "Relative Abundance", but I stand to be corrected.
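To make the quoted preprocessing concrete, here is a minimal sketch of total-sum normalization (% relative abundance) followed by a square-root transformation. The counts are made up for illustration and the function names are hypothetical; this is not the paper's code, and the quote describes the step used before diversity comparisons, not necessarily the input given to DESeq2.

```python
import math


def relative_abundance(counts):
    """Total-sum scale raw OTU counts to percentages that sum to 100."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [100.0 * c / total for c in counts]


def sqrt_transform(values):
    """Apply a square-root transformation to each value."""
    return [math.sqrt(v) for v in values]


# Hypothetical raw OTU counts for a single sample (whole numbers).
raw_counts = [50, 30, 20, 0]

rel = relative_abundance(raw_counts)   # percentages, no longer whole counts
rel_sqrt = sqrt_transform(rel)         # values used for diversity comparisons

print(rel)
print(rel_sqrt)
```

After the first step the values are percentages, not integers, which is why it matters which form of the data each statistical test received.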

Scholarpat commented 1 month ago

Hello @Buraah,

Thank you so much for your review and feedback.

Regarding the data transformation, I initially opted for relative abundance. However, the OTU counts mentioned in the excerpt you quoted made me switch to raw counts. I am still somewhat uncertain about this decision though...

SvetlanaUP commented 1 month ago

Great work @Scholarpat and @Buraah!

I remember we discussed this; here it is: https://community-bioc.slack.com/archives/C04RATV9VCY/p1697145085673689?thread_ts=1697141457.022309&cid=C04RATV9VCY

Chloe nicely explained: Data transformations are often dependent on the statistical test. This can be difficult to figure out, so I recommend asking questions if you're not sure, but generally speaking:

- Raw counts -> Poisson, negative binomial, linear models, DESeq2
- Relative abundances -> This is most common. Mann-Whitney U, Kruskal-Wallis, LEfSe, many others
- Centered log ratio -> Rare. ANCOM
- Arcsine square-root -> Rare. MaAsLin2 sometimes. Some linear models rarely.
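The rule of thumb quoted above can be written down as a small lookup table, which may be handy when double-checking a curation. This is only a sketch: the dictionary, the function name `suggest_transformation`, and the label strings are hypothetical and simply mirror the wording of the message. Linear models and MaAsLin2 are left out on purpose because the message lists them under more than one transformation or only "sometimes", so they need a look at the paper itself.

```python
# Typical transformation per statistical test, taken from the rule of thumb
# above. Keys are normalized test names (lowercase, letters and digits only).
TYPICAL_TRANSFORMATION = {
    "poisson": "raw counts",
    "negativebinomial": "raw counts",
    "deseq2": "raw counts",
    "mannwhitneyu": "relative abundances",
    "kruskalwallis": "relative abundances",
    "lefse": "relative abundances",
    "ancom": "centered log ratio",
}


def normalize(test_name):
    """Lowercase the name and drop spaces, hyphens, and other punctuation."""
    return "".join(ch for ch in test_name.lower() if ch.isalnum())


def suggest_transformation(test_name):
    """Return the typical transformation, or None if the test is not listed."""
    return TYPICAL_TRANSFORMATION.get(normalize(test_name))


print(suggest_transformation("DESeq2"))
print(suggest_transformation("Mann-Whitney U"))
print(suggest_transformation("MaAsLin2"))
```

A `None` result means the table has no default for that test, not that the test has no transformation; the paper's methods section still has the final word.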

I will note that this is wrong or missing for many previously curated papers, and a good cleanup task for an intrepid soul would be to try to update all of these. I've found a lot of DESeq2 papers that say they use relative abundances or CLR, which does not make sense: **DESeq2 uses a negative binomial model which requires counts** (or whole-number variables that approximate a negative binomial/Poisson distribution), or else it will not converge.
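For the cleanup task, one possible starting point is a small filter that flags curated experiments where the test is DESeq2 but the recorded transformation is something other than raw counts (including an empty field). The record layout below, with the keys `study`, `statistical_test`, and `data_transformation`, is an assumption made for illustration and is not the actual BugSigDB export schema; the records themselves are invented.

```python
def flag_inconsistent(records):
    """Return study IDs where DESeq2 is paired with a non-count transformation.

    Each record is a dict with the hypothetical keys 'study',
    'statistical_test', and 'data_transformation'.
    """
    flagged = []
    for rec in records:
        test = rec.get("statistical_test", "").strip().lower()
        transformation = rec.get("data_transformation", "").strip().lower()
        if test == "deseq2" and transformation != "raw counts":
            flagged.append(rec["study"])
    return flagged


# Invented example records, not real BugSigDB entries.
records = [
    {"study": "study-a", "statistical_test": "DESeq2",
     "data_transformation": "relative abundances"},
    {"study": "study-b", "statistical_test": "DESeq2",
     "data_transformation": "raw counts"},
    {"study": "study-c", "statistical_test": "DESeq2",
     "data_transformation": ""},
    {"study": "study-d", "statistical_test": "LEfSe",
     "data_transformation": "relative abundances"},
]

flagged = flag_inconsistent(records)
print(flagged)
```

A flagged entry is only a candidate for review: the paper should still be checked before the field is changed.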

SvetlanaUP commented 1 month ago

https://bugsigdb.org/Study_1082 reviewed.

Scholarpat commented 1 month ago

Thank you @SvetlanaUP. I've noted this for future curations.

Buraah commented 1 month ago

Well done @Scholarpat! Thank you @SvetlanaUP. I think the cleanup would be a great task to take up too. I'll consider doing it after the curation I'm currently working on.

Buraah commented 3 weeks ago

Hi @SvetlanaUP, good morning. While I wait for the author's response on Study 1083, I want to start the cleanup task we talked about here.

I also noticed that some studies curated earlier have no entry for data transformation, and I would like to fill those in as I go. For example, these two: Study 2 and Study 3.

Thank you.

SvetlanaUP commented 3 weeks ago

@Buraah please do, THANKS!