waldronlab / presentations

A repository for public presentations
Creative Commons Zero v1.0 Universal

-t clade_profiles MetaPhlAn2 for DESeq2 #2

Open mghanbari opened 7 years ago

mghanbari commented 7 years ago

Hi, in your presentation "Statistical analysis for metagenomic data" on June 6-7, 2016, you mentioned that:

Note: better to use metaphlan2 option: -t clade_profiles to generate normalized counts instead of relative abundance

I did so and now I have the results. But the resulting file shows the normalized values for the different markers in each clade, so how should I get one number per clade for downstream DESeq2 analysis? Should I take an average over the markers in each clade?

Thanks for the great presentations.

Regards, Mahdi

lwaldron commented 7 years ago

Dear Mahdi,

Thanks for pointing that out to me. I actually didn't realize that the -t clade_profiles option returns per-marker counts rather than per-clade counts. I am going to update my advice based on the curatedMetagenomicData pipeline and how we've done differential abundance analysis with it. You can see the exact options that curatedMetagenomicData uses on line 45 here, which do not involve the -t clade_profiles option. What I've done then is to divide these % abundances by 100 and multiply by read depth to get a normalized estimate of read counts. See the section "Estimating Absolute Raw Count Data" in the curatedMetagenomicData vignette.
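For illustration, the arithmetic described above (divide the % abundance by 100, multiply by read depth) could be sketched as follows. This is a minimal sketch in Python rather than the vignette's own code. The function name `estimate_counts`, the clade labels, the abundances, and the read depth are all hypothetical values chosen for the example. Rounding to whole numbers is an added assumption here, made because DESeq2 expects integer counts.

```python
def estimate_counts(rel_abundance_pct, read_depth):
    """Turn percent relative abundances for one sample into estimated read counts.

    rel_abundance_pct: dict mapping clade name -> relative abundance in percent (0-100)
    read_depth: total number of reads sequenced for the sample
    """
    # Divide by 100 to get a proportion, scale by read depth, then round
    # to an integer (assumption: downstream tool needs integer counts).
    return {
        clade: round(pct / 100 * read_depth)
        for clade, pct in rel_abundance_pct.items()
    }


# Hypothetical sample: three clades whose abundances sum to 100%.
sample_pct = {
    "s__Bacteroides_vulgatus": 25.0,
    "s__Escherichia_coli": 12.5,
    "s__Prevotella_copri": 62.5,
}
sample_read_depth = 1_000_000  # hypothetical number of reads

counts = estimate_counts(sample_pct, sample_read_depth)
for clade, n in counts.items():
    print(clade, n)
```

Applying the same function to every sample, each with its own read depth, would give a clade-by-sample table of estimated counts, one number per clade.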

@edoardopasolli and @nsegata, does this make sense to you?

mghanbari commented 7 years ago

Thank you for your comments. I'll go with your suggestion. I was wondering if you could also include a tutorial in a future presentation on how to control for more than one confounding factor. Also, given the increasing number of time-series analyses in microbiome studies, it would help to see how to analyze this kind of data with the DESeq2 package. There is an example in the DESeq2 vignette, but your explanation from the point of view of microbiome studies would be great.