statdivlab / corncob

Count Regression for Correlated Observations with the Beta-binomial

Trying to determine significance with differentialTest when one group has no counts of an ASV and the other group is dominated by it #174

Closed: AlexiPearson-Lund closed this issue 6 months ago

AlexiPearson-Lund commented 6 months ago

Hello,

I have a data set where I am trying to determine the differential abundance of ASVs between groups using differentialTest(). I am looking at the microbiome before and after treatment. Before treatment I have no reads of a particular ASV (ASV3), and after treatment this ASV is highly abundant (an average of 40% relative abundance in my after-treatment group). I've read that corncob has issues when one group doesn't have any reads. As a workaround, I tried artificially adding either 1 or 100 reads to each sample. When I added 1, I got 1400 significant ASVs; when I added 100, I got 96. I don't understand why the results differ so much, or whether this technique is a reasonable workaround. Is there a better one?

I am extremely new to this kind of analysis and greatly appreciate the help!
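A small sketch may help show why the size of the added count matters so much. It uses a hypothetical toy table (three ASVs, three samples per group, counts made up for illustration) and a plain log2 ratio of mean relative abundances. This is not corncob's beta-binomial model; it only illustrates the direction of the effect. With no added count, the ASV that is absent before treatment has no finite estimate at all. Adding 1 to every cell gives it a finite but very large effect, while adding 100 swamps rare ASVs and shrinks their estimated differences.

```python
import math

# Hypothetical toy counts: one row per sample, columns = [common ASV, rare ASV, ASV3-like].
before = [[900, 2, 0], [950, 0, 0], [880, 5, 0]]
after = [[540, 20, 400], [560, 15, 380], [500, 25, 420]]


def mean_relative_abundance(samples, pseudocount):
    """Add `pseudocount` to every cell, convert each sample to proportions,
    then average the proportions across samples for each taxon."""
    n_taxa = len(samples[0])
    props = []
    for counts in samples:
        shifted = [c + pseudocount for c in counts]
        total = sum(shifted)
        props.append([c / total for c in shifted])
    return [sum(p[j] for p in props) / len(props) for j in range(n_taxa)]


def log2_ratio(a, b):
    """log2(b / a), returning +/-inf when exactly one side is zero."""
    if a == 0 and b == 0:
        return float("nan")
    if a == 0:
        return math.inf
    if b == 0:
        return -math.inf
    return math.log2(b / a)


def log2_fold_change(group_a, group_b, pseudocount):
    """Per-taxon log2 ratio of mean relative abundance, group_b over group_a."""
    mean_a = mean_relative_abundance(group_a, pseudocount)
    mean_b = mean_relative_abundance(group_b, pseudocount)
    return [log2_ratio(a, b) for a, b in zip(mean_a, mean_b)]


# Compare no pseudocount, +1, and +100 on the same toy table.
for pc in (0, 1, 100):
    lfc = log2_fold_change(before, after, pc)
    print(pc, [round(x, 2) for x in lfc])
```

On this toy table the rare ASV's log2 ratio drops from about 2.6 with +1 to about 0.2 with +100, which is one plausible contributor to very different numbers of significant ASVs. It is not a diagnosis of the data in the question.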

adw96 commented 6 months ago

Not a problem, @AlexiPearson-Lund, and thanks for your great question.

I appreciate the difficulty: these taxa are clearly differentially abundant, and you'd like to quantify that with a p-value, but corncob won't give you a p-value in this case.

You have at least two options, one of which I recommend much more strongly:

  1. radEmu is a new differential abundance method, and it has many advantages over corncob (these are listed on its homepage). One of these advantages is that radEmu will give you a p-value for perfectly separating taxa, which corncob does not. It also addresses a number of other limitations of corncob. I'm personally switching over to radEmu and recommending it to all of my collaborators. (I mention it here because it also addresses your specific issue.)
  2. If you are committed to using corncob, you have the following options:
    • In your results description, identify these taxa as "perfectly separating" taxa. That is, list the differentially abundant taxa that are not perfectly separating along with their p-values, and list the differentially abundant taxa that are perfectly separating without p-values. I think many papers have done this, and it's an accepted convention in the field.
    • You could add the smallest observed nonzero count to every observation in your ASV table, then perform corncob's differential abundance analysis.
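Both corncob-based suggestions can be sketched in Python on a count table stored as one list of per-sample counts. The function names and toy counts are hypothetical. "Perfectly separating" is assumed here to mean a taxon that is zero in every sample of one group but observed in the other, and "smallest observed count" is read as the smallest nonzero count in the table, since adding zero would change nothing. This only prepares or flags the data; the model fit itself would still happen in corncob.

```python
# Hypothetical toy counts: one row per sample, columns = ASVs.
before = [[900, 2, 0], [950, 0, 0], [880, 5, 0]]
after = [[540, 20, 400], [560, 15, 380], [500, 25, 420]]


def perfectly_separating_taxa(group_a, group_b):
    """Indices of taxa that are zero in every sample of one group
    but nonzero in at least one sample of the other group (assumed definition)."""
    flagged = []
    for j in range(len(group_a[0])):
        absent_in_a = all(sample[j] == 0 for sample in group_a)
        absent_in_b = all(sample[j] == 0 for sample in group_b)
        if absent_in_a != absent_in_b:  # absent in exactly one group
            flagged.append(j)
    return flagged


def add_smallest_nonzero_count(table):
    """Add the smallest nonzero count found anywhere in `table` to every cell.
    Raises ValueError if the table contains no nonzero counts."""
    smallest = min(c for row in table for c in row if c > 0)
    return [[c + smallest for c in row] for row in table]


# First bullet: report these taxa as perfectly separating, without p-values.
print(perfectly_separating_taxa(before, after))

# Second bullet: shift the whole table, then split it back into the two groups.
shifted = add_smallest_nonzero_count(before + after)
shifted_before, shifted_after = shifted[:len(before)], shifted[len(before):]
print(shifted_before)
print(perfectly_separating_taxa(shifted_before, shifted_after))
```

After the shift no taxon is zero across a whole group, so the separation disappears, but the added count is still an arbitrary choice that can move the estimates.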

I personally recommend Option 1.

I hope this helps, but feel free to reopen this issue if not.