statdivlab / corncob

Count Regression for Correlated Observations with the Beta-binomial
102 stars 22 forks source link

summary of bbdml does not show coefficients of one of the groups of the categorical variable #82

Closed golamrabbani268 closed 4 years ago

golamrabbani268 commented 4 years ago

I ran the following code and got the following output. Structure.DNA Extracted.from contains 4 different values: Leaf, Rhizome, Root, and Sediment. However, the summary does not show the coefficient and other values for the Leaf group.

> ## Fitting a Model
> ps1_filtered <- ps1_new %>% 
+   tax_glom("Phylum")
> ps1_filtered
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 42 taxa and 237 samples ]
sample_data() Sample Data:       [ 237 samples by 5 sample variables ]
tax_table()   Taxonomy Table:    [ 42 taxa by 6 taxonomic ranks ]
refseq()      DNAStringSet:      [ 42 reference sequences ]
> tax_table(ps1_filtered)[1:5,]
Taxonomy Table:     [5 taxa by 6 taxonomic ranks]:
      Kingdom    Phylum             Class Order Family Genus
ASV10 "Bacteria" "Actinobacteriota" NA    NA    NA     NA   
ASV21 "Bacteria" "Cyanobacteria"    NA    NA    NA     NA   
ASV23 "Bacteria" "Myxococcota"      NA    NA    NA     NA   
ASV27 "Bacteria" "Campilobacterota" NA    NA    NA     NA   
ASV32 "Bacteria" "Proteobacteria"   NA    NA    NA     NA   
> ## Adding covariates
> corncob_structure <- bbdml(formula = ASV32 ~ Structure.DNA.Extracted.from,
+                            phi.formula = ~ Structure.DNA.Extracted.from,
+                            data = ps1_filtered)
> ## Parameter Interpretation
> summary(corncob_structure)

Call:
bbdml(formula = ASV32 ~ Structure.DNA.Extracted.from, phi.formula = ~Structure.DNA.Extracted.from, 
    data = ps1_filtered)

Coefficients associated with abundance:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                            2.0376     0.1893  10.766  < 2e-16 ***
Structure.DNA.Extracted.fromRhizome   -2.0130     0.2316  -8.693  6.9e-16 ***
Structure.DNA.Extracted.fromRoot      -1.0945     0.2162  -5.063  8.5e-07 ***
Structure.DNA.Extracted.fromSediment  -2.8071     0.2080 -13.497  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients associated with dispersion:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                           -1.2941     0.2261  -5.723 3.26e-08 ***
Structure.DNA.Extracted.fromRhizome    0.3390     0.2771   1.224  0.22235    
Structure.DNA.Extracted.fromRoot      -0.5739     0.2853  -2.012  0.04542 *  
Structure.DNA.Extracted.fromSediment  -0.9360     0.2863  -3.269  0.00125 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-likelihood: -1898.8

Also tried using another categorical variable but that also does not show the first group.

Is there something that I am forgetting to do and/or how can get the output for all groups? Thanks!

golamrabbani268 commented 4 years ago

After repeating the steps from another project, it seems that the problem is most probably with some particular setting of R on my computer. Still looking for a solution to this.

bryandmartin commented 4 years ago

Hi,

The interpretation of the categorical covariates is in terms of the expected difference between the group shown and the base group. Thus, the base group isn't shown because it doesn't it have it's own covariate.

This is standard practice in R and most categorical covariate regression software I am aware of. The default categorical covariates follow the structure of lm and glm in R. Does this make sense? Happy to elaborate if you aren't as familiar with these types of models.

Brya

golamrabbani268 commented 4 years ago

Hi, thanks a lot for your response. I understand what you said. How would I go about to change the base group against which the other groups are compared with? For example, how can I compare all of the other groups against Rhizome instead of Leaf? Thanks, Golam

bryandmartin commented 4 years ago

There are a few ways you could do this, depending on the structure of your data. I'd recommend factor re-ordering using forcats::fct_relevel()

golamrabbani268 commented 4 years ago

Thanks a lot for your help! I'll try that out. I'll close this issue then. Golam