ycl6 / 16S-rDNA-V3-V4

16S rDNA V3-V4 amplicon sequencing analysis using dada2, phyloseq, LEfSe, picrust2 and other tools. Demo: https://ycl6.github.io/16S-Demo/
GNU General Public License v3.0

Relative abundance #7

Closed: Motteran closed this issue 7 months ago

Motteran commented 7 months ago

Hello, I'm trying to reproduce the script at https://ycl6.github.io/16S-Demo/1_dada2_tutorial.html and I've run into a first problem (I expect there will be others): the relative abundance only comes out as 1 or 0. How can I solve this? Did I do one of the earlier steps wrong?

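```r
# Number of ASVs at each total-count value, then the running total
taxcumsum = tdt[, .N, by = TotalCounts]
setkey(taxcumsum, TotalCounts)
taxcumsum[, CumSum := cumsum(N)]

# Plot how many ASVs each minimum-count threshold would filter out
pCumSum = ggplot(taxcumsum, aes(TotalCounts, CumSum)) + geom_point() + theme_bw() +
  xlab("Filtering Threshold") + ylab("ASV Filtered")
gridExtra::grid.arrange(pCumSum, pCumSum + xlim(0, 500),
                        pCumSum + xlim(0, 100), pCumSum + xlim(0, 50), nrow = 2,
                        top = "ASVs that would be filtered vs. minimum taxa counts threshold")

# Melt the phyloseq object into long format and drop zero or missing counts
mdt = fast_melt(ps)
mdt = mdt[count > 0][!is.na(count)]
# Within each taxon, the fraction of its total count found in each sample
mdt[, RelativeAbundance := count / sum(count), by = taxaID]
mdt
```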

```
       Kingdom         Phylum               Class                 Order               Family
   1: Bacteria Proteobacteria Gammaproteobacteria       Methylococcales      Methylomonaceae
   2: Bacteria Proteobacteria Gammaproteobacteria Betaproteobacteriales              SC-I-84
   3: Bacteria Proteobacteria Deltaproteobacteria   Syntrophobacterales Syntrophobacteraceae
   4: Bacteria           <NA>                <NA>                  <NA>                 <NA>
   5: Bacteria Proteobacteria Gammaproteobacteria    Steroidobacterales  Steroidobacteraceae
  ---                                                                                       
1043: Bacteria Proteobacteria                <NA>                  <NA>                 <NA>
1044: Bacteria Actinobacteria      Actinobacteria                  <NA>                 <NA>
1045: Bacteria Proteobacteria Deltaproteobacteria                  <NA>                 <NA>
1046: Bacteria Proteobacteria Gammaproteobacteria    Steroidobacterales  Steroidobacteraceae
1047: Bacteria  Bacteroidetes         Bacteroidia         Bacteroidales   Prolixibacteraceae

                Genus  taxaID SampleID count RelativeAbundance
   1:      Crenothrix    OTU1      sa1   332                 1
   2:            <NA>   OTU10      sa1   147                 1
   3: Syntrophobacter   OTU10      sa1    49                 1
   4:            <NA> OTU1000      sa1     1                 1
   5:            <NA> OTU1001      sa1     1                 1
  ---                                                         
1043:            <NA>  OTU995      sa1     1                 1
1044:            <NA>  OTU996      sa1     1                 1
1045:            <NA>  OTU997      sa1     1                 1
1046:            <NA>  OTU998      sa1     1                 1
1047:            <NA>  OTU999      sa1     1                 1
```

Thank you very much in advance

ycl6 commented 7 months ago

Hi @Motteran

I edited your post with the proper syntax (see documentation here). For example:

```r
x <- sum(c(1, 2, 3, 4))
x
```

About your question: do you have only one sample, i.e. sa1, in your ps object? If so, the result is correct. Because the calculation is grouped by taxaID, it gives, for each taxon, the fraction of that taxon's total count contributed by each sample. Since all the counts come from the same sample, sa1, that fraction is always 1, i.e. 100%.
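To illustrate, here is a minimal sketch on made-up counts (the `toy` table and the `ShareOfTaxon` / `ShareOfSample` column names are hypothetical, not taken from your data or from the tutorial). It assumes only that data.table is installed, and compares grouping by `taxaID` with grouping by `SampleID`:

```r
library(data.table)

# Hypothetical toy counts: two taxa observed in two samples
toy <- data.table(
  taxaID   = c("OTU1", "OTU1", "OTU2", "OTU2"),
  SampleID = c("sa1", "sa2", "sa1", "sa2"),
  count    = c(30, 10, 20, 40)
)

# Grouped by taxon: share of each taxon's total count found in each sample
toy[, ShareOfTaxon := count / sum(count), by = taxaID]

# Grouped by sample: share of each sample's total count belonging to each taxon
toy[, ShareOfSample := count / sum(count), by = SampleID]
print(toy)

# Keep only one sample and recompute both shares
one <- toy[SampleID == "sa1", .(taxaID, SampleID, count)]
one[, ShareOfTaxon := count / sum(count), by = taxaID]
one[, ShareOfSample := count / sum(count), by = SampleID]
print(one)
```

With a single sample, `ShareOfTaxon` is 1 in every row, which matches what you are seeing, while `ShareOfSample` still varies between taxa. To confirm how many samples your object holds, `phyloseq::nsamples(ps)` and `phyloseq::sample_names(ps)` should tell you.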

Motteran commented 7 months ago

Thanks for the answer

ycl6 commented 7 months ago

No problem. You can close the issue if this answers your question.