vmikk / metagMisc

Miscellaneous functions for metagenomic analysis.
MIT License
46 stars 11 forks source link

phyloseq_mult_raref_avg: Perform rarefaction and average relative OTU abundance issue #13

Closed Salineraptor closed 4 years ago

Salineraptor commented 4 years ago

Hi all

I've an issue with the phyloseq_mult_raref_avg function; it works on this phyloseq object. phyloseq_summary(ps, cols = NULL, more_stats = FALSE,

works<-phyloseq_mult_raref_avg(ps,replace = T, SampSize = 10000, iter = 3) ..Multiple rarefaction |=====================================================================================| 100% ..Sample renaming ..Rarefied data merging ..Splitting by sample ..OTU abundance averaging within rarefaction iterations |=====================================================================================| 100% ..Re-create phyloseq object

But not this phyloseq object;

phyloseq_summary(p.b.a.m.s.lab, cols = NULL, more_stats = FALSE, long = FALSE)

Parameter Phys1

1 Number of samples 36.0000

2 Number of OTUs 7981.0000

3 Total number of reads 4781965.0000

4 Average number of reads per OTU 599.1687

5 Average number of reads per sample 132832.3611

fails<-phyloseq_mult_raref_avg(p.b.a.m.s.lab,,replace = T, SampSize = 10000, iter = 3) ..Multiple rarefaction |=====================================================================================| 100% ..Sample renaming ..Rarefied data merging ..Splitting by sample Error in validObject(.Object) : invalid class “otu_table” object: OTU abundance data must have non-zero dimensions.

validotu_table(otu_table(p.b.a.m.s.lab)) [1] TRUE sum(is.na(otu_table(p.b.a.m.s.lab))) [1] 0

I've psmelted it etc and all looks good no irregularities. Makes zero sense. phyloseq_mult_raref works on both...

Regards Cameron

vmikk commented 4 years ago

Hello Cameron!

It is difficult to say why it's not working without seeing the data. I think that it could be because of the small number of reads in some samples (probably after rarefaction). Could you post the results of:

sample_sums(p.b.a.m.s.lab)
taxa_are_rows(p.b.a.m.s.lab)

With best regards, Vladimir

Salineraptor commented 4 years ago

Hi Vladimir

Thank you so much for getting back to me.

I do lose two samples after rarefaction functions like https://rdrr.io/github/vmikk/metagMisc/man/phyloseq_mult_raref.html are applied yes. B30 & B31. I lose the same samples if i were to rarefy on the unmerged replicates. So it in theory shouldn't make a difference.

> phyloseq.runs

$1

phyloseq-class experiment-level object

otu_table() OTU Table: [ 5460 taxa and 34 samples ]

sample_data() Sample Data: [ 34 samples by 20 sample variables ]

tax_table() Taxonomy Table: [ 5460 taxa by 7 taxonomic ranks ]

phy_tree() Phylogenetic Tree: [ 5460 tips and 5459 internal nodes ]

#

$2

phyloseq-class experiment-level object

otu_table() OTU Table: [ 5348 taxa and 34 samples ]

sample_data() Sample Data: [ 34 samples by 20 sample variables ]

tax_table() Taxonomy Table: [ 5348 taxa by 7 taxonomic ranks ]

phy_tree() Phylogenetic Tree: [ 5348 tips and 5347 internal nodes ]

...

However i still want an average..

sample_sums(p.b.a.m.s.lab)

                      B1                           B2
         B3
                  101185                        97908
      37062
                      B4                           B5
         B6
                   54592                        52062
     131014
                      B7                           B8
         B9
                  172053                       171042
     277328
                     B10                          B11
        B12
                  478035                       253444
     257516
                     B13                          B14
        B15
                  141324                       115771
      22225
                     B16                          B17
        B18
                  277429                       161581
      97891
                     B19                          B20
        B21
                   62709                       220285
      48663
                     B22                          B23
        B24
                   48705                        37352
     144526
                     B25                          B26
        B27
                  201021                       174101
      38829
                     B28                          B29
        B30
                  245844                        16787
       3236
                     B31                          B32
        B33
                    1802                        20673
     229191
                     B34                          B35
        B36
                  205454                        66717
     116608

taxa_are_rows(p.b.a.m.s.lab)

So yes this was the problem: the taxa weren't rows so i transposed the OTU table and it works. However, I now have a new discrepancy. Whats the difference here?

p.rf<-phyloseq_mult_raref_avg(p.T.O, SampSize = 10000,iter = 100, replace = T) ..Multiple rarefaction |=================================================================| 100% ..Sample renaming ..Rarefied data merging ..Splitting by sample ..OTU abundance averaging within rarefaction iterations |=================================================================| 100% ..Re-create phyloseq object

p.rf phyloseq-class experiment-level object otu_table() OTU Table: [ 7943 taxa and 34 samples ] sample_data() Sample Data: [ 34 samples by 20 sample variables ] tax_table() Taxonomy Table: [ 7943 taxa by 7 taxonomic ranks ]

AND THIS

p.rf.1<-rarefy_even_depth(p.T.O, sample.size = 10000,

  • rngseed = T, replace = TRUE, trimOTUs = TRUE, verbose = TRUE) set.seed(TRUE) was used to initialize repeatable random subsampling. Please record this for your records so others can reproduce. Try set.seed(TRUE); .Random.seed for the full vector ... 2 samples removedbecause they contained fewer reads than sample.size. Up to first five removed samples are:

B30 B31 ... 2521OTUs were removed because they are no longer present in any sample after random subsampling

...

p.rf.1 phyloseq-class experiment-level object otu_table() OTU Table: [ 5460 taxa and 34 samples ] sample_data() Sample Data: [ 34 samples by 20 sample variables ] tax_table() Taxonomy Table: [ 5460 taxa by 7 taxonomic ranks ]

Why are the outputs so different ? They are doing the same thing no ?

Regards

Cameron

On Wed, 11 Mar 2020 at 17:47, Vladimir Mikryukov notifications@github.com wrote:

Hello Cameron!

It is difficult to say why it's not working without seeing the data. I think that it could be because of the small number of reads in some samples (probably after rarefaction). Could you post the results of:

sample_sums(p.b.a.m.s.lab) taxa_are_rows(p.b.a.m.s.lab)

With best regards, Vladimir

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vmikk/metagMisc/issues/13?email_source=notifications&email_token=AOZHHWAX5I4HFMJ4GVUYDSDRG5MZHA5CNFSM4LFCTKR2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEOO3CIA#issuecomment-597537056, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOZHHWEP7LA3QIANFRNPN63RG5MZHANCNFSM4LFCTKRQ .

vmikk commented 4 years ago

Hello Cameron,

By default, phyloseq_mult_raref does not remove OTUs with zero abundance (trimOTUs = FALSE). So you may remove these OTUs after the averaging:

prune_taxa(taxa_sums(p.rf) > 0, p.rf)

Please let me know if it works for you. With best regards, Vladimir

Salineraptor commented 4 years ago

Hi Vladimir

That was my theory. Thanks for the quick response. But i still get this;

p.rf<-phyloseq_mult_raref_avg(p.T.O, SampSize = 10000, MinSizeTreshold = 10000, iter = 100, replace = T)

phyloseq-class experiment-level object

otu_table() OTU Table: [ 7943 taxa and 34 samples ]

sample_data() Sample Data: [ 34 samples by 20 sample variables ]

tax_table() Taxonomy Table: [ 7943 taxa by 7 taxonomic ranks ]

p.rf.correct<-prune_taxa(taxa_sums(p.rf) > 0, p.rf)

phyloseq-class experiment-level object

otu_table() OTU Table: [ 7943 taxa and 34 samples ]

sample_data() Sample Data: [ 34 samples by 20 sample variables ]

tax_table() Taxonomy Table: [ 7943 taxa by 7 taxonomic ranks ]

p.rf.1<-rarefy_even_depth(p.T.O, sample.size = 10000, rngseed = T, replace = TRUE, trimOTUs = T, verbose = TRUE)

phyloseq-class experiment-level object

otu_table() OTU Table: [ 5460 taxa and 34 samples ]

sample_data() Sample Data: [ 34 samples by 20 sample variables ]

tax_table() Taxonomy Table: [ 5460 taxa by 7 taxonomic ranks ]

p.rf.1.correct<-prune_taxa(taxa_sums(p.rf.1) > 0, p.rf.1)

phyloseq-class experiment-level object

otu_table() OTU Table: [ 5460 taxa and 34 samples ]

sample_data() Sample Data: [ 34 samples by 20 sample variables ]

tax_table() Taxonomy Table: [ 5460 taxa by 7 taxonomic ranks ]

NB > p.T.O phyloseq-class experiment-level object otu_table() OTU Table: [ 7981 taxa and 36 samples ] sample_data() Sample Data: [ 36 samples by 20 sample variables ] tax_table() Taxonomy Table: [ 7981 taxa by 7 taxonomic ranks ]

Regards Cameron

On Thu, 12 Mar 2020 at 12:29, Vladimir Mikryukov notifications@github.com wrote:

Hello Cameron,

By default, phyloseq_mult_raref does not remove OTUs with zero abundance (trimOTUs = FALSE). So you may remove these OTUs after the averaging:

prune_taxa(taxa_sums(p.rf) > 0, p.rf)

Please let me know if it works for you. With best regards, Vladimir

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vmikk/metagMisc/issues/13#issuecomment-598000477, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOZHHWCPTEX2KDGJZIRP7M3RHBQKHANCNFSM4LFCTKRQ .

vmikk commented 4 years ago

Maybe you can me send me your phyloseq object and I'll take a look why it doesn't work as expected? Just remove the metadata and anonymize or shuffle the labels.

Salineraptor commented 4 years ago

Should be there. Help.zip

vmikk commented 4 years ago

This discrepancy in the number of observed OTUs is due to the large number of OTUs with very small relative abundance (<= 0.054%). So when you rarefy data multiple times, there is a small probability that rare OTUs will be present in some iterations, but not in the others. After the averaging, abundance of these OTUs will be very small (not zero!), so they will remain in the OTU table.

We can find these OTUs:

# Remove taxonomy table to speed up psmelt
p.rf@tax_table <- NULL
p.rf.1@tax_table <- NULL

# Convert p.rf.1 to relative abundances
p.rf.1 <- transform_sample_counts(p.rf.1, function(x) x / sum(x) )

multr <- psmelt(p.rf)
singr <- psmelt(p.rf.1)

# Compare OTU abundances in p.rf & p.rf.1
compare <- multr
compare$Samp_OTU <- with(compare, interaction(Sample, OTU))
singr$Samp_OTU <- with(singr, interaction(Sample, OTU))
compare$Abundance_R1 <- singr[match(x = compare$Samp_OTU, table = singr$Samp_OTU), "Abundance"]
compare$Abundance_R1[ is.na(compare$Abundance_R1) ] <- 0
compare <- compare[-which(compare$Abundance == 0 & compare$Abundance_R1 == 0), ]

ggplot(data = compare, aes(x = Abundance_R1, y = Abundance)) + geom_point() + 
  labs(x = "Single rarefaction", y = "Averaged across multiple rarefactions")

# Extract OTUs that are missing in single-rarefied data, but present in multiple rarefactions
diffs <- compare[ compare$Abundance_R1 == 0, ]
length(unique(diffs$OTU))

summary(diffs$Abundance)
#      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
# 0.00000100 0.00001800 0.00003400 0.00005179 0.00006525 0.00054000

# Here is a long tail of rare OTUs which were absent in single-rarefied data
ggplot(data = compare, aes(x = Abundance_R1, y = Abundance)) + geom_point() + 
  labs(x = "Single rarefaction", y = "Averaged across multiple rarefactions") +
  ylim(c(0, max(diffs$Abundance)))

Relative abundances of single rarefaction iteration vs averaged across multiple rarefaction iterations: Raref_compare

Tail with rare OTUs which were absent in single-rarefied data: Raref_tail