Salineraptor closed this issue 4 years ago
Hello Cameron!
It is difficult to say why it's not working without seeing the data. I think that it could be because of the small number of reads in some samples (probably after rarefaction). Could you post the results of:
sample_sums(p.b.a.m.s.lab)
taxa_are_rows(p.b.a.m.s.lab)
With best regards, Vladimir
Hi Vladimir
Thank you so much for getting back to me.
Yes, I do lose two samples (B30 and B31) when rarefaction functions such as phyloseq_mult_raref (https://rdrr.io/github/vmikk/metagMisc/man/phyloseq_mult_raref.html) are applied. I lose the same samples if I rarefy the unmerged replicates, so in theory it shouldn't make a difference.
However, I still want an average.
sample_sums(p.b.a.m.s.lab)
    B1     B2     B3     B4     B5     B6
101185  97908  37062  54592  52062 131014
    B7     B8     B9    B10    B11    B12
172053 171042 277328 478035 253444 257516
   B13    B14    B15    B16    B17    B18
141324 115771  22225 277429 161581  97891
   B19    B20    B21    B22    B23    B24
 62709 220285  48663  48705  37352 144526
   B25    B26    B27    B28    B29    B30
201021 174101  38829 245844  16787   3236
   B31    B32    B33    B34    B35    B36
  1802  20673 229191 205454  66717 116608
taxa_are_rows(p.b.a.m.s.lab)
So yes, this was the problem: the taxa weren't rows, so I transposed the OTU table and now it works. However, I now have a new discrepancy. What's the difference between these two?
p.rf <- phyloseq_mult_raref_avg(p.T.O, SampSize = 10000, iter = 100, replace = T)
..Multiple rarefaction
|=================================================================| 100%
..Sample renaming
..Rarefied data merging
..Splitting by sample
..OTU abundance averaging within rarefaction iterations
|=================================================================| 100%
..Re-create phyloseq object
p.rf
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 7943 taxa and 34 samples ]
sample_data() Sample Data:       [ 34 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 7943 taxa by 7 taxonomic ranks ]
And this:
p.rf.1 <- rarefy_even_depth(p.T.O, sample.size = 10000,
                            rngseed = T, replace = TRUE, trimOTUs = TRUE, verbose = TRUE)
`set.seed(TRUE)` was used to initialize repeatable random subsampling.
Please record this for your records so others can reproduce.
Try `set.seed(TRUE); .Random.seed` for the full vector
...
2 samples removed because they contained fewer reads than `sample.size`.
Up to first five removed samples are:
B30 B31
...
2521 OTUs were removed because they are no longer present in any sample after random subsampling
...
p.rf.1
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 5460 taxa and 34 samples ]
sample_data() Sample Data:       [ 34 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 5460 taxa by 7 taxonomic ranks ]
Why are the outputs so different? Aren't they doing the same thing?
Regards
Cameron
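For readers who hit the same "non-zero dimensions" error, the orientation fix mentioned at the start of this reply might look like the sketch below. The exact command used in the thread is not shown, so this is only one possible way to do it; the toy matrix `m` and the object name `otu` are hypothetical, and the sketch assumes the phyloseq package is installed.

```r
library(phyloseq)

# Hypothetical toy table with samples in rows and taxa in columns
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("S1", "S2"), c("OTU1", "OTU2", "OTU3")))
otu <- otu_table(m, taxa_are_rows = FALSE)

# Transpose only when needed, so that taxa end up in rows
if (!taxa_are_rows(otu)) {
  otu <- t(otu)
}

taxa_are_rows(otu)  # orientation flag after the transpose
dim(otu)            # taxa x samples
```

The same check can be run on a full phyloseq object before calling phyloseq_mult_raref_avg, since `taxa_are_rows()` also accepts phyloseq objects.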
Hello Cameron,
By default, phyloseq_mult_raref does not remove OTUs with zero abundance (trimOTUs = FALSE).
So you may remove these OTUs after the averaging:
prune_taxa(taxa_sums(p.rf) > 0, p.rf)
Please let me know if it works for you. With best regards, Vladimir
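What the `taxa_sums(...) > 0` filter keeps can be illustrated on a plain matrix. This is a minimal base-R sketch with made-up numbers (the table `otu` and its values are hypothetical, not taken from the data in this issue): only taxa whose total is exactly zero are dropped, so a taxon with a tiny non-zero total stays.

```r
# Hypothetical averaged abundance table: taxa in rows, samples in columns
otu <- matrix(c(0.60,  0.40,
                0.40,  0.60,
                0.00,  0.00,    # absent everywhere: will be dropped
                1e-06, 0.00),   # tiny but non-zero: will be kept
              nrow = 4, byrow = TRUE,
              dimnames = list(c("OTU1", "OTU2", "OTU3", "OTU4"),
                              c("S1", "S2")))

# Base-R analogue of prune_taxa(taxa_sums(x) > 0, x)
keep <- rowSums(otu) > 0
otu_trimmed <- otu[keep, , drop = FALSE]

rownames(otu_trimmed)
```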
Hi Vladimir
That was my theory. Thanks for the quick response. But I still get this:
p.rf<-phyloseq_mult_raref_avg(p.T.O, SampSize = 10000, MinSizeTreshold = 10000, iter = 100, replace = T)
p.rf.correct<-prune_taxa(taxa_sums(p.rf) > 0, p.rf)
p.rf.1<-rarefy_even_depth(p.T.O, sample.size = 10000, rngseed = T, replace = TRUE, trimOTUs = T, verbose = TRUE)
p.rf.1.correct<-prune_taxa(taxa_sums(p.rf.1) > 0, p.rf.1)
NB:
> p.T.O
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 7981 taxa and 36 samples ]
sample_data() Sample Data:       [ 36 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 7981 taxa by 7 taxonomic ranks ]
Regards Cameron
Maybe you can send me your phyloseq object and I'll take a look at why it doesn't work as expected? Just remove the metadata and anonymize or shuffle the labels.
Should be there. Help.zip
This discrepancy in the number of observed OTUs is due to the large number of OTUs with very small relative abundance (<= 0.054%). When you rarefy the data multiple times, there is a small probability that a rare OTU will be present in some iterations but not in others. After averaging, the abundance of these OTUs will be very small (but not zero!), so they will remain in the OTU table even after zero-sum OTUs are pruned.
We can find these OTUs:
# Load the packages used below
library(phyloseq)
library(ggplot2)
# Remove taxonomy table to speed up psmelt
p.rf@tax_table <- NULL
p.rf.1@tax_table <- NULL
# Convert p.rf.1 to relative abundances
p.rf.1 <- transform_sample_counts(p.rf.1, function(x) x / sum(x) )
multr <- psmelt(p.rf)
singr <- psmelt(p.rf.1)
# Compare OTU abundances in p.rf & p.rf.1
compare <- multr
compare$Samp_OTU <- with(compare, interaction(Sample, OTU))
singr$Samp_OTU <- with(singr, interaction(Sample, OTU))
compare$Abundance_R1 <- singr[match(x = compare$Samp_OTU, table = singr$Samp_OTU), "Abundance"]
compare$Abundance_R1[ is.na(compare$Abundance_R1) ] <- 0
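# Drop Sample-OTU pairs that are zero in both datasets
# (logical indexing is safe even if no rows match, unlike -which(...))
compare <- compare[!(compare$Abundance == 0 & compare$Abundance_R1 == 0), ]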
ggplot(data = compare, aes(x = Abundance_R1, y = Abundance)) + geom_point() +
labs(x = "Single rarefaction", y = "Averaged across multiple rarefactions")
# Extract OTUs that are missing in single-rarefied data, but present in multiple rarefactions
diffs <- compare[ compare$Abundance_R1 == 0, ]
length(unique(diffs$OTU))
summary(diffs$Abundance)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.00000100 0.00001800 0.00003400 0.00005179 0.00006525 0.00054000
# Here is a long tail of rare OTUs which were absent in single-rarefied data
ggplot(data = compare, aes(x = Abundance_R1, y = Abundance)) + geom_point() +
labs(x = "Single rarefaction", y = "Averaged across multiple rarefactions") +
ylim(c(0, max(diffs$Abundance)))
Relative abundances of single rarefaction iteration vs averaged across multiple rarefaction iterations:
Tail with rare OTUs which were absent in single-rarefied data:
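The effect described above can be reproduced without phyloseq. The following is a minimal base-R sketch on a made-up community (the counts, seed, and variable names are hypothetical, not the data from this issue). It relies on the fact that drawing reads with replacement is equivalent to multinomial sampling with probabilities proportional to the original counts.

```r
set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# Hypothetical community: 5 abundant OTUs and 200 very rare ones
counts <- c(rep(20000, 5), rep(2, 200))
names(counts) <- paste0("OTU", seq_along(counts))

depth <- 10000  # rarefaction depth, as in the calls above
iter  <- 100    # number of rarefaction iterations

# One rarefaction with replacement (a single multinomial draw)
single <- rmultinom(1, size = depth, prob = counts)[, 1]

# Many rarefactions: an OTU-by-iteration matrix of counts
multi <- rmultinom(iter, size = depth, prob = counts)

# Mean relative abundance of each OTU across iterations
avg <- rowMeans(multi) / depth

sum(single > 0)  # OTUs detected in the single rarefaction
sum(avg > 0)     # OTUs with non-zero averaged abundance
```

In this toy setup a rare OTU only needs to appear in one of the 100 iterations to get a non-zero average, so more OTUs survive the `> 0` filter after averaging than after a single rarefaction.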
Hi all
I have an issue with the phyloseq_mult_raref_avg function. It works on this phyloseq object:
phyloseq_summary(ps, cols = NULL, more_stats = FALSE,
Parameter Phys1
1 Number of samples 108.0000
2 Number of OTUs 7981.0000
3 Total number of reads 4781965.0000
4 Average number of reads per OTU 599.1687
5 Average number of reads per sample 44277.4537
works <- phyloseq_mult_raref_avg(ps, replace = T, SampSize = 10000, iter = 3)
..Multiple rarefaction
|=====================================================================================| 100%
..Sample renaming
..Rarefied data merging
..Splitting by sample
..OTU abundance averaging within rarefaction iterations
|=====================================================================================| 100%
..Re-create phyloseq object
But not on this phyloseq object:
phyloseq_summary(p.b.a.m.s.lab, cols = NULL, more_stats = FALSE, long = FALSE)
Parameter Phys1
1 Number of samples 36.0000
2 Number of OTUs 7981.0000
3 Total number of reads 4781965.0000
4 Average number of reads per OTU 599.1687
5 Average number of reads per sample 132832.3611
fails <- phyloseq_mult_raref_avg(p.b.a.m.s.lab, replace = T, SampSize = 10000, iter = 3)
..Multiple rarefaction
|=====================================================================================| 100%
..Sample renaming
..Rarefied data merging
..Splitting by sample
Error in validObject(.Object) :
  invalid class “otu_table” object:
  OTU abundance data must have non-zero dimensions.
validotu_table(otu_table(p.b.a.m.s.lab))
[1] TRUE
sum(is.na(otu_table(p.b.a.m.s.lab)))
[1] 0
I've psmelted it, etc., and everything looks fine, with no irregularities. It makes zero sense. phyloseq_mult_raref works on both...
Regards Cameron