phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
im encountering a similar issue to #1681. After completing the Dada2 pipeline according to the tutorial, i created a phyloseq object, but in doing so i loose reads in almost all my samples. Contrary to issue #1681 none of my samples go to 0 reads but some loose over 50%. I also tried @Rosiekstein solution to that problem, but it didnt work for me.
Here is an excerpt from my track_reads file with added entry for when i create my phyloseq object:
Hello,
im encountering a similar issue to #1681. After completing the Dada2 pipeline according to the tutorial, i created a phyloseq object, but in doing so i loose reads in almost all my samples. Contrary to issue #1681 none of my samples go to 0 reads but some loose over 50%. I also tried @Rosiekstein solution to that problem, but it didnt work for me.
Here is an excerpt from my track_reads file with added entry for when i create my phyloseq object:
sample | input | filtered | dadaF | dadaR | merged | nonchim | finalPercReadsKept % | reads physeq obj ArcB1 | 82650 | 47038 | 45576 | 46265 | 43855 | 42770 | 51.7 | 42770 AuB1 | 90559 | 56066 | 54249 | 55233 | 52257 | 50278 | 55.5 | 32479 AuB2 | 100718 | 62615 | 59976 | 60933 | 57438 | 55009 | 54.6 | 53549 AuB3 | 137090 | 81540 | 80356 | 80983 | 79213 | 76150 | 55.5 | 27221
There is a large iscrepancy between the seqtab. nochim reads and the phyloseq object.
Here is the code i use to create my phyloseq object:
Read sample Metadata and sample names
sample_metadata <- read.table("dna_metadata.csv", header=TRUE, sep=",", stringsAsFactors=FALSE) rownames(sample_metadata) <- sample_metadata$sample_id head(sample_metadata)
Read asv and taxa files
asv_counts_table <- readRDS("seqtab_nochim.rds") asv_seqs <- colnames(asv_counts_table) asv_headers <- vector(dim(asv_counts_table)[2], mode="character") for (i in 1:dim(asv_counts_table)[2]) { asvheaders[i] <- paste("ASV", i, sep="") } colnames(asv_counts_table) <- asv_headers head(asv_counts_table) class(asv_counts_table) sample_id <- sample_metadata$sample_id rownames(asv_counts_table) <- sample_id
load taxa file
taxa_withspecies <- read.table("taxa.csv", header=TRUE, row.names=1, sep=",", stringsAsFactors=FALSE) head(taxa_with_species) class(taxa_with_species) taxa_with_species <- as.matrix(taxa_with_species)
Create phyloseq object
physeq_ps137 <- phyloseq(otu_table(asv_counts_table, taxa_are_rows=FALSE), tax_table(taxa_with_species), sample_data(sample_metadata))
physeq_ps137 readcount(physeq_ps137)
And this is the code i used to create the seqtab.nochim table in the Dada2 pipeline:
seqtab <- makeSequenceTable(mergers)
output_directory="/Analysis/Pipeline"
saveRDS(mergers, "/Analysis/RData/mergers.rds")
save.image(file.path(output_directory, "mergers.RData"))
Inspect the merger data.frame from the first sample
head(mergers)
class(mergers)
dim(seqtab)
Inspect distribution of sequence lengths
table(nchar(getSequences(seqtab)))
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE) # minFoldParentOverAbundance = 8, multithread= parallel::detectCores(),
table(nchar(getSequences(seqtab.nochim)))
dim(seqtab.nochim)
sum(seqtab.nochim)/sum(seqtab)
saveRDS(seqtab, "/Analysis/RData/seqtab.rds")
saveRDS(seqtab.nochim, "/Analysis/RData/seqtab_nochim.rds")
Any idea what could cause the reads to disappear? Thanks in advance!