joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
567 stars 187 forks source link

Missing reads creating phyloseq object after dada2 pipeline #1746

Open GingerMATE opened 2 months ago

GingerMATE commented 2 months ago

Hello,

im encountering a similar issue to #1681. After completing the Dada2 pipeline according to the tutorial, i created a phyloseq object, but in doing so i loose reads in almost all my samples. Contrary to issue #1681 none of my samples go to 0 reads but some loose over 50%. I also tried @Rosiekstein solution to that problem, but it didnt work for me.

Here is an excerpt from my track_reads file with added entry for when i create my phyloseq object:

sample | input | filtered | dadaF | dadaR | merged | nonchim | finalPercReadsKept % | reads physeq obj ArcB1 | 82650 | 47038 | 45576 | 46265 | 43855 | 42770 | 51.7 | 42770 AuB1 | 90559 | 56066 | 54249 | 55233 | 52257 | 50278 | 55.5 | 32479 AuB2 | 100718 | 62615 | 59976 | 60933 | 57438 | 55009 | 54.6 | 53549 AuB3 | 137090 | 81540 | 80356 | 80983 | 79213 | 76150 | 55.5 | 27221

There is a large iscrepancy between the seqtab. nochim reads and the phyloseq object.

Here is the code i use to create my phyloseq object:

Read sample Metadata and sample names

sample_metadata <- read.table("dna_metadata.csv", header=TRUE, sep=",", stringsAsFactors=FALSE) rownames(sample_metadata) <- sample_metadata$sample_id head(sample_metadata)

Read asv and taxa files

asv_counts_table <- readRDS("seqtab_nochim.rds") asv_seqs <- colnames(asv_counts_table) asv_headers <- vector(dim(asv_counts_table)[2], mode="character") for (i in 1:dim(asv_counts_table)[2]) { asvheaders[i] <- paste("ASV", i, sep="") } colnames(asv_counts_table) <- asv_headers head(asv_counts_table) class(asv_counts_table) sample_id <- sample_metadata$sample_id rownames(asv_counts_table) <- sample_id

load taxa file

taxa_withspecies <- read.table("taxa.csv", header=TRUE, row.names=1, sep=",", stringsAsFactors=FALSE) head(taxa_with_species) class(taxa_with_species) taxa_with_species <- as.matrix(taxa_with_species)

Create phyloseq object

physeq_ps137 <- phyloseq(otu_table(asv_counts_table, taxa_are_rows=FALSE), tax_table(taxa_with_species), sample_data(sample_metadata))

physeq_ps137 readcount(physeq_ps137)

And this is the code i used to create the seqtab.nochim table in the Dada2 pipeline:

seqtab <- makeSequenceTable(mergers)

output_directory="/Analysis/Pipeline"

saveRDS(mergers, "/Analysis/RData/mergers.rds")

save.image(file.path(output_directory, "mergers.RData"))

Inspect the merger data.frame from the first sample

head(mergers)

class(mergers)

dim(seqtab)

Inspect distribution of sequence lengths

table(nchar(getSequences(seqtab)))

seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE) # minFoldParentOverAbundance = 8, multithread= parallel::detectCores(),

table(nchar(getSequences(seqtab.nochim)))

dim(seqtab.nochim)

sum(seqtab.nochim)/sum(seqtab)

saveRDS(seqtab, "/Analysis/RData/seqtab.rds")

saveRDS(seqtab.nochim, "/Analysis/RData/seqtab_nochim.rds")

Any idea what could cause the reads to disappear? Thanks in advance!

GingerMATE commented 2 months ago

I also dont fully understand what exactly the readcount function does. Maybe thats my mistake?