joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
567 stars 187 forks source link

missing reads after phyloseq() step #1681

Open Rosiekstein opened 1 year ago

Rosiekstein commented 1 year ago

Hello, I'm using dada2 and phyloseq on a supercomputing cluster, with the following Sys.info:

                          sysname                               release 
                          "Linux"       "4.18.0-425.3.1.el8.nci.x86_64" 
                          version                              nodename 

"#1 SMP Mon Jan 9 11:39:52 AEDT 2023" "gadi-cpu-bdw-0090.gadi.nci.org.au" machine login "x86_64" "unknown" user effective_user "rs2207" "rs2207"

and r info:

R version 4.2.2 (2022-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 22.04.1 LTS

I am trying to create a phyloseq object, but when I do I lose a ton of reads and have several samples that go to 0 reads when they originally had data. I found another post about this issue from 2019 (https://github.com/joey711/phyloseq/issues/1208) but none of the fixes there worked (removing "-" from sample names, etc.). This is defenitely happening in the phyloseq() step, which I tested as follows:

# first, make all the tables outside of phyloseq, see if this helps. this does not help.

seqs <- otu_table(seqtab.nochim, taxa_are_rows = FALSE)
sams <- sample_data(samdf)
tax <- tax_table(taxa)

ps <- phyloseq(seqs,sams,tax)

# make sure sample_names() are all the same

all(sample_names(seqs) == sample_names(sams)) # they're the same
all(sample_names(ps) == sample_names(sams)) # still the same, so R is reading them correctly

# still the same problem! check where the issues are.

all(sample_sums(ps) == sample_sums(seqs)) # not the same
all(sample_sums(seqs) == rowSums(seqtab.nochim)) # are the same, so the issue is at the phyloseq() step

when I look at sample_sums() I get the following before and after the phyloseq() step (this is a subset because I have a lot of samples):

sample_sums(seqs) TerrigalOS.1.1 TerrigalOS.1.2 TerrigalOS.1.3 TerrigalOS.10.1 TerrigalOS.10.2 211602 259091 206717 259981 192357 TerrigalOS.10.3 TerrigalOS.11.1 TerrigalOS.11.2 TerrigalOS.11.3 TerrigalOS.12.1 186554 127 69140 202817 227222

sample_sums(ps) TerrigalOS.1.1 TerrigalOS.1.2 TerrigalOS.1.3 TerrigalOS.10.1 TerrigalOS.10.2 24118 0 69371 2013 30935 TerrigalOS.10.3 TerrigalOS.11.1 TerrigalOS.11.2 TerrigalOS.11.3 TerrigalOS.12.1 26834 2 1183 29767 25790

Does anyone have any suggestions on how to deal with this issue? Thank you.

Rosiekstein commented 1 year ago

I worked it out! If anyone else has this problem, it seems to be an issue with adding the tax_table and otu_table at the same time. So I worked around it by:

# try making a phyloseq object with just otu_table and sam_data

ps.test2 <- phyloseq(otu_table(seqs.test, taxa_are_rows = FALSE), sam_data(sams.test))
all(sample_sums(ps.test2) == rowSums(seqtab.nochim)) # OK, so this works

# add in tax_table to ps.test2
ps.test2@tax_table <- tax_table(tax.test)

all(sample_sums(ps.test2) == rowSums(seqtab.nochim)) # still TRUE, so this worked!