joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Error when calculating prevalence of taxa (arguments imply differing number of rows) #1579

Open emankhalaf opened 2 years ago

emankhalaf commented 2 years ago

Hi,

I generated the phyloseq object from DADA2 pipeline, and when calculating the prevalence using this code:

`prevelancedf = apply(X = otu_table(ps), MARGIN = 1, FUN = function(x){sum(x > 0)})`

`# Add taxonomy and total read counts to this data.frame`

`prevelancedf = data.frame(Prevalence = prevelancedf, TotalAbundance = taxa_sums(ps), tax_table(ps))`

`prevelancedf[1:10,]`

I got this error:

`Error in data.frame(Prevalence = prevelancedf, TotalAbundance = taxa_sums(pollen_working), : arguments imply differing number of rows: 54, 565`

However, the same code works for a phyloseq object built from the QIIME pipeline! Any suggestions, please? Thanks!

mweberr commented 2 years ago

Hi, make sure the phyloseq object `pollen_working` has the same number of taxa as `prevelancedf` has entries.
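A rough sketch of that check in base R, for illustration only. The two vectors are made-up stand-ins for the columns passed to `data.frame()`, not data from this issue. On a real object, `ntaxa(ps)`, `nsamples(ps)` and `length(taxa_sums(ps))` would give the numbers to compare.

```r
# Hypothetical stand-ins for the two columns passed to data.frame().
prevalence      <- c(S1 = 3, S2 = 2, S3 = 2)            # named by sample
total_abundance <- c(ASV1 = 3, ASV2 = 2, ASV3 = 6,
                     ASV4 = 2, ASV5 = 5)                # named by taxon

# data.frame() needs columns of equal length; matching names is a further
# hint that both vectors describe the same set of taxa.
same_length <- length(prevalence) == length(total_abundance)
same_names  <- identical(names(prevalence), names(total_abundance))

if (!same_length || !same_names) {
  message("Mismatch: ", length(prevalence), " vs ",
          length(total_abundance), " entries")
}
```

If the lengths differ, looking at `names()` of each vector usually shows which one is indexed by something other than taxa.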

emankhalaf commented 2 years ago

@MichWeb75 Hi, OK, in the code above I subset the phyloseq object. So, I re-ran the code using the complete object as follows:

`## Compute prevalence of each feature (number of samples each taxon occurs in), store as data.frame`

`prevdf = apply(X = otu_table(ps), MARGIN = 1, FUN = function(x){sum(x > 0)})`

`head(prevdf)`

 P1  P2 P4b  P5  P6  P7
 92  89 151 110 117 121

Then:

`# Add taxonomy and total read counts to this data.frame`

`prevdf = data.frame(Prevalence = prevdf, TotalAbundance = taxa_sums(ps), tax_table(ps))`

`prevdf[1:10,]`

I still have the same error:

`Error in data.frame(Prevalence = prevdf, TotalAbundance = taxa_sums(ps), : arguments imply differing number of rows: 56, 813`

The same code worked well on a phyloseq object assembled from the QIIME pipeline; this object, however, was built from the DADA2 pipeline in R.

So, I am not sure what the problem is here. As far as I know, the major difference is that in the OTU table from QIIME the ASVs are assigned feature IDs, whereas in the DADA2 pipeline in R the features are identified by their unique sequences, which are only assigned ASV labels (ASV1, ASV2, ...) in downstream analysis. Any recommendations, please?

Thanks!
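One thing that stands out in the `head(prevdf)` output above: the names P1, P2, P4b, ... look like sample names rather than taxa. That may mean the OTU table is stored with samples as rows, which is the layout a DADA2 sequence table typically has, so `MARGIN = 1` would count taxa per sample instead of samples per taxon. Below is a minimal base R sketch of an orientation-aware version, assuming that is the cause. The toy matrix and the `taxa_in_rows` flag are hypothetical stand-ins for `otu_table(ps)` and `taxa_are_rows(ps)`.

```r
# Toy count matrix (hypothetical data): 3 samples as rows, 5 taxa as columns.
counts <- matrix(c(0, 2, 1, 0, 4,
                   3, 0, 0, 0, 1,
                   0, 0, 5, 2, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("S", 1:3), paste0("ASV", 1:5)))
taxa_in_rows <- FALSE  # stand-in for phyloseq's taxa_are_rows(ps)

# Pick the margin that runs over taxa, whichever way the table is oriented.
taxa_margin <- if (taxa_in_rows) 1 else 2

# Prevalence: number of samples in which each taxon has a non-zero count.
prevalence      <- apply(counts, taxa_margin, function(x) sum(x > 0))
# Total abundance: summed counts per taxon (what taxa_sums() reports).
total_abundance <- apply(counts, taxa_margin, sum)

# Both vectors now have one entry per taxon, so data.frame() can combine them.
prevdf <- data.frame(Prevalence = prevalence,
                     TotalAbundance = total_abundance)
```

On the real object, the equivalent change would be to pass `MARGIN = ifelse(taxa_are_rows(ps), yes = 1, no = 2)` to `apply()` and leave the rest of the code as it is. This would also be consistent with the same code working on the QIIME object, if that table is stored with taxa as rows.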