joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

qiime rdp classifier biom import failure #392

Closed (ghanesh closed this issue 8 years ago)

ghanesh commented 10 years ago

Hi Joey! I commented on #272, but this issue might be worth its own thread. Thanks for the details on solving the error in #272. It would be great if you could help me with the following problem importing my QIIME files into phyloseq. Thank you in advance! I have a similar problem, but it is only partly solved by that thread. The files come from the QIIME pipeline, where we used the RDP classifier. It was not possible to import the biom file directly. I believe this is because of the way the RDP classifier is used in QIIME. What worked so far was:

    otufile = "otu_table_uclust.biom"
    mapfile = "map_bac_all_corrected.txt"
    trefile = "rep_set.tre"
    read_biom(otufile)
    biom object.
    type: OTU table
    matrix_type: sparse
    1149 rows and 1504 columns
    x = read_biom(otufile)
    otumat = as(biom_data(x), "matrix")
    OTU = otu_table(otumat, taxa_are_rows=TRUE)
    taxmat = as.matrix(observation_metadata(x), rownames.force=TRUE, byrows=FALSE)

BUT here the problems start. The last command produces a list with one column only:

      row.names  V1
    1 denovo0    Bacteria
    2 denovo1    Bacteria
    3 denovo2    c("Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rickettsiales", "SAR11", "Pelagibacter")
    4 denovo3    c("Bacteria", "Bacteroidetes", "Flavobacteria", "Flavobacteriales", "Flavobacteriaceae")
    5 denovo4    c("Bacteria", "Proteobacteria")

Therefore the command

    TAX = tax_table(taxmat)

produces the following:

    Error in validObject(.Object) : invalid class “taxonomyTable” object: Non-character matrix provided as Taxonomy Table.
    Taxonomy is expected to be characters.

I believe this might be the reason why I got stuck in the first place when using

    MyExp <- import_qiime(otufile, mapfile, trefile)

which gave:

    Error in fread(input = paste0(x, collapse = "\n"), sep = "\t", header = TRUE, :
      'skip' must be a length 1 vector of type numeric or integer >=-1, or single character search string
    In addition: Warning messages:
    1: In readLines(file) : incomplete final line found on 'otu_table_uclust.biom'
    2: In max(which(substr(x[1:25L], 1, 1) == "#")) : no non-missing arguments to max; returning -Inf
    3: running command 'C:\Windows\system32\cmd.exe /c ({"id": "None","format": "Biological Observation Matrix 1.0.0","format_url": "http://biom-format.org","type": "OTU table","generated_by": "QIIME 1.7.0-dev","date": "2014-10-09T16:21:41.267897","matrix_type": "sparse","matrix_element_type": "int","shape": [1149, 1504],"data":

Any ideas? Cheers! Alexander
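
A note on the `tax_table(taxmat)` error quoted above: the message says the taxonomy has to be a character matrix, while the lineages here have different depths and so end up as a list. One possible workaround is to pad every lineage with `NA` to a common length and then bind the rows. The following is only a minimal base-R sketch. The `lineages` list is a hypothetical stand-in for whatever `observation_metadata(x)` returns, and the `Rank1`, `Rank2`, ... column names are placeholders.

```r
# Hypothetical stand-in for the list returned by observation_metadata(x):
# one character vector per OTU, with differing lengths.
lineages <- list(
  denovo0 = "Bacteria",
  denovo2 = c("Bacteria", "Proteobacteria", "Alphaproteobacteria",
              "Rickettsiales", "SAR11", "Pelagibacter"),
  denovo4 = c("Bacteria", "Proteobacteria")
)

# Pad every lineage with NA up to the deepest rank observed.
n_ranks <- max(lengths(lineages))
padded <- lapply(lineages, function(v) {
  v <- as.character(v)
  length(v) <- n_ranks  # extending a vector fills the new slots with NA
  v
})

# Stack into a character matrix: one row per OTU, one column per rank.
taxmat <- do.call(rbind, padded)
colnames(taxmat) <- paste0("Rank", seq_len(n_ranks))
```

Since the result is a plain character matrix with OTU IDs as row names, it should be the kind of input that `tax_table()` asks for in the error message, although this has not been checked against the actual biom file from this thread.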

ghanesh commented 10 years ago

Maybe this problem is also linked to #357.

ghanesh commented 9 years ago

Does anyone have an idea?

joey711 commented 9 years ago

@ghanesh / Alexander, did you solve this issue yet? If so, what did it?

What version of biom-format was this file? Version 1 (JSON) or Version 2 (HDF5)?

Thanks and sorry for the delay.

joey

ghanesh commented 9 years ago

@joey711 Hello, I was able to circumvent the problem, which I did via:

    otufile = "otu_table_w_tax.biom"
    mapfile = "Map_final.txt"
    trefile = "arc_rep_set.tre"
    envir = import_qiime_sample_data(mapfile)
    myData = import_biom(otufile, trefile)
    myData = merge_phyloseq(myData, envir)

which gave me this:

    phyloseq-class experiment-level object
    otu_table()   OTU Table:         [ 1949 taxa and 118 samples ]
    sample_data() Sample Data:       [ 118 samples by 28 sample variables ]
    tax_table()   Taxonomy Table:    [ 1949 taxa by 7 taxonomic ranks ]
    phy_tree()    Phylogenetic Tree: [ 1949 tips and 1947 internal nodes ]

with the following:

    Taxonomy Table: [6 taxa by 7 taxonomic ranks]:
            Rank1 Rank2 Rank3 Rank4 Rank5 Rank6
    3354502 "k__Archaea" "p__Crenarchaeota" "c__Thaumarchaeota" "o__Cenarchaeales" "
            Rank7
    3354502 "s__"

therefore rank_names(myData) results in:

[1] "Rank1" "Rank2" "Rank3" "Rank4" "Rank5" "Rank6" "Rank7"

I then solved the headers issue with:

    colnames(tax_table(myData)) = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

as you suggested in another reply (Issue 162, if I'm not mistaken). What I did not manage to get rid of are the taxonomic level identifiers and the "__". I know it should be possible to tweak this, but at the moment I don't really know how. Do you have a suggestion for me?

Thank you for your support! Alex
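
On the last question, one possible way to drop the rank identifiers is a regular-expression substitution over the taxonomy matrix. The following is a minimal base-R sketch, assuming the entries carry prefixes of the form `k__`, `p__`, and so on. The small `taxmat` matrix is a hypothetical stand-in for `as(tax_table(myData), "matrix")`, not the real data from this thread.

```r
# Hypothetical character matrix standing in for as(tax_table(myData), "matrix").
taxmat <- matrix(
  c("k__Archaea", "p__Crenarchaeota", "c__Thaumarchaeota", "s__"),
  nrow = 1,
  dimnames = list("3354502", c("Kingdom", "Phylum", "Class", "Species"))
)

# Remove a leading single-letter rank code followed by two underscores.
# gsub() keeps the attributes of its input, so dim and dimnames survive.
cleaned <- gsub("^[a-z]__", "", taxmat)

# Entries that held only the prefix are now empty strings; mark them as missing.
cleaned[!is.na(cleaned) & cleaned == ""] <- NA
```

If the row names still match the OTU IDs, the cleaned matrix could then be put back with something like `tax_table(myData) <- tax_table(cleaned)`. phyloseq also provides `parse_taxonomy_greengenes`, which can be passed as the `parseFunction` argument of `import_biom()` and may handle these prefixes at import time; whether it fits this particular file is untested here.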