joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

importing files from qiime #167

Closed CarlyMuletzWolz closed 11 years ago

CarlyMuletzWolz commented 11 years ago

Hello,

I am trying to bring in otu_table.biom, mapping file = SMP_Run1_Map.txt, and a tree rep_set.tre

otufile = "otu_table.biom"
mapfile = "SMP_run1_Map.txt"
treefile = "rep_set.tre"

run1 <- import_qiime (otufile, mapfile, treefile)

but when I run the import_qiime command this is what I get

Processing map file...
Processing otu/tax file...

Reading and parsing file in chunks ... Could take some time. Please be patient...

Building OTU Table in chunks. Each chunk is one dot.
.
Error in `colnames<-`(`*tmp*`, value = character(0)) : 
  attempt to set colnames on object with less than two dimensions

what am I doing wrong?

I can import the biom file with

import_biom(otufile)

and get this:

phyloseq-class experiment-level object
OTU Table:          [2592 species and 28 samples]
                     species are rows
Taxonomy Table:     [2592 species by 6 taxonomic ranks]:

I can import the mapping file

import_qiime_sampleData(mapfile)

and see all of my mapping data

How do I import my tre file? I tried this with no success

import_qiime(otufilename=F, mapfilename=F, treefile)

In my 454 run I had one control that came back with no sequences and I don't know if that is causing an issue (zeros in the biom file??) in the import_qiime function?

I had to manually add a blank line in the biom file so that it didn't have an 'incomplete' line or R would not import it

But by bringing in these files separately I realize that I am missing commands that are built into the import_qiime function.

I can do the import_qiime function with the sample obesity data set so I assume there is something wrong with my files that I am too much of a rookie to figure out. After a day of searching and trying to troubleshoot on my own I now pass my issue on to an expert!

Thanks - Carly

CarlyMuletzWolz commented 11 years ago

Haha...it seems that by writing things out you help yourself think about it. After looking more through the phyloseq_basics.pdf I figured out these commands

file <- import_biom(otufile)
map <- import_qiime_sampleData(mapfile)
treefile <- read.tree("rep_set.tre")

run1 <- merge_phyloseq(file, map, treefile)
run1

And it looks pretty good:

phyloseq-class experiment-level object
OTU Table:          [2392 species and 28 samples]
                     species are rows
Sample Data:        [28 samples by 11 sample variables]:
Taxonomy Table:     [2392 species by 6 taxonomic ranks]:
Phylogenetic Tree:  [2392 tips and 2390 internal nodes]
                     unrooted

I'm sure you will hear more from me. Thanks so far! Any comments are always welcome.

CarlyMuletzWolz commented 11 years ago

Ok, so now I am at work on a Mac and I can no longer import the biom file.

file <- import_biom(otufile)
Error in if (taxaPrefix == "greengenes") { : argument is of length zero

Also, when I was working with the run1 object I created last night, I ran into all kinds of issues. For example, the biom file has no taxonomic rank names such as Phylum or Class, only ta1, ta2, etc.

I followed your tutorial on how to do it for the HMP example, but that did not resolve the names still being called ta1, ta2, etc, which I do not know what they represent.

Also I was getting an error when I was trying to do taxaplot, but can't remember what it is now.

Anyway please help me get my files into R so I can do awesome things with the data. The import_qiime function does not work either on the Mac version at least with the files that I have that were created with QIIME 1.6.0

CarlyMuletzWolz commented 11 years ago

Trying this too from R help:

file <- import_biom(otufile, 'greengenes', parallel = TRUE)

Error in taxmat[i, 1:length(x$rows[[i]]$metadata$taxonomy)] <- parseGreenGenesPrefix(x$rows[[i]]$metadata$taxonomy) : number of items to replace is not a multiple of replacement length

CarlyMuletzWolz commented 11 years ago

this works

file <- import_biom(otufile, taxaPrefix = FALSE, parallel = TRUE)
str(file)

Formal class 'phyloseq' [package "phyloseq"] with 4 slots
  ..@ otu_table:Formal class 'otutable' [package "phyloseq"] with 2 slots
  .. .. ..@ .Data : num [1:2592, 1:28] 2 0 0 0 3 0 1 0 0 2 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2592] "0" "1" "2" "3" ...
  .. .. .. .. ..$ : chr [1:28] "38B5" "THA13" "SFA2" "THA6" ...
  .. .. ..@ taxa_are_rows: logi TRUE
  ..@ taxtable:Formal class 'taxonomyTable' [package "phyloseq"] with 1 slots
  .. .. ..@ .Data: chr [1:2592, 1:6] "Root" "Root" "Root" "Root" ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2592] "0" "1" "2" "3" ...
  .. .. .. .. ..$ : chr [1:6] "ta1" "ta2" "ta3" "ta4" ...
  ..@ sam_data : NULL
  ..@ phy_tree : NULL

But in the example data set, the second chr [1:6] in tax_table has Root, Phylum, Class, etc., while here it has ta1, ta2, etc. What do they mean? I also noticed that last night the first chr had different numbers than 0, 1, 2, 3. I thought those numbers signified how many OTUs were in each Phylum, Class, etc.
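
A note for readers who hit the same labels: in the str() output above, "0", "1", "2", "3" are the row names of the table, i.e. OTU identifiers rather than counts, and ta1, ta2, ... appear to be generic placeholder column names used when the importer has no rank labels to work with. Below is a minimal sketch of inspecting and relabelling the ranks. It assumes a current phyloseq release and uses an example BIOM file bundled with the package, not the files from this thread. The rank labels are an assumption and have to match what each column of your own table actually holds (here the first column holds "Root", for instance).

```r
library("phyloseq")

# Example BIOM file shipped with phyloseq (a stand-in, not the data from this thread)
biom_file <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq")
x <- import_biom(biom_file)

# Current rank labels (taxonomy table columns) and OTU identifiers (row names)
rank_names(x)
head(taxa_names(x))

# Assumed rank labels, trimmed to the number of columns actually present
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
colnames(tax_table(x)) <- ranks[seq_along(rank_names(x))]
rank_names(x)
```

Checking tax_table(x)[1:5, ] before renaming is a cheap way to confirm which rank each column really contains.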

joey711 commented 11 years ago

Wow, a lot of questions to address.

Of course if you have extra data not originally included in the map file to QIIME, and so not in the biom file, then merge_phyloseq is still a good option for adding it, as you were showing in the example above.

I have included in phyloseq a more robust tree-importing wrapper function called read_tree that I recommend using.

I have been working with the biom-format gurus to include support for having a tree included in the biom-format itself, but this is not yet implemented. In the meantime, it will be very easy for me to add an additional optional tree argument to the import_biom function so that you do not need to mess with merge_phyloseq and the pesky extra line or two of code that entails.
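
A minimal sketch of the read_tree plus merge_phyloseq route described above, assuming a current phyloseq release. The file paths point at example files bundled with the package and stand in for otu_table.biom and rep_set.tre; they are not the files from this thread.

```r
library("phyloseq")

# Example files shipped with phyloseq (stand-ins for the user's own files)
biom_file <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq")
tree_file <- system.file("extdata", "biom-tree.phy", package = "phyloseq")

# OTU table and taxonomy (plus sample data, if the BIOM file carries it)
otu_and_tax <- import_biom(biom_file)

# read_tree returns an ape "phylo" object
tree <- read_tree(tree_file)

# merge_phyloseq keeps only the taxa that are present in every component
combined <- merge_phyloseq(otu_and_tax, tree)
combined
```

Because the merge trims every component to the shared set of taxa, a merged object can report fewer taxa than the BIOM file alone, which would be consistent with the drop from 2592 to 2392 seen earlier in this thread. A separately imported mapping file can be passed to merge_phyloseq in the same way; in current releases the importer is named import_qiime_sample_data.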

Hope these comments are helpful, and thank you very much for the feedback. It helps guide updates and especially what details or clarity I need to add to the tutorials.

Joey

joey711 commented 11 years ago

I almost forgot!

joey711 commented 11 years ago

The tree option has now been added to the latest version (1.3.11+).
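
For reference, a short sketch of what the tree option looks like in use. It assumes a current phyloseq release, where the argument is called treefilename (the name in 1.3.11 may have differed), and it uses example files bundled with the package rather than the files from this thread.

```r
library("phyloseq")

# Example files shipped with phyloseq (stand-ins for otu_table.biom and rep_set.tre)
biom_file <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq")
tree_file <- system.file("extdata", "biom-tree.phy", package = "phyloseq")

# Import the BIOM file and attach the tree in a single call
myData <- import_biom(biom_file, treefilename = tree_file)
myData
```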

CarlyMuletzWolz commented 11 years ago

Thank you for the responses! That helped clarify some of my issues.

I installed the newest version of phyloseq by running the command:

source("http://bioconductor.org/biocLite.R")
biocLite("phyloseq")

and as I always get critiqued for my uninformative names I have renamed the biom file myData as you suggested.

1) Again when I try to use the import_biom command as is it does not work

otufile = "otu_table1.biom"
myData <- import_biom(otufile)

Error in if (taxaPrefix == "greengenes") { : argument is of length zero

BUT this command works

myData <- import_biom(otufile, taxaPrefix = FALSE, parallel = TRUE)

Why do I have to set taxaPrefix to FALSE? I chose to do this because the error message suggested there was some issue with it.

2) I am also confused at the import_biom command. You make it sound like with this command you bring in your mapping file information also. This does not seem to be the case. If I look at the biom file alone it has this after I run the command from 1) above

myData
phyloseq-class experiment-level object
OTU Table:          [2592 taxa and 28 samples]
                     taxa are rows
Taxonomy Table:     [2592 taxa by 6 taxonomic ranks]:

I have to bring in the mapping file information and the tree file information separately. The import_qiime function should do all of these things together, but this command does not work for me either.

otufile = "otu_table1.biom"
mapfile = "SMP_run1_Map.txt"
treefile = "rep_set.tre"
run1 <- import_qiime(otufile, mapfile, treefile)

Processing map file...
Processing otu/tax file...

Reading and parsing file in chunks ... Could take some time. Please be patient...

Building OTU Table in chunks. Each chunk is one dot.
.
Error in `colnames<-`(`*tmp*`, value = character(0)) :
  attempt to set colnames on object with less than two dimensions

Why doesn't this command work for me?

3) Instead I run these lines and I can get them all together now as run1

mapfile <- import_qiime_sampleData("SMP_run1_Map.txt")
treefile <- read.tree("rep_set.tre")
run1 <- merge_phyloseq(myData, mapfile, treefile)

phyloseq-class experiment-level object
OTU Table:          [2392 taxa and 28 samples]
                     taxa are rows
Sample Data:        [28 samples by 11 sample variables]:
Taxonomy Table:     [2392 taxa by 6 taxonomic ranks]:
Phylogenetic Tree:  [2392 tips and 2390 internal nodes]
                     unrooted

Why is the tree unrooted? I brought the tree over from QIIME as instructed. Should I contact QIIME about this issue?

4) So I find out it isn't rooted and then I run this command I found in a HMP help dataset on this forum

is.rooted(tre(run1))
FALSE
tre2 <- root(tre(run1), sample(species.names(run1), 1), resolve.root = TRUE)

then I bring this new tree in to my run1 file

run1 <-merge_phyloseq(myData,mapfile,tre2)

is.rooted(tre(run1))
TRUE

What am I even doing here? Basically this relates to question 3.

5) I was able to assign the rank.names as the taxonomic classifications - thanks for the help!
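
On questions 3 and 4, a small self-contained sketch may help show what the rooting commands do. It uses only the ape package and a random toy tree, not the tree from this thread. A fully resolved unrooted tree with n tips has n - 2 internal nodes, which matches the "2392 tips and 2390 internal nodes" reported above; rooting with resolve.root = TRUE adds one node at the chosen root. Picking the outgroup with sample(), as in question 4, makes the tree formally rooted, but the root position is then arbitrary.

```r
library("ape")

set.seed(1)

# Random unrooted toy tree with 10 tips (illustration only)
tree <- rtree(10, rooted = FALSE)
is.rooted(tree)   # FALSE
Nnode(tree)       # 8 internal nodes, i.e. n - 2

# Root on a chosen tip; resolve.root = TRUE inserts a bifurcating root node
rooted <- root(tree, outgroup = tree$tip.label[1], resolve.root = TRUE)
is.rooted(rooted) # TRUE
Nnode(rooted)     # 9 internal nodes, i.e. n - 1
```

In current phyloseq releases the accessors tre() and species.names() used in question 4 are named phy_tree() and taxa_names(). Whether an arbitrary root is acceptable depends on the downstream analysis, so a deliberately chosen outgroup is usually the safer option when one is available.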

Again, thanks for the reply. I would prefer to use R to do these metagenomic analyses, so help with these issues is greatly appreciated. I attended the workshop you held at UW-Seattle. It was helpful, and I have been reviewing my notes and the command lines you provided. It seems, though, that several commands have been updated or changed since the workshop?

Cheers - Carly

joey711 commented 11 years ago

That is not the latest version of phyloseq. The code you showed for installation will install what we call the "stable release version", which updates only twice a year, and is actually 3+ months old now. Please see the phyloseq installation tutorial for installing the latest development version from this GitHub repo.
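
For anyone reading this later: the biocLite script used earlier in this thread has since been retired by Bioconductor in favour of the BiocManager package. A hedged sketch of the two installation routes on a current R setup follows; it assumes the BiocManager and remotes packages, neither of which appears in the original discussion.

```r
# Release version from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("phyloseq")

# Development version from the GitHub repository (assumes the remotes package)
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("joey711/phyloseq")

# Confirm which version ended up installed
packageVersion("phyloseq")
```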

Glad the workshop was helpful. Sorry if some of those example commands are outdated. Can you post the biggest discrepancies that you've come across at the Issue Tracker for the phyloseq demo? That will help me fix it much faster.

Thanks for the feedback, as always

joey

CarlyMuletzWolz commented 11 years ago

When I try to update phyloseq to the github version I get an error. Here are my command lines:

source("http://bioconductor.org/biocLite.R")
biocLite("phyloseq")
install.packages("devtools")
library("devtools")
install_github("phyloseq", "joey711")

All work until I get to the install_github command. I get the error message below

install_github("phyloseq", "joey711")

Installing github repo(s) phyloseq/master from joey711
Installing phyloseq.zip from https://api.github.com/repos/joey711/phyloseq/zipball/master
Installing phyloseq
/Library/Frameworks/R.framework/Resources/bin/R --vanilla CMD build \
  '/private/var/folders/dg/clpwjnzx45v0b0t6r1v3ytsm0000gq/T/Rtmpr4n5et/joey711-phyloseq-e34e8aa' \
  --no-manual --no-resave-data

Attaching package: 'ade4'

The following object(s) are masked from 'package:base':

within

Loading required package: picante
Loading required package: ape
Loading required package: vegan
Loading required package: permute
This is vegan 2.0-5

Attaching package: 'vegan'

The following object(s) are masked from 'package:ade4':

cca

Loading required package: nlme
Warning: "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. (Deprecated; last used in version 0.8.9)
Warning: "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. (Deprecated; last used in version 0.8.9)
Warning: "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. (Deprecated; last used in version 0.8.9)
Loading required package: plyr

Attaching package: 'reshape'

The following object(s) are masked from 'package:plyr':

rename, round_any

Warning: "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. (Deprecated; last used in version 0.8.9)
Warning: Removed 1 rows containing missing values (geom_text).
Warning: Removed 1 rows containing missing values (geom_text).
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
  Running 'texi2dvi' on 'phyloseq_analysis.tex' failed.
Calls: -> texi2pdf -> texi2dvi
Execution halted
Error: Command failed (1)

joey711 commented 11 years ago

Looks like your system doesn't have a working version of latex, so it fails to rebuild the vignettes during installation. I'm not sure why R packages have to do that by default when building from source. The PDF files for the vignettes are already included, and it looks like your installation would otherwise work.

You have two clear options off the top of my head.

temp <- tempfile()
macURL = "http://bioconductor.org/packages/devel/bioc/bin/macosx/leopard/contrib/2.16/phyloseq_1.3.11.tgz"
download.file(macURL, temp)
install.packages(temp, repos = NULL, type = "mac.binary.leopard")

joey711 commented 11 years ago

Also, this latest comment is a real and separate issue, that you should probably post anew with a different title. I think other users might benefit from seeing comments about installation issues.

Paula0666 commented 7 years ago

Hey, I'm importing the mapping file just fine, and the tree as well, but I'm having problems with the biom file. I added the metadata using biom add-metadata, but when I imported it into R, I got the following:

import_biom(otufile)
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 27566 taxa and 24 samples ]
sample_data() Sample Data:       [ 24 samples by 4 sample variables ]
tax_table()   Taxonomy Table:    [ 27566 taxa by 7 taxonomic ranks ]
Warning message:
In strsplit(msg, "\n") : input string 1 is invalid in this locale

I tried several times, and I have no idea how to solve this. Any thoughts? Thanks