Closed CarlyMuletzWolz closed 11 years ago
Haha...it seems that by writing things out you help yourself think about it. After looking more through the phyloseq_basics.pdf I figured out these commands
file <- import_biom(otufile)
map <- import_qiime_sampleData(mapfile)
treefile <- read.tree("rep_set.tre")
run1 <- merge_phyloseq(file, map, treefile)
run1
And it looks pretty good:
phyloseq-class experiment-level object
OTU Table: [2392 species and 28 samples]
species are rows
Sample Data: [28 samples by 11 sample variables]:
Taxonomy Table: [2392 species by 6 taxonomic ranks]:
Phylogenetic Tree: [2392 tips and 2390 internal nodes]
unrooted
I'm sure you will hear more from me. Thanks so far! Any comments are always welcome.
Ok, so now I am at work on a Mac and I can no longer import the biom file.
file <- import_biom(otufile)
Error in if (taxaPrefix == "greengenes") { : argument is of length zero
Also, when I was trying to work with the run1 object I created last night, I was having all kinds of issues. For example, there are no taxonomic names in the biom file, like Phylum, Class, etc., only ta1, ta2, etc.
I followed your tutorial on how to do it for the HMP example, but that did not resolve the names still being called ta1, ta2, etc., and I do not know what they represent.
Also, I was getting an error when I was trying to do taxaplot, but I can't remember what it was now.
Anyway, please help me get my files into R so I can do awesome things with the data. The import_qiime function does not work either, on the Mac version at least, with the files that I have that were created with QIIME 1.6.0.
Trying this too from R help:
file <- import_biom(otufile, 'greengenes', parallel = TRUE)
Error in taxmat[i, 1:length(x$rows[[i]]$metadata$taxonomy)] <- parseGreenGenesPrefix(x$rows[[i]]$metadata$taxonomy) :
number of items to replace is not a multiple of replacement length
this works
file <- import_biom(otufile, taxaPrefix = FALSE, parallel = TRUE)
str(file)
Formal class 'phyloseq' [package "phyloseq"] with 4 slots
  ..@ otu_table:Formal class 'otutable' [package "phyloseq"] with 2 slots
  .. .. ..@ .Data : num [1:2592, 1:28] 2 0 0 0 3 0 1 0 0 2 ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2592] "0" "1" "2" "3" ...
  .. .. .. .. ..$ : chr [1:28] "38B5" "THA13" "SFA2" "THA6" ...
  .. .. ..@ taxa_are_rows: logi TRUE
  ..@ taxtable:Formal class 'taxonomyTable' [package "phyloseq"] with 1 slots
  .. .. ..@ .Data: chr [1:2592, 1:6] "Root" "Root" "Root" "Root" ...
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2592] "0" "1" "2" "3" ...
  .. .. .. .. ..$ : chr [1:6] "ta1" "ta2" "ta3" "ta4" ...
  ..@ sam_data : NULL
  ..@ phy_tree : NULL
But in the example data set, the second chr [1:6] in tax_table has Root, Phylum, Class, etc., while here there is ta1, etc. What do they mean? Also, I noticed that last night the first chr had different numbers there than 0, 1, 2, 3. I thought those numbers signified how many OTUs were in each Phylum, Class, etc.
Wow, a lot of questions to address.
About the object name file: I think I would prefer a name like myData, but anyway the end result is the same.
An alternative to str is to use the built-in print method by simply typing the name of the data object into the R session, in your case above, file.
What import_biom or import_qiime (or import_anything) does is attempt to label taxonomic classification data if it is available. The "ta1", "ta2", etc. are dummy labels for the rank names associated with your taxonomic classification data. In both implementation and conceptualization, these are the column labels on the taxonomy table. The row names are the OTU labels, which I believe was also a question. You can access the OTU and sample names with the taxa_names or sample_names functions, respectively. The "0" "1" "2" "3" that you are referring to are the OTU names in your data object in that example, presumably because they were the simplest-case names in the data. Dummy names of a similar type would be assigned by phyloseq if they were completely missing, but I'm guessing that is not the case here, because R likes to start indexing from 1 and not 0. Likely QIIME named the OTUs starting from 0, which is fine here because they are stored as character strings anyway.
You should not need merge_phyloseq if you are using standard files with a recent build of QIIME (or a recent build of another OTU-clustering pipeline supported by phyloseq, like mothur). You correctly noticed that the tree is a key exception to that, which I should fix right away. Otherwise, in principle, the sample data of your experiment can and should be stored in the .biom file, since that is one of the major reasons for using the biom format in the first place (other than the sparse matrix support). So instead of a combined OTU/tax file and a sample data file (or "sample map" according to QIIME), you have just a biom-format file.
Of course, if you have extra data not originally included in the map file to QIIME, and so not in the biom file, then merge_phyloseq is still a good option for adding it, as you were showing in the example above.
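As a rough illustration of the dummy rank labels described above, here is a minimal sketch that builds a tiny phyloseq object and replaces "ta1", "ta2" with meaningful names. The counts and phylum names are made up, the object name toy is hypothetical, and the choice of "Root" and "Phylum" as the meaning of the first two ranks is an assumption based on the example data mentioned in this thread. It also assumes a phyloseq version that provides rank_names, taxa_names and sample_names.

```r
library(phyloseq)

# Toy data standing in for an imported .biom file (values are invented for illustration).
otus <- matrix(c(2, 0, 3, 1, 0, 4), nrow = 3,
               dimnames = list(c("0", "1", "2"), c("38B5", "THA13")))
taxa <- matrix(c("Root", "Root", "Root",
                 "Firmicutes", "Proteobacteria", "Firmicutes"),
               nrow = 3,
               dimnames = list(c("0", "1", "2"), c("ta1", "ta2")))
toy <- phyloseq(otu_table(otus, taxa_are_rows = TRUE), tax_table(taxa))

# The rank names are simply the column labels of the taxonomy table.
rank_names(toy)

# Replace the dummy labels (assumed meaning of each column).
colnames(tax_table(toy)) <- c("Root", "Phylum")
rank_names(toy)

# OTU labels are the row names; they are character strings, so "0" is a valid name.
taxa_names(toy)
sample_names(toy)
```

The same `colnames(tax_table(x)) <- ...` assignment should apply to an object returned by import_biom, provided the number of names matches the number of ranks.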
I have included in phyloseq a more robust tree-importing wrapper function called read_tree that I recommend using.
I have been working with the biom-format gurus to include support for having a tree included in the biom format itself, but this is not yet implemented. In the meantime, it will be very easy for me to add an additional optional tree argument to the import_biom function so that you do not need to mess with merge_phyloseq and the pesky extra line or two of code that entails.
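The "extra line or two" with read_tree and merge_phyloseq might look like the sketch below. It uses the example files bundled with phyloseq as stand-ins for otu_table.biom and rep_set.tre, and it assumes a current phyloseq installation in which those bundled files are present.

```r
library(phyloseq)

# Example files shipped with phyloseq, used here in place of the user's own files.
biom_path <- system.file("extdata", "rich_sparse_otu_table.biom", package = "phyloseq")
tree_path <- system.file("extdata", "biom-tree.phy", package = "phyloseq")

myData <- import_biom(biom_path)      # OTU table, taxonomy, and any sample data in the file
tree <- read_tree(tree_path)          # phyloseq's wrapper for reading a tree file
run1 <- merge_phyloseq(myData, tree)  # combine the components into one object
run1
```

With your own data, the two paths would simply point at your .biom and tree files.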
To root the tree, you can use the root command included in the ape package, which you will need to load with library(ape) in addition to loading phyloseq with the library(phyloseq) command.
Hope these comments are helpful, and thank you very much for the feedback. It helps guide updates and especially what details or clarity I need to add to the tutorials.
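For the rooting step, a minimal sketch using only ape is shown here. The tree is a random toy tree, not the rep_set.tre from this thread, and picking the outgroup at random is arbitrary; with real data you would normally choose an outgroup that makes biological sense.

```r
library(ape)

set.seed(1)                       # make the toy example reproducible
tree <- rtree(6, rooted = FALSE)  # random unrooted tree with 6 tips
is.rooted(tree)

# Re-root on a single tip. resolve.root = TRUE asks ape to make the new root bifurcating.
outgroup <- sample(tree$tip.label, 1)
rooted <- root(tree, outgroup = outgroup, resolve.root = TRUE)
is.rooted(rooted)
```

The rooted tree can then be combined with the rest of the data using merge_phyloseq, as in the examples elsewhere in this thread.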
Joey
I almost forgot!
It appears from your use of import_biom that you are not using the latest version. I have updated both import_biom and import_qiime recently so that they can more flexibly process taxonomy classification data. Basically, you have the option of providing a custom parsing function if you have a non-standard classification format, or at least one that is not yet supported by phyloseq. The greengenes format with prefixes is therefore not the only thing supported.
The tree option has now been added to the latest version (1.3.11+).
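A sketch of the updated import call, with the tree and a parsing function passed directly, could look like this. It assumes a current phyloseq release in which the arguments are named treefilename and parseFunction and in which parse_taxonomy_greengenes is available; the exact argument names in version 1.3.11 may differ. The files are the examples bundled with phyloseq, not the files from this thread.

```r
library(phyloseq)

# Example files shipped with phyloseq (greengenes-style taxonomy strings).
biom_path <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq")
tree_path <- system.file("extdata", "biom-tree.phy", package = "phyloseq")

# The tree is read and attached in the same call, so no separate merge_phyloseq step is needed.
# The parsing function turns greengenes-prefixed entries into named taxonomic ranks.
myData <- import_biom(biom_path,
                      treefilename = tree_path,
                      parseFunction = parse_taxonomy_greengenes)
rank_names(myData)
```

For a non-standard classification format, parseFunction can be given a user-written function in place of parse_taxonomy_greengenes.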
Thank you for the responses! That helped clarify some of my issues.
source("http://bioconductor.org/biocLite.R")
biocLite("phyloseq")
1) Again, when I try to use the import_biom command as is, it does not work:
otufile = "otu_table1.biom"
myData <- import_biom(otufile)
Error in if (taxaPrefix == "greengenes") { : argument is of length zero
myData <- import_biom(otufile, taxaPrefix = FALSE, parallel = TRUE)
2) I am also confused by the import_biom command. You make it sound like this command also brings in your mapping file information. This does not seem to be the case. If I look at the object imported from the biom file alone, it has this after I run the command from 1) above:
myData
phyloseq-class experiment-level object
OTU Table: [2592 taxa and 28 samples]
taxa are rows
Taxonomy Table: [2592 taxa by 6 taxonomic ranks]:
otufile = "otu_table1.biom"
mapfile = "SMP_run1_Map.txt"
treefile = "rep_set.tre"
run1 <- import_qiime(otufile, mapfile, treefile)
Processing map file...
Processing otu/tax file...
Reading and parsing file in chunks ... Could take some time. Please be patient...
Building OTU Table in chunks. Each chunk is one dot.
.
Error in `colnames<-`(`*tmp*`, value = character(0)) :
attempt to set colnames on object with less than two dimensions
3) Instead, I run these lines and I can get them all together now as run1:
mapfile <- import_qiime_sampleData("SMP_run1_Map.txt")
treefile <- read.tree("rep_set.tre")
run1 <- merge_phyloseq(myData, mapfile, treefile)
phyloseq-class experiment-level object
OTU Table: [2392 taxa and 28 samples]
taxa are rows
Sample Data: [28 samples by 11 sample variables]:
Taxonomy Table: [2392 taxa by 6 taxonomic ranks]:
Phylogenetic Tree: [2392 tips and 2390 internal nodes]
unrooted
4) So I find out it isn't rooted, and then I run this command I found in an HMP help dataset on this forum:
is.rooted(tre(run1))
FALSE
tre2 <- root(tre(run1), sample(species.names(run1), 1), resolve.root = TRUE)
run1 <- merge_phyloseq(myData, mapfile, tre2)
is.rooted(tre(run1))
TRUE
What am I even doing here? Basically this relates to question 3.
5) I was able to assign the rank.names as the taxonomic classifications - thanks for the help!
Again, thanks for the reply. I would prefer to use R to do these metagenomic analyses, so help with these issues is greatly appreciated. I attended the workshop you held at UW-Seattle. It was helpful, and I have been reviewing my notes and the command lines you provided. It seems, though, that several commands have been updated or changed since the workshop?
Cheers - Carly
That is not the latest version of phyloseq. The code you showed for installation will install what we call the "stable release version", which updates only twice a year, and is actually 3+ months old now. Please see the phyloseq installation tutorial for installing the latest development version from this GitHub repo.
The taxaPrefix argument has been replaced, as explained earlier. I also just made an improvement to the new parsing functions so that they can handle a recent problem with QIIME including space characters at the beginning of taxonomic classification entries in .biom files. Another reason to update your package.
import_biom will import sample data if it is in your file. That is the whole point of the biom format relative to the old format, which already stored the abundance and taxonomic classification data anyway. If your .biom file doesn't have any sample data, that's fine, and your later approach for adding sample data with merge_phyloseq is fine.
Glad the workshop was helpful. Sorry if some of those example commands are outdated. Can you post the biggest discrepancies that you've come across at the Issue Tracker for the phyloseq demo? That will help me fix it much faster.
Thanks for the feedback, as always
joey
When I try to update phyloseq to the github version I get an error. Here are my command lines:
source("http://bioconductor.org/biocLite.R")
biocLite("phyloseq")
install.packages("devtools")
library("devtools")
install_github("phyloseq", "joey711")
install_github("phyloseq", "joey711")
Installing github repo(s) phyloseq/master from joey711
Installing phyloseq.zip from https://api.github.com/repos/joey711/phyloseq/zipball/master
Installing phyloseq
/Library/Frameworks/R.framework/Resources/bin/R --vanilla CMD build \
'/private/var/folders/dg/clpwjnzx45v0b0t6r1v3ytsm0000gq/T/Rtmpr4n5et/joey711-phyloseq-e34e8aa' \
--no-manual --no-resave-data
Attaching package: 'ade4'
The following object(s) are masked from 'package:base':
within
Loading required package: picante
Loading required package: ape
Loading required package: vegan
Loading required package: permute
This is vegan 2.0-5
Attaching package: 'vegan'
The following object(s) are masked from 'package:ade4':
cca
Loading required package: nlme
Warning: "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. (Deprecated; last used in version 0.8.9)
Warning: "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. (Deprecated; last used in version 0.8.9)
Warning: "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. (Deprecated; last used in version 0.8.9)
Loading required package: plyr
Attaching package: 'reshape'
The following object(s) are masked from 'package:plyr':
rename, round_any
Warning: "legend" argument in scale_XXX is deprecated. Use guide="none" for suppress the guide display. (Deprecated; last used in version 0.8.9)
Warning: Removed 1 rows containing missing values (geom_text).
Warning: Removed 1 rows containing missing values (geom_text).
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'phyloseq_analysis.tex' failed.
Calls:
Looks like your system doesn't have a working version of LaTeX, so it fails to rebuild the vignettes during installation. I'm not sure why R packages have to do that by default when building from source. The PDF files for the vignettes are already included, and it looks like your installation would otherwise work.
You have two clear options off the top of my head.
temp <- tempfile()
macURL = "http://bioconductor.org/packages/devel/bioc/bin/macosx/leopard/contrib/2.16/phyloseq_1.3.11.tgz"
download.file(macURL, temp)
install.packages(temp, repos = NULL, type = "mac.binary.leopard")
Also, this latest comment is a real and separate issue that you should probably post anew with a different title. I think other users might benefit from seeing comments about installation issues.
Hey, I'm importing the mapping file just fine, and the tree as well. But I'm having problems with the biom file. I added the metadata using biom --add metadata, but when I imported it into R, I got the following:
import_biom(otufile)
phyloseq-class experiment-level object
otu_table() OTU Table: [ 27566 taxa and 24 samples ]
sample_data() Sample Data: [ 24 samples by 4 sample variables ]
tax_table() Taxonomy Table: [ 27566 taxa by 7 taxonomic ranks ]
Warning message:
In strsplit(msg, "\n") : input string 1 is invalid in this locale
I tried several times, and I have no idea how to solve this. Any thoughts? Thanks
Hello,
I am trying to bring in otu_table.biom, mapping file = SMP_Run1_Map.txt, and a tree rep_set.tre
but when I run the import_qiime command this is what I get
what am I doing wrong?
I can import the biom file with
and get this:
I can import the mapping file
and see all of my mapping data
How do I import my tre file? I tried this with no success
In my 454 run I had one control that came back with no sequences and I don't know if that is causing an issue (zeros in the biom file??) in the import_qiime function?
I had to manually add a blank line in the biom file so that it didn't have an 'incomplete' line or R would not import it
But by bringing in these files separately I realize that I am missing commands that are built into the function
I can do the import_qiime function with the sample obesity data set so I assume there is something wrong with my files that I am too much of a rookie to figure out. After a day of searching and trying to troubleshoot on my own I now pass my issue on to an expert!
Thanks - Carly