joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

custom import of data tables #256

Closed: kfontanez closed this issue 10 years ago

kfontanez commented 10 years ago

I tried creating a biom file using the Python package but ended up creating the file manually. I started from a simple taxa count table where the columns are samples and the rows are taxonomic identifications (genus level). The values represent counts of those genera in each sample. Unfortunately, the biom-format Python package failed to create a properly formatted biom file from that basic input.

So, I manually created the biom file in TextWrangler (Unix line breaks), but I keep encountering an error when trying to import it into phyloseq. I believe the format is correct, because when I check it with the R biom package, it is recognized as a biom object (an OTU table with a sparse matrix):

read_biom("BacteriaFunc.biom")
biom object. 
type: OTU table 
matrix_type: sparse 
1264 rows and 7 columns 

However, when I open phyloseq and try to import the file, I get the following error:

import_biom("BacteriaFunc.biom")
Error in i$metadata$taxonomy : $ operator is invalid for atomic vectors

This seems to me to be some type of bug in the code rather than a problem with the file. Has anyone seen this type of error before?
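For context, this is a generic R error: it is raised when `$` is applied to an atomic vector (such as a named character vector) instead of a list. Read that way, the message suggests that at least one parsed row entry `i` reached that line of import_biom as an atomic vector rather than the nested list the code expects. The sketch below reproduces only the R message, in base R with hypothetical objects; it is not phyloseq's code, and it does not pin down which part of the file triggers it.

```r
# A row record shaped as a nested list, which is what `i$metadata$taxonomy` expects.
row_as_list <- list(id = "otu1", metadata = list(taxonomy = "g_Alteromonas"))
row_as_list$metadata$taxonomy  # "g_Alteromonas"

# The same information flattened into a named character (atomic) vector.
row_atomic <- c(id = "otu1", taxonomy = "g_Alteromonas")

# `$` is not defined for atomic vectors, so this fails with the same message.
msg <- tryCatch(row_atomic$metadata,
                error = function(e) conditionMessage(e))
msg
```

Seeing the message therefore points at the shape of the parsed data, which is why comparing the file against a known-good biom file is a sensible next step.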

My taxonomy metadata section looks like this:

"rows": [
    {
        "id": "otu1",
        "metadata": {
            "taxonomy": "g_Alteromonas"
        }
    },
    {
        "id": "otu2",
        "metadata": {
            "taxonomy": "g_Vibrio"
        }
    },

Thanks for your help! Kristina

joey711 commented 10 years ago

Hi Kristina,

Can you provide me a copy of the file that causes this error (but can load with the biom package)? The email address listed in the phyloseq package documentation will work fine.

Also, what version of phyloseq are you using?

joey

kfontanez commented 10 years ago

Great, thanks! I sent you the file.

defleury commented 10 years ago

I currently have a very similar problem to Kristina's, so I figured I'd add it here.

I created a BIOM file using a Perl script, and can successfully read / manipulate it using QIIME code. Also, previous versions of the BIOM file used to load alright in phyloseq. However, after I added sample metadata to the file, phyloseq no longer loads it. This is the error that I get:

Error in validObject(.Object) : invalid class “phyloseq” object: 
 Component sample names do not match.
 Try sample_names()

The sample metadata part of my BIOM file looks like this:

"columns": [
    {"id": "1", "metadata": {
        "Animal": "1",
        "Treatment": "XXX",
        "Timepoint": "0",
        "CombinedGroup": "1.0.XXX"
    }},
    {"id": "2", "metadata": {
        "Animal": "2",
        "Treatment": "YYY",
        "Timepoint": "0",
        "CombinedGroup": "2.0.YYY"
    }},
    …
    {"id": "80", "metadata": {
        "Animal": "32",
        "Treatment": "YYY",
        "Timepoint": "2",
        "CombinedGroup": "32.2.YYY"
    }}
]

Thanks in advance for any help / input you can provide :-)

*Fleury

joey711 commented 10 years ago

Something for you both to try using the biom package:

The point is that the sample names in the data.frame returned by sample_metadata() need to match the sample (column) names in the observation matrix. Non-matching indices are not allowed in phyloseq, but they might be tolerated to some extent by the biom format generally (though I could be wrong). For the time being, I think it is true that the biom format is a little less restrictive.

In any case, see if you can successfully get those tables out of your biom files without loading phyloseq at all. If so, you can check the indices, and then build the phyloseq object from its components with a single call to the phyloseq() function. Please report back what you find; I want to make sure there is not actually a bug in phyloseq. I doubt it is a coincidence that you both created these biom-format files through non-standard means. I don't know exactly what is wrong yet, but the inspection I've demonstrated below will help us all figure it out.

library("biom")
rich_sparse_file = system.file("extdata", "rich_sparse_char.biom", package = "biom")
rich_sparse_file
## [1] "/Library/Frameworks/R.framework/Versions/3.0/Resources/library/biom/extdata/rich_sparse_char.biom"
biom = read_biom(rich_sparse_file)
biom_shape(biom)
## nrow ncol 
##    5    6
observation_metadata(biom)
##            taxonomy1         taxonomy2              taxonomy3
## GG_OTU_1 k__Bacteria p__Proteobacteria c__Gammaproteobacteria
## GG_OTU_2 k__Bacteria  p__Cyanobacteria    c__Nostocophycideae
## GG_OTU_3  k__Archaea  p__Euryarchaeota     c__Methanomicrobia
## GG_OTU_4 k__Bacteria     p__Firmicutes          c__Clostridia
## GG_OTU_5 k__Bacteria p__Proteobacteria c__Gammaproteobacteria
##                     taxonomy4             taxonomy5         taxonomy6
## GG_OTU_1 o__Enterobacteriales f__Enterobacteriaceae    g__Escherichia
## GG_OTU_2        o__Nostocales        f__Nostocaceae g__Dolichospermum
## GG_OTU_3 o__Methanosarcinales f__Methanosarcinaceae g__Methanosarcina
## GG_OTU_4   o__Halanaerobiales   f__Halanaerobiaceae  g__Halanaerobium
## GG_OTU_5 o__Enterobacteriales f__Enterobacteriaceae    g__Escherichia
##                                taxonomy7
## GG_OTU_1                             s__
## GG_OTU_2                             s__
## GG_OTU_3                             s__
## GG_OTU_4 s__Halanaerobiumsaccharolyticum
## GG_OTU_5                             s__
sample_metadata(biom)
##         BarcodeSequence  LinkerPrimerSequence BODY_SITE Description
## Sample1    CGCTTATCGAGA CATGCTGCCTCCCGTAGGAGT       gut   human gut
## Sample2    CATACCAGTAGC CATGCTGCCTCCCGTAGGAGT       gut   human gut
## Sample3    CTCTCTACCTGT CATGCTGCCTCCCGTAGGAGT       gut   human gut
## Sample4    CTCTCGGCCTGT CATGCTGCCTCCCGTAGGAGT      skin  human skin
## Sample5    CTCTCTACCAAT CATGCTGCCTCCCGTAGGAGT      skin  human skin
## Sample6    CTAACTACCAAT CATGCTGCCTCCCGTAGGAGT      skin  human skin
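Following that inspection, here is a minimal base R sketch of the index check. `compare_sample_names()` is a hypothetical helper and the toy tables are made up; with the biom package, the inputs would be roughly `as(biom_data(x), "matrix")` and `sample_metadata(x)`. Any name that shows up in only one table is the kind of mismatch behind the "Component sample names do not match" error quoted earlier.

```r
# Hypothetical helper: list the sample names that appear in only one of the
# two tables. Any non-empty result means the indices do not match.
compare_sample_names <- function(otu_mat, sam_df, taxa_are_rows = TRUE) {
  otu_samples <- if (taxa_are_rows) colnames(otu_mat) else rownames(otu_mat)
  list(only_in_otu_table   = setdiff(otu_samples, rownames(sam_df)),
       only_in_sample_data = setdiff(rownames(sam_df), otu_samples))
}

# Toy tables: "S3" in the counts was written as "Sample3" in the metadata.
toy_otu <- matrix(1:6, nrow = 2,
                  dimnames = list(c("OTU1", "OTU2"), c("S1", "S2", "S3")))
toy_sam <- data.frame(Treatment = c("XXX", "YYY", "XXX"),
                      row.names = c("S1", "S2", "Sample3"))
mismatch <- compare_sample_names(toy_otu, toy_sam)
mismatch$only_in_otu_table    # "S3"
mismatch$only_in_sample_data  # "Sample3"
```

Checking both directions is deliberate: a name missing from either side is enough to break the match.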

defleury commented 10 years ago

So, I've tried different things in the meantime on my BIOM table.

(i) I tried loading it using the biom package in R. It fails every time with the same error message:

Error in validObject(.Object) : 
  invalid class "biom" object: type field has unsupported value

…which is rather cryptic to me.
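The biom-format 1.0 specification restricts the top-level "type" field to a small controlled vocabulary, so a message like "type field has unsupported value" most likely means that field in the file is spelled differently from what the reader accepts. Below is a rough base R check, assuming the R biom package validates against the specification's list. `biom_type_field()` and `check_biom_type()` are hypothetical helpers, the example strings are made up, and the regular expression is a heuristic, not a JSON parser.

```r
# Controlled vocabulary for the top-level "type" field, as listed in the
# biom-format 1.0 (JSON) specification.
biom_v1_types <- c("OTU table", "Pathway table", "Function table",
                   "Ortholog table", "Gene table", "Metabolite table",
                   "Taxon table")

# Hypothetical helper: pull the first "type" value out of raw biom JSON text.
# '"type"' must start with a quote, so "matrix_type" is not matched; a metadata
# key literally named "type" appearing earlier in the file would fool it.
biom_type_field <- function(json_text) {
  hit <- regmatches(json_text,
                    regexpr('"type"\\s*:\\s*"[^"]*"', json_text, perl = TRUE))
  if (length(hit) == 0) return(NA_character_)
  sub('^"type"\\s*:\\s*"([^"]*)"$', "\\1", hit, perl = TRUE)
}

check_biom_type <- function(json_text) {
  value <- biom_type_field(json_text)
  ok <- !is.na(value) && value %in% biom_v1_types
  if (!ok) message("Missing or unsupported type field: ", value)
  ok
}

good <- '{"id": null, "type": "OTU table", "matrix_type": "sparse"}'
bad  <- '{"id": null, "type": "otu_table", "matrix_type": "sparse"}'
check_biom_type(good)
check_biom_type(bad)
```

On a real file, something like `check_biom_type(paste(readLines("my_table.biom", warn = FALSE), collapse = "\n"))` would apply it; the file name is a placeholder.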

(ii) I modified several things about my BIOM file, mostly in the sample metadata entries, and tried re-loading it with the biom R package. It always fails, and I really have no clue what else I could modify. I tried:

-> playing around with spaces vs. no spaces, and quoted vs. unquoted values, in the metadata lines, like so:

"Treatment":"XXX",

vs

"Timepoint": 0,

…but to no avail

-> entering all the metadata fields that QIIME requests by default ("BarcodeSequence" etc). Didn't change anything.

-> changing the "format" field to "1.0.0", because "0.9*" seemed to annoy some QIIME scripts; but again, that didn't do anything.

Even though I used a custom Perl script to generate the BIOM file, I think that the format is OK as QIIME prints library stats without complaints and rightly reports sample metadata fields:

Sample Metadata Categories: Timepoint; Treatment; Animal; CombinedGroup

For the time being, I think I will stick to good old scripting to analyze my data. I thought I'd give BIOM and phyloseq a go, but I realize that for my data processing this creates more hassle than it saves. I think you're doing a great thing with the phyloseq package, but for the level of flexibility I require (which includes non-QIIME preprocessing and thus custom scripts to make a BIOM file…), it is not yet right for me.

Keep up the good work! Best,

*Fleury

joey711 commented 10 years ago

Fleury,

Before you give up on the biom format, and then on phyloseq as collateral, did you try the main biom-format tools from the biom-format project, which are written in Python? Since biom-format is a JSON format, you might also check whether there are any syntax errors in the file. Those tools might give you a clearer diagnostic of what is wrong with your biom file. The fact that QIIME can read library stats from your file doesn't tell me very much, and doesn't validate the file format.
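As an R-side complement for the syntax part only, here is a sketch using `validate()` from the jsonlite package. jsonlite is an assumption here, since it is not mentioned in the thread, and the helper `check_json()` is hypothetical. The two strings are made-up fragments; the second is missing a comma between row objects, an easy slip when editing a biom file by hand.

```r
# Hypothetical helper: TRUE/FALSE for JSON syntax, NA if jsonlite is absent.
check_json <- function(txt) {
  if (!requireNamespace("jsonlite", quietly = TRUE)) return(NA)
  ok <- jsonlite::validate(txt)
  # On failure, validate() attaches a parser message in the "err" attribute.
  if (!ok) message("JSON syntax error: ", attr(ok, "err"))
  as.logical(ok)  # drop the attributes, keep TRUE/FALSE
}

good_json <- '{"type": "OTU table", "rows": [{"id": "otu1"}, {"id": "otu2"}]}'
# Missing comma between the two row objects: not valid JSON.
bad_json  <- '{"type": "OTU table", "rows": [{"id": "otu1"} {"id": "otu2"}]}'

check_json(good_json)
check_json(bad_json)
```

A syntax check like this only says the file is well-formed JSON; it does not check biom-specific fields, which is where the Python tools are more thorough.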

Finally, and this is really important: phyloseq is not wedded to QIIME or the biom format. By design. The notion of "I want to be flexible about my sequence processing and downstream analysis" is exactly a reason to use phyloseq, not the other way around. If you're having trouble importing a biom-format file, you can just import your data into R as tables. I spent a fair amount of time creating the data infrastructure that lets a user relate their data tables as a phyloseq object, and the relevant functions to look at are phyloseq() and merge_phyloseq(). Of the two, phyloseq() is probably what you want. If you're into "scripting", you should find R really useful, and importing data tables into R really easy. The only non-table you might want right away is a phylogenetic tree, which isn't yet supported by the biom format anyway. The read_tree() function in phyloseq will import that for you.
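A minimal sketch of the two routes mentioned above, using toy tables with hypothetical values. The phyloseq part only runs if the package is installed. `phyloseq()` takes all components in one call, while `merge_phyloseq()` adds components to an object that already exists.

```r
# Toy tables (hypothetical values) standing in for data you read from files.
toy_otu <- matrix(c(10L, 0L, 3L, 7L, 1L, 4L), nrow = 3,
                  dimnames = list(c("otu1", "otu2", "otu3"), c("S1", "S2")))
toy_tax <- matrix(c("Alteromonas", "Vibrio", "Glaciecola"), ncol = 1,
                  dimnames = list(rownames(toy_otu), "Genus"))
toy_sam <- data.frame(Treatment = c("XXX", "YYY"),
                      row.names = colnames(toy_otu))

if (requireNamespace("phyloseq", quietly = TRUE)) {
  library(phyloseq)
  OTU <- otu_table(toy_otu, taxa_are_rows = TRUE)
  TAX <- tax_table(toy_tax)
  SAM <- sample_data(toy_sam)

  # One call with every component ...
  physeq_a <- phyloseq(OTU, TAX, SAM)
  # ... or build a partial object first and add components later.
  physeq_b <- merge_phyloseq(phyloseq(OTU, TAX), SAM)
  print(physeq_a)
}
```

The only real requirement is that the row and column names line up across the tables: taxa names between the count and taxonomy tables, and sample names between the count table and the sample data.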

Hope that helps. Best of luck

joey

kfontanez commented 10 years ago

Joey-

So, I followed your directions and was able to recreate your output. However, I'm not sure how that helps to solve my issue of importing my OTU table into phyloseq. In your last e-mail to Fleury you mentioned that it is possible to import an OTU table directly into the phyloseq package. Can you post explicit directions for how that is done? If I can avoid the biom format entirely, all the better. The OTU table originally didn't have the otu column, which I added when trying to convert to biom format. It used to look like taxonomy/150L/150D/ etc.

My starting OTU table is of the format:

otu   150L    150D    200D    300L    300D    500L    500D    taxonomy
otu1  468035  1185    330     237111  94      232341  194     Alteromonas
otu2  54696   465193  56075   13513   24703   6713    1446    Vibrio
otu3  327010  2522    1288    99880   1193    10119   934     Pseudoalteromonas
otu4  276939  615     783     145627  106     1829    486     Marinobacter
...

And my metadata file looks like:

150L  150  live
150D  150  dead
200D  200  dead
300L  300  live
300D  300  dead
500L  500  live
500D  500  dead
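A table in this layout can be split into the pieces phyloseq expects using base R alone. Here is a sketch built from the first four rows shown above. `read.table(text = ...)` stands in for reading the actual files, and the `Depth` and `Status` column names are hypothetical labels for the metadata. Whitespace-delimited input is also an assumption: the attached table contains a two-word genus (Candidatus Pelagibacter), which would need a tab- or comma-delimited file (for example `sep = "\t"`).

```r
otu_txt <- "otu 150L 150D 200D 300L 300D 500L 500D taxonomy
otu1 468035 1185 330 237111 94 232341 194 Alteromonas
otu2 54696 465193 56075 13513 24703 6713 1446 Vibrio
otu3 327010 2522 1288 99880 1193 10119 934 Pseudoalteromonas
otu4 276939 615 783 145627 106 1829 486 Marinobacter"

# row.names = 1 uses the "otu" column as row names; check.names = FALSE keeps
# "150L" as-is instead of turning it into "X150L".
tab <- read.table(text = otu_txt, header = TRUE, row.names = 1,
                  check.names = FALSE, stringsAsFactors = FALSE)

# Counts: every column except taxonomy, as a numeric matrix.
otumat <- as.matrix(tab[, setdiff(names(tab), "taxonomy")])
# Taxonomy: a one-column character matrix with the same row (OTU) names.
taxmat <- matrix(tab$taxonomy, ncol = 1,
                 dimnames = list(rownames(tab), "Genus"))

sam_txt <- "150L 150 live
150D 150 dead
200D 200 dead
300L 300 live
300D 300 dead
500L 500 live
500D 500 dead"
# Sample names become row names; "Depth" and "Status" are hypothetical labels.
samdf <- read.table(text = sam_txt, row.names = 1,
                    col.names = c("sample", "Depth", "Status"),
                    stringsAsFactors = FALSE)
```

If the three pieces line up, `phyloseq(otu_table(otumat, taxa_are_rows = TRUE), tax_table(taxmat), sample_data(samdf))` should combine them.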

I have attached them both to this email for your consideration.

Thank you, Kristina

On Oct 30, 2013, at 6:20 PM, Paul J. McMurdie wrote:

Attached metadata file:

150L  150  live
150D  150  dead
200D  200  dead
300L  300  live
300D  300  dead
500L  500  live
500D  500  dead

Attached OTU table:

otu   150L    150D    200D    300L    300D    500L    500D    taxonomy
otu1  468035  1185    330     237111  94      232341  194     Alteromonas
otu2  54696   465193  56075   13513   24703   6713    1446    Vibrio
otu3  327010  2522    1288    99880   1193    10119   934     Pseudoalteromonas
otu4  276939  615     783     145627  106     1829    486     Marinobacter
otu5  64967   222     170     196424  47      477     218     Methylophaga
otu6  65448   300     181     88196   35      15495   203     Alcanivorax
otu7  2689    1333    2354    21578   553     51619   5377    Candidatus Pelagibacter
otu8  34502   1261    344     22313   151     17892   404     Glaciecola
...

joey711 commented 10 years ago

phyloseq() example

How to use manually-imported tables in R and combine them together in a phyloseq object. We'll create the example vanilla R tables using base R code. No packages required yet.

# pretend OTU table that you read from a file, called otumat
otumat = matrix(sample(1:100, 100, replace = TRUE), nrow = 10, ncol = 10)
otumat
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    4   41   60   84   33   31   70   90   71    30
##  [2,]   69   59   14   75   92   69   26   30   90    54
##  [3,]   63   66    3   23   17   89   84   95   35    81
##  [4,]    4   25   69   88   30   35   14   36   72    18
##  [5,]   53   35    3   20   18   53   56   60   84     1
##  [6,]   35   97   15   41   44   26   55   55   20     6
##  [7,]   35   80   10   33   95   60   17   27   13     2
##  [8,]   83   70   89   21   42   49   59   45   35    33
##  [9,]   47   55   91   59   16   54   33   61   47    32
## [10,]    2   51   24   19   59   69   24   88   76    98
# It needs sample names and OTU names, the index names of the matrix. Your
# table might already have these.
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))
otumat
##       Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
## OTU1        4      41      60      84      33      31      70      90
## OTU2       69      59      14      75      92      69      26      30
## OTU3       63      66       3      23      17      89      84      95
## OTU4        4      25      69      88      30      35      14      36
## OTU5       53      35       3      20      18      53      56      60
## OTU6       35      97      15      41      44      26      55      55
## OTU7       35      80      10      33      95      60      17      27
## OTU8       83      70      89      21      42      49      59      45
## OTU9       47      55      91      59      16      54      33      61
## OTU10       2      51      24      19      59      69      24      88
##       Sample9 Sample10
## OTU1       71       30
## OTU2       90       54
## OTU3       35       81
## OTU4       72       18
## OTU5       84        1
## OTU6       20        6
## OTU7       13        2
## OTU8       35       33
## OTU9       47       32
## OTU10      76       98
# Now we need a pretend taxonomy table
taxmat = matrix(sample(letters, 70, replace = TRUE), nrow = nrow(otumat), ncol = 7)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", 
    "Species")
taxmat
##       Domain Phylum Class Order Family Genus Species
## OTU1  "k"    "m"    "c"   "f"   "h"    "k"   "x"    
## OTU2  "r"    "t"    "f"   "z"   "u"    "k"   "e"    
## OTU3  "g"    "e"    "o"   "l"   "f"    "s"   "x"    
## OTU4  "k"    "c"    "d"   "j"   "y"    "y"   "c"    
## OTU5  "q"    "c"    "p"   "p"   "s"    "w"   "h"    
## OTU6  "i"    "r"    "v"   "t"   "z"    "x"   "n"    
## OTU7  "i"    "u"    "h"   "n"   "a"    "x"   "a"    
## OTU8  "r"    "a"    "c"   "i"   "h"    "z"   "w"    
## OTU9  "a"    "e"    "q"   "o"   "f"    "q"   "b"    
## OTU10 "u"    "w"    "o"   "e"   "y"    "m"   "e"
class(otumat)
## [1] "matrix"
class(taxmat)
## [1] "matrix"
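The first comment in the example pretends `otumat` was read from a file. Here is a sketch of that step in base R. It writes a small random matrix to a temporary CSV file as a stand-in for a real export; the file and the names `demo_mat` and `otumat_from_file` are hypothetical, chosen so they don't overwrite the example's objects. `read.csv()` returns a data.frame, so `as.matrix()` turns it back into the kind of vanilla matrix used here.

```r
set.seed(1)
demo_mat <- matrix(sample(1:100, 20, replace = TRUE), nrow = 4,
                   dimnames = list(paste0("OTU", 1:4), paste0("Sample", 1:5)))

# Write it out as CSV; write.csv() puts the row names in the first column.
path <- tempfile(fileext = ".csv")
write.csv(demo_mat, path)

# row.names = 1 turns that first column back into row names, and
# check.names = FALSE keeps the sample names exactly as written.
otu_df <- read.csv(path, row.names = 1, check.names = FALSE)
class(otu_df)  # "data.frame"
otumat_from_file <- as.matrix(otu_df)
```

With a real file, only the `read.csv()` line and the path would change.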

Note how these are just vanilla R matrices. Now let's tell phyloseq how to combine them into a phyloseq object.

library("phyloseq")
OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
OTU
## OTU Table:          [10 taxa and 10 samples]
##                      taxa are rows
##       Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
## OTU1        4      41      60      84      33      31      70      90
## OTU2       69      59      14      75      92      69      26      30
## OTU3       63      66       3      23      17      89      84      95
## OTU4        4      25      69      88      30      35      14      36
## OTU5       53      35       3      20      18      53      56      60
## OTU6       35      97      15      41      44      26      55      55
## OTU7       35      80      10      33      95      60      17      27
## OTU8       83      70      89      21      42      49      59      45
## OTU9       47      55      91      59      16      54      33      61
## OTU10       2      51      24      19      59      69      24      88
##       Sample9 Sample10
## OTU1       71       30
## OTU2       90       54
## OTU3       35       81
## OTU4       72       18
## OTU5       84        1
## OTU6       20        6
## OTU7       13        2
## OTU8       35       33
## OTU9       47       32
## OTU10      76       98
TAX
## Taxonomy Table:     [10 taxa by 7 taxonomic ranks]:
##       Domain Phylum Class Order Family Genus Species
## OTU1  "k"    "m"    "c"   "f"   "h"    "k"   "x"    
## OTU2  "r"    "t"    "f"   "z"   "u"    "k"   "e"    
## OTU3  "g"    "e"    "o"   "l"   "f"    "s"   "x"    
## OTU4  "k"    "c"    "d"   "j"   "y"    "y"   "c"    
## OTU5  "q"    "c"    "p"   "p"   "s"    "w"   "h"    
## OTU6  "i"    "r"    "v"   "t"   "z"    "x"   "n"    
## OTU7  "i"    "u"    "h"   "n"   "a"    "x"   "a"    
## OTU8  "r"    "a"    "c"   "i"   "h"    "z"   "w"    
## OTU9  "a"    "e"    "q"   "o"   "f"    "q"   "b"    
## OTU10 "u"    "w"    "o"   "e"   "y"    "m"   "e"
physeq = phyloseq(OTU, TAX)
physeq
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10 taxa and 10 samples ]
## tax_table()   Taxonomy Table:    [ 10 taxa by 7 taxonomic ranks ]
plot_bar(physeq, fill = "Family")

[Figure: phyloseq-combine, the bar plot produced by plot_bar()]

kfontanez commented 10 years ago

Joey-

I was able to follow your directions to create the phyloseq object, thank you! I have been plotting ordinations using the plot_ordination function, and I noticed that the text labels it produces are really tiny. I tried changing them using theme_update() from ggplot2 but was unable to find the correct element to change. How does one do this?

Below is the function call I used to make the plot (I also attached an example plot with the tiny labels). I'd like to make the labels bigger. If I use geom_point(size=3), the circles completely overlap the text labels (which are too tiny to read anyway).

plot_ordination(Bacteria, AllBacteriacca, "samples", color = "TREATMENT", label = "DEPTH")
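One likely reason theme_update() had no effect is that point labels are presumably drawn by a text geom layer, and a geom's text size is set on the layer, not through the theme. Theme elements such as `axis.text` only cover axis and legend text. Here is a hedged sketch, assuming ggplot2 is installed. The data.frame is a hypothetical stand-in for ordination scores plus sample variables, with column names borrowed from the call above. With phyloseq, the analogous approach would be to drop the `label` argument and add the layer to the returned plot, for example `plot_ordination(Bacteria, AllBacteriacca, "samples", color = "TREATMENT") + geom_text(aes(label = DEPTH), size = 5, vjust = -1)`. That assumes `DEPTH` is among the columns in the plot's data.

```r
# Hypothetical stand-in for the data behind an ordination plot.
ord_df <- data.frame(Axis.1 = c(-0.4, -0.1, 0.2, 0.5),
                     Axis.2 = c(0.3, -0.2, 0.1, -0.3),
                     TREATMENT = c("live", "dead", "live", "dead"),
                     DEPTH = c(150, 150, 300, 300))

p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ord_df, aes(x = Axis.1, y = Axis.2, color = TREATMENT)) +
    geom_point(size = 3) +
    # Label size is a layer parameter (in mm); a negative vjust lifts each
    # label above its point so the larger circles don't cover it.
    geom_text(aes(label = DEPTH), size = 5, vjust = -1) +
    # Theme elements control axis and legend text, not the data labels.
    theme(axis.text = element_text(size = 14),
          legend.text = element_text(size = 12))
}
```

`print(p)` draws the plot. Adjusting `size` and `vjust` together is usually enough to keep labels readable next to enlarged points.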

Thanks, Kristina

On Oct 31, 2013, at 4:36 PM, Paul J. McMurdie wrote:

defleury commented 10 years ago

Joey,

Sorry for the delayed reply; I've had an offline weekend.

Thanks a lot for the step-by-step tutorial on creating a phyloseq object from vanilla R. I've reformatted my data accordingly, and after some tinkering I've managed to load everything; it seems to work fine. For me, this way of handling the data was much more efficient once I got a grasp of some phyloseq subtleties.

Also, I hope that you didn't get the impression from my earlier post that I was proposing to "give up on" phyloseq for good. I honestly appreciate the work you do here, and even more so the effort you spend on documentation and on answering issues such as this one. It's just that I already have working R scripts in place. I wanted to give phyloseq a go because it seemed nice and convenient for several functions that are a pain to implement (again, thanks for the good work!), but I felt that the biom import issues I encountered presented a significant obstacle. Anyway, the workaround you pointed out works perfectly for me :-)

Thanks again for the patient answers and the nice tutorial above. And sorry, @Kristina, for jumping into your thread; but as I see, you've successfully overcome your problems, too!

Best,

Fleury

joey711 commented 10 years ago

@kfontanez

I'm really glad that my suggestions solved your problem. That's great! ... But now you're hijacking your own issue post. Don't be shy about posting a new, unrelated issue. There are some interesting ways of dealing with text sizes in ggplot2 objects, and some that are specific to particular plot types returned by plot_ordination. Feel free to mostly copy and paste what you've started above, but consider using an example dataset from phyloseq so that your issue is reproducible. This will make it a lot faster and easier for me to help.

@defleury

No worries! Yes, I did have the impression you were giving up on phyloseq for ever and ever, which I naturally felt was a bit hasty. However, I completely understand your frustration when fighting file format issues, in particular because I've had to wage many of these fights myself to create the supporting wrappers in phyloseq. I have not done a good job emphasizing the details that I showed above, and so the fact that this wasn't an obvious option to try is my fault. Please accept my apology.

On the one hand, I wanted to make it clear to R-newbie users that they could try some of the phyloseq examples on their own data without knowing much about R. On the other hand, this did a disservice to the very real support for doing things manually/interactively prior to handing the pieces to phyloseq to wrap them up in one consistent object and use further phyloseq goodies.

I'll leave this post open until I've created and linked a tutorial just for demonstrating this a little better. There are some hints in the general phyloseq demo, but when I reviewed it to see if I could point to that link, I realized it was quite inadequate.

Thank you both for your interest and feedback, without which phyloseq is not likely to do as well.

joey711 commented 10 years ago

I added a new section to the data import tutorial:

http://joey711.github.io/phyloseq/import-data.html#manual

It has a more detailed explanation, and includes randomly simulated tree and sample covariate data.
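A compact sketch in the same spirit, assuming phyloseq and ape are installed (the phyloseq part is skipped otherwise). It uses a random tree from `ape::rtree()` whose tip labels are the OTU names, plus simulated sample covariates. All values are random, and the column names `Location` and `Depth` are hypothetical.

```r
set.seed(42)
otumat <- matrix(sample(1:100, 30, replace = TRUE), nrow = 6,
                 dimnames = list(paste0("OTU", 1:6), paste0("Sample", 1:5)))
taxmat <- matrix(sample(letters, 12, replace = TRUE), nrow = 6,
                 dimnames = list(rownames(otumat), c("Phylum", "Genus")))
# Simulated sample covariates; column names are hypothetical.
samdf <- data.frame(Location = sample(LETTERS[1:3], 5, replace = TRUE),
                    Depth = sample(50:1000, 5, replace = TRUE),
                    row.names = colnames(otumat))

physeq <- NULL
if (requireNamespace("phyloseq", quietly = TRUE) &&
    requireNamespace("ape", quietly = TRUE)) {
  library(phyloseq)
  physeq <- phyloseq(otu_table(otumat, taxa_are_rows = TRUE), tax_table(taxmat))
  # Random rooted tree; its tip labels must be the taxa names to match up.
  random_tree <- ape::rtree(ntaxa(physeq), rooted = TRUE,
                            tip.label = taxa_names(physeq))
  # Add the sample data and tree to the existing object.
  physeq <- merge_phyloseq(physeq, sample_data(samdf), random_tree)
}
```

With real data, `read_tree()` would replace the simulated tree, and the covariates would come from your own metadata table.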

I think this settles the missing documentation for now. I'll probably migrate this as a vignette to include within the package itself as well.

Thanks again for all the useful feedback.

joey