Closed alejorojas2 closed 7 years ago
Binary file? That sounds like the hdf5-based "version 2" biom format. The original biom format was not binary, a JSON based format. The original biom format is what is currently supported in phyloseq.
The biom-format team told me about the update probably 1 year ago. You are the first to post to phyloseq issue tracker about it. I've been discussing with @nosson (Joe Paulson) about adding biom-v2 support in the biom package on CRAN.
The more this becomes an issue, the faster one of us will support it. I personally have not encountered a need for HDF5-based storage of microbiome count data.
If you have some way of storing your OTU table in a biom v1 format, legacy QIIME format, or even just a tab-delimited table or any format that can be read by R, then you will be able to use phyloseq.
Hope that helps. I'll leave this issue open for now until there are further (and more helpful) comments about workarounds and/or actual biom-v2 support.
Cheers
joey
I had to convert my biom format using "convert biom" which is distributed with qiime, perhaps this is the issue? Maybe try:
biom convert -i otu_table.biom -o otu_table_json.biom --table-type="OTU table" --to-json
and see if that works?
All the best,
Phill
In QIIME 1.9 the biom files are all in the new format and are not read in correctly into phyloseq. The only work around I have going at the moment is an install of QIIME 1.8 to make the biom files.
Hi Ara,
If you'd like, can you go ahead and test this function: https://gist.github.com/jnpaulson/c9cdcb8cf95e7daae468
This will import biom vs. 2.0 files. If you find it works well I'm sure Joey will be able to roll it into the Biom package after a few unit tests are added (which shouldn't take us too long).
On Sun, Mar 15, 2015 at 9:55 PM, Ara notifications@github.com wrote:
In QIIME 1.9 the biom files are all in the new format and are not read in correctly into phyloseq. The only work around I have going at the moment is an install of QIIME 1.8 to make the biom files.
— Reply to this email directly or view it on GitHub https://github.com/joey711/phyloseq/issues/443#issuecomment-81343556.
Joey and Phil,
I did try the command that Phil suggested converting the BIOM file to json format, it worked, now I am able to import my data to phyloseq without any issues. But I looked forward to the function to read hdf5 files, this can be helpful since BIOM files are moving into that direction.
Thanks!
Alejandro
@nosson Thanks! I will give that a try.
@alejorojas2 @bioinfonm If you guys are willing to be testers for the gist I posted above I know Joey will import it into biom package. I just need to write the documentation. Is there a desire/need to "write" hdf5 biom2 files?
@nosson Sounds good. Phyloseq is our main tech for processing and understanding our microbial data. So anything I can do to help out I am game for.
I'll give it a try here around lunch!
@nosson Does the biom file need to be a rich format? I dumped out a biom file with the default settings in QIIME 1.9
Aras-MacBook-Air-2:qiime_1_9_test arakooser$ make_otu_table.py -i seqs_otus.txt -t seqs_rep_set_tax_assignments.txt -o de_novo_sumaclust.biom
and the got this error in phyloseq using the biom 2.x function
> biom_file = "de_novo_sumaclust.biom" ###This is your .biom file
> map_file = "LavaBedsCombinedMapping.txt" ###This is your mapping file with all the metadata
> y = read_hdf5_biom(biom_file)
HDF5-DIAG: Error detected in HDF5 (1.8.7) thread 0:
#000: H5F.c line 1522 in H5Fopen(): unable to open file
major: File accessability
minor: Unable to open file
#001: H5F.c line 1313 in H5F_open(): unable to read superblock
major: File accessability
minor: Read failed
#002: H5Fsuper.c line 334 in H5F_super_read(): unable to find file signature
major: File accessability
minor: Not an HDF5 file
#003: H5Fsuper.c line 155 in H5F_locate_signature(): unable to find a valid file signature
major: Low-level I/O
minor: Unable to initialize object
HDF5: unable to open file
Error in h5checktypeOrOpenLoc(file, readonly = TRUE) :
Error in h5checktypeOrOpenLoc(). File 'de_novo_sumaclust.biom' is not a valid HDF5 file.
Called from: h5read(file_input, "/", read.attributes = TRUE)
Browse[1]> Q
@bioinfonm Mhmm, just for clarification - this is not a phyloseq error. We're using the R packages biom rhdf5 to import this file. Converting it into a biom-class object (as Joey defined in the biom package) then lets us easily convert that to an object that can be parsed by phyloseq or metagenomeSeq.
Do you think you could send me a MRE (minimally reproducible example?). The ones I have from biom-format.org worked, but having a simple real example like yours would help with debugging. I don't personally have Qiime wrappers installed so I can't create my own without lots of effort. If you can get something to me my email is jpaulson@umiacs.umd.edu.
@nosson Will do! I'll send the whole file over.
Ara
@nosson I am traveling right know but as soon as I can test it, I will let you know. About the need for hdf5, for me it was just by chance since it is the first time that used qiime for my analysis and I think it is the default behavior now.
So a similar problem cropped up on the qiime forum here: https://github.com/biocore/qiime/issues/1928
Looks like something goofy with the way QIIME is handling the hdf5 format.
The solution at the moment that works is: biom convert -i de_novo_sumaclust.biom -o fixed.biom --table-type="OTU table" --to-hdf5
@nosson I did try your function and it worked for me, I had to install the library for BIOM
and rhdf5
to manage to run your function. The biom 0.3.12 and rhdf5 2.10.0 were the versions that installed on my R.
The I ran the the function and run your command:
y = read_hdf5_biom("table_even7620.biom")
biom(y)
then I get this output, that actually makes sense based on my data:
biom object.
type: OTU table
matrix_type: dense
714 rows and 36 columns
So, If this is a biom
object, the next step will be importing the file using import_biom(BIOMfilename = y)
.
update: I assign y1 <- biom(y)
to save it as biom
object, but I still get the same message:
> import_biom(BIOMfilename = y1)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘fromJSON’ for signature ‘"biom", "missing"’
So it seems that your functions creates a list instead of a biom file, or maybe I am missing something?
Thanks!
Alejandro
@alejorojas2 Awesome! Glad to hear it works for you - it seems like some of the default outputs by Qiime are not 'true' HDF5 files (@bioinfonm's was not).
Glad to hear it worked for you! This is not an issue as Phyloseq uses biom
class objects natively in R. Essentially instead of calling import_biom
you would run a truncated version of that function:
https://github.com/joey711/phyloseq/blob/master/R/IO-methods.R#L1777-L1849
import_biom2(y1)
)
https://gist.github.com/jnpaulson/324ac1fa3eab1bc7f845import_biom <- function(BIOMfilename,
treefilename=NULL, refseqfilename=NULL, refseqFunction=readDNAStringSet, refseqArgs=NULL,
parseFunction=parse_taxonomy_default, parallel=FALSE, version=1.0, ...){
# initialize the argument-list for phyloseq. Start empty.
argumentlist <- list()
# Read the data
if(class(x)=="biom"){ # or check if string first
x = BIOMfilename
} else {
x = read_biom(biom_file=BIOMfilename)
}
write_biom(y1,"file/path/filename.biom")
then import it using import_biom("/file/path/filename.biom")
.Hi there,
I am following this thread with interest as I am also using the latest QIIME. The read_hdf5_biom function provided worked, but I was getting the same error as CarlyRae did (see comments here https://gist.github.com/nosson/324ac1fa3eab1bc7f845) for the import_biom2 function. The error suggests a list is not the expected input, as it is expecting the biom-class (subclass of List) as input, I modified the code to convert the list to a biom object... and it seems to work. But I just wanted to check with you guys if this was the Right Thing to Do :)
import_biom2 <- function(x,
treefilename=NULL, refseqfilename=NULL, refseqFunction=readDNAStringSet, refseqArgs=NULL,
parseFunction=parse_taxonomy_default, parallel=FALSE, version=1.0, ...){
# initialize the argument-list for phyloseq. Start empty.
argumentlist <- list()
x = biom(x) # added this line
EDIT: I just realised I haven't got the latest bioconductor installed. I'll update everything and get back to you if there is still a problem.
I upgraded to R 3.2 (Bioconductor version 3.1 (BiocInstaller 1.18.2)), installed the phyloseq library (‘1.13.2’), and still needed to add that line (x = biom(x)) to the import_biom2 function to get it to work. But I think there are still issues, as the taxonomy parsing isn't working correctly (I've tried default and the parse_taxonomy_qiime options too). I thought this all might work if I could install the dev version of phyloseq, but I can't seem to get this to work.
> library("devtools")
> install_github("phyloseq", "joey711")
Downloading github repo joey711/phyloseq@master
Installing phyloseq
Skipping 4 packages not available: BiocGenerics, Biostrings, DESeq2, multtest
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD \
INSTALL '/tmp/Rtmpm7oa1a/devtools2137cf4daa7/joey711-phyloseq-dd34ad0' \
--library='/home/bender/R/x86_64-pc-linux-gnu-library/3.2' --install-tests
* installing *source* package ‘phyloseq’ ...
** R
** data
** inst
** tests
** preparing package for lazy loading
Note: the specification for S3 class “AsIs” in package ‘RJSONIO’ seems equivalent to one from package ‘BiocGenerics’: not turning on duplicate class definitions for this class.
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
Note: the specification for S3 class “AsIs” in package ‘RJSONIO’ seems equivalent to one from package ‘BiocGenerics’: not turning on duplicate class definitions for this class.
* DONE (phyloseq)
Reloading installed phyloseq
Warning message:
Username parameter is deprecated. Please use joey711/phyloseq
> library(phyloseq)
> packageVersion("phyloseq")
[1] ‘1.13.2’
Hi @LBragg,
Thanks for the research so far into the problem and apologies for the delay!
If you could email me a MRE (minimally reproducible example) that you're working with, that'll really help me debug and fix up the code so that we can get biom-class 2.1 objects imported into phyloseq
and metagenomeSeq
and functions available in @joey711's biom
package.
@LBragg Thanks for the MRE - here's a link to the now updated import_biom2.R script https://gist.github.com/nosson/324ac1fa3eab1bc7f845 that should fix everything (sorry for the delay, been in Paris for the last week).
@joey711 If you're happy with this, let me know and I can make import_biom flexible and modify import_biom
+ read_biom
and do a pull request when I get stateside in the next two weeks.
Hey guys, all LGTM! Make sure to think of some unit tests for some of the special cases. I'm happy to push out any/all improvements to get this functionality out there. Feel free to ping me repeatedly if there's a pending PR I haven't noticed.
The fixes work for me, the biom produced by qiime 1.9 can be imported. As an aside, I noticed when I passed a parameter (I was trying to pass the mapping file) to import_biom2()
that wasn't expected, it gets passed to the read_tree function, causing the tree to not be read... It's a trivial thing, but thought I should mention it while you're updating the function =)
Warning message:
In import_biom2(hd5_biom, mapfilename = mapping_file, treefilename = treefilename) :
treefilename failed import. It not included.
@LBragg
The warning you showed seems like expected/desired behavior if the treefilename
argument is a missing/incorrect file. Was that the case? or were you expecting that tree-file to work?
Btw everyone, @nosson et al, I was imagining we add a new argument to import_biom
regarding the biom format version (it might already have one, just currently ignored), and/or we autodetect it. Seems straightforward enough we shouldn't need a completely different function, and might help avoid mistakes (especially if we can do the autodetect).
Thoughts?
joey
Sounds good to me. As for that issue I was describing, the treefilename
argument is the correct tree file, however the tree fails to load if I supply additional, unexpected arguments to the import_biom2()
function. I guess they are passed using the ...
argument list. Maybe it's more that the read_tree
function silently returns NULL
if any errors occur, such as the passing of incorrect arguments. No biggie.
Boom! I just made a large number of changes to the BIOM
package that should be ready for pulling in and will seamlessly fix the import_biom
issue.
Check out: https://github.com/joey711/biom/pull/11
Hi all,
Sorry to be picking up on an earlier part of this thread so late in the game but I am just now running into my issue.
I am trying to import my biom file, converted from hdf5 to json using Phil's earlier suggestion with the biom convert in QIIME, and receiving the following error:
jsonbiom = system.file("extdata", "otu_table_json.biom", package="phyloseq") treefile = system.file("extdata","rep_set.tre", package="phyloseq") refseq = system.file("extdata","rep_set.fna",package="phyloseq") import_biom(jsonbiom,treefile,refseq, parseFunction = parse_taxonomy_qiime) Error in fromJSON(content, handler, default.size, depth, allowComments, : invalid JSON input
I've tried using a txt file as well, and it just wont seem to work.
Any help is appreciated, Dylan
that's not a valid file path. the system.file
commands are shown in tutorials because they are OS-independent, and the data is within the package itself.
Your file path for your data should just be a standard relative or absolute Unix path string.
jsonbiomfile = "path/to/my/data/file.biom"
import_biom(jsonbiomfile)
By the way, the early version of the replacement that will solve this HDF5 / Biom-V2 issue is now a separate repository. Plan is to post it on Bioconductor:
https://github.com/joey711/biomformat
The release version of phyloseq package cannot officially support it as an internal dependency until it is released on BioC. The devel version can do so, as soon as the first version of "biomformat" is on BioC.
I will close this issue once the solutions are implemented enough in phyloseq and its doc.
Hi Joey,
My apologies for not catching this was a tutorial specific command. Your suggestion worked wonderfully. Thanks for the HDF5 info as well.
D
Hi all,
about importing a biome file in R: when read_biom
reads a biom/HDF5, abundance data are stored in R as numeric
s instead of integer
s, that seems a huge waste of memory IMHO.
With American Gut Project's I went from 9.2GB to 480MB just by casting to integers.
@mafed interesting - can you perhaps post this in the biomformat package: https://github.com/joey711/biomformat/issues and i'll go ahead and fix this
@jnpaulson sure, I'll do it immediately, thanks
The documentation for the import_biom
function has the following description:
New versions of QIIME produce a more-comprehensive and formally-defined JSON file format, called biom file format
This is not correct, is it? Shouldn't it say "Old versions" instead of "New versions"?
Has the hdf5 biom issue been fixed? I still get the error "Error in nchar(content) : invalid multibyte string 1" when trying to import a biom generated with QIIME 1.9.1. The documentation here seems to imply that after April this problem was fixed. I've checked that my version of phyloseq is up to date.
This issue should be fixed for some time now. The dependency in phyloseq was migrated in previous release version to point to "biomformat" on Bioconductor, which supports both JSON and HDF5 versions of the format, rather than "biom" on CRAN, which should be considered deprecated. All functionality from "biom" package on CRAN is now subsumed within "biomformat" on Bioconductor.
I don't know what QIIME did or did not do to fix the issue, but AFAIK biomformat is reading these formats just fine.
I will close for now, but feel free @jmicrobe to post more details of your problem if it turns out it is not solved by biomformat/phyloseq.
@jnpaulson feel free to re-open if you think there's something still TBD.
Ok, I solved my own issue. But putting comments here in case someone does this too!
Reproducible result:
data("GlobalPatterns")
MGS <- phyloseq_to_metagenomeSeq(GlobalPatterns)
MGS
p <- cumNormStatFast(MGS) MGS <- cumNorm(MGS, p =p)
b <- MRexperiment2biom(MGS, norm = T)
write_biom(b, biom_file = "test.biom")
biom1 <- import_biom2("test.biom")
biom2 <-read_biom("test.biom")
biom3 <- import_biom2(biom2)
Hi everyone!
I have my otu_table.biom constructed by Qiime 1.9 and my metadata file and I tried to import in Phyloseq and until now I have not couldn't; I changed my otu_table.biom to json with this command
biom convert -i otu_table.biom -o otu_table_json.biom --table-type="OTU table" --to-json
However, with this file, I have not couldn't, please if you have any advice, I'm new in bioinformatics :)
Thanks in advance
Sorry I forgot to wrote script that I used, it was:
jsonbiomfile = "C/Users/metagenomic/Desktop/otu_table_json.biom" jsonbiomfile = "C/Users/metagenomic/Desktop/metadata_ITS.txt"
biom_file <- paste(jsonbiomfile, "otu_table_json.biom", sep = "") map_file <- paste(jsonbiomfile, "metadata_ITS.txt", sep = "")
read_biom(biom_file)
Is there something wrong??
This is what works for me from QIIME 1.9 .json files
biom <- import_biom("milk_final.biom", parseFunction = parse_taxonomy_greengenes) warnings()
rank_names(biom)
map <- import_qiime_sample_data("map_milk_final_alpha.txt")
tree = read.tree("milk_final_rooted.tre")
GM <- merge_phyloseq(biom, tree, map)
GM_data = filter_taxa(GM, function(x) sum(x) != 0, TRUE)
sum(taxa_sums(GM_data) == 0)
ntaxa(GM_data) nsamples(GM_data)
sort(sample_sums(GM_data))
max(sample_sums(GM_data))/min(sample_sums(GM_data))
Thank you Carly, for your answer!!!
I tested this
jsonbiomfile = "C:/Users/metagenomic/Desktop/otu_table_json.biom" read_biom(jsonbiomfile) x = read_biom(jsonbiomfile) x2 <- import_biom(x) head(otu_table(x2)) head(tax_table(x2))
metadata = "C:/Users/metagenomic/Desktop/metadata_ITS.txt" read.csv(metadata) t <- import_qiime_sample_data(metadata)
x2_merged <- merge_phyloseq(x2, t) colnames(tax_table(x2_merged)) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species") sample_data(x2_merged)
And it worked!!!
Thnaks
Hi Joey711,
I have been trying to import a biom file that I generated with qiime, the file is binary file, it seems that is new format for this kind of files, then when trying to work with it in R I get this error:
This is my code so far:
COI_data <- import_biom(biom_file) Error in nchar(content) : invalid multibyte string 1
COI_data <- import_qiime(BIOMfilename = biom_file) Error in arglist[sapply(arglist, is.component.class)] : invalid subscript type 'list'
Maybe I am missing something or should I use an older version of BIOM file?