joey711 / biom

Development version of the biom package for R

Reading megan-made biom file #15

Closed: bjpeleka closed 7 years ago

bjpeleka commented 9 years ago

Hi Joey, I have a metagenomic project in which the sequences are stored in fastq files from the Illumina MiSeq platform. From the fastq files I created .m8 files (using DIAMOND) and imported them into MEGAN for comparison, from which I created a biom file. I tried to read the biom file into R using the biom package:

biom2 = read_biom("/home/Documents/Misc/Untitled-cmp-Taxonomy.biom")

but I got this error message:

Error in validObject(.Object) : invalid class “biom” object: Not all required top-level keys are present in biom-object. Required keys are: id format format_url type generated_by date rows columns matrix_type matrix_element_type shape data

Any help?

joey711 commented 9 years ago

Can you post an example file, or link to the formal definition of this version of BIOM?

Would love to support it in this package.

bjpeleka commented 9 years ago

I can't find the formal definition of the version of BIOM I used but the software that produced it is found here: http://ab.inf.uni-tuebingen.de/software/megan5/

Below is the actual file:

{"comment":"Taxonomy classification computed by MEGAN","id":"D:\pjbData\postdocs\benDiamond\Untitled-cmp-Taxonomy.biom","format":"Biological Observation Matrix 0.9.1-dev","url":"http://biom-format.org/documentation/format_versions/biom-1.0.html","type":"Taxon table","generated_by":"MEGAN (version 5.10.3, built 23 Apr 2015)","date":"Thu May 14 15:48:40 NZST 2015","rows":[{"id":"1","metadata":{"Taxonomy":["Root"]}},{"id":"131567","metadata":{"Taxonomy":["Root","cellular organisms"]}},{"id":"2","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria"]}},{"id":"201174","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Actinobacteria \u003cphylum\u003e"]}},{"id":"200783","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Aquificae \u003cphylum\u003e"]}},{"id":"67819","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Armatimonadetes"]}},{"id":"68336","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Bacteroidetes/Chlorobi group"]}},{"id":"976","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Bacteroidetes/Chlorobi group","Bacteroidetes"]}},{"id":"1090","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Bacteroidetes/Chlorobi group","Chlorobi"]}},{"id":"1134404","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Bacteroidetes/Chlorobi group","Ignavibacteriae"]}},{"id":"51290","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Chlamydiae/Verrucomicrobia group"]}},{"id":"204428","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Chlamydiae/Verrucomicrobia group","Chlamydiae"]}},{"id":"256845","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Chlamydiae/Verrucomicrobia group","Lentisphaerae"]}},{"id":"74201","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Chlamydiae/Verrucomicrobia group","Verrucomicrobia"]}},{"id":"200795","metadata":{"Taxonomy":["Root","cellular 
organisms","Bacteria","Chloroflexi"]}},{"id":"200938","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Chrysiogenetes \u003cphylum\u003e"]}},{"id":"1117","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Cyanobacteria"]}},{"id":"200930","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Deferribacteres \u003cphylum\u003e"]}},{"id":"1297","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Deinococcus-Thermus"]}},{"id":"48479","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","environmental samples \u003cBacteria\u003e"]}},{"id":"131550","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Fibrobacteres/Acidobacteria group"]}},{"id":"57723","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Fibrobacteres/Acidobacteria group","Acidobacteria"]}},{"id":"65842","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Fibrobacteres/Acidobacteria group","Fibrobacteres"]}},{"id":"62680","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Fibrobacteres/Acidobacteria group","Marinimicrobia"]}},{"id":"1239","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Firmicutes"]}},{"id":"32066","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Fusobacteria"]}},{"id":"142182","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Gemmatimonadetes"]}},{"id":"1293497","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Nitrospinae"]}},{"id":"40117","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Nitrospirae"]}},{"id":"203682","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Planctomycetes"]}},{"id":"1224","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Proteobacteria"]}},{"id":"203691","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Spirochaetes"]}},{"id":"508458","metadata":{"Taxonomy":["Root","cellular 
organisms","Bacteria","Synergistetes"]}},{"id":"544448","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Tenericutes"]}},{"id":"200940","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Thermodesulfobacteria \u003cphylum\u003e"]}},{"id":"200918","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","Thermotogae \u003cphylum\u003e"]}},{"id":"2323","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria"]}},{"id":"67810","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Acetothermia"]}},{"id":"1052815","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Aerophobetes"]}},{"id":"67817","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Aminicenantes"]}},{"id":"67818","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Atribacteria"]}},{"id":"187144","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Caldithrix"]}},{"id":"200295","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","candidate division BRC1"]}},{"id":"640293","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","candidate division NC10"]}},{"id":"1379697","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","candidate division Zixibacteria"]}},{"id":"456828","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Cloacimonetes"]}},{"id":"1383058","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Fervidibacteria"]}},{"id":"74015","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Latescibacteria"]}},{"id":"221216","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified 
Bacteria","Parcubacteria"]}},{"id":"265317","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Poribacteria"]}},{"id":"262406","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","Thermobaculum"]}},{"id":"49928","metadata":{"Taxonomy":["Root","cellular organisms","Bacteria","unclassified Bacteria","unclassified Bacteria (miscellaneous)"]}},{"id":"2157","metadata":{"Taxonomy":["Root","cellular organisms","Archaea"]}},{"id":"28889","metadata":{"Taxonomy":["Root","cellular organisms","Archaea","Crenarchaeota"]}},{"id":"48510","metadata":{"Taxonomy":["Root","cellular organisms","Archaea","environmental samples \u003cArchaea\u003e"]}},{"id":"28890","metadata":{"Taxonomy":["Root","cellular organisms","Archaea","Euryarchaeota"]}},{"id":"651137","metadata":{"Taxonomy":["Root","cellular organisms","Archaea","Thaumarchaeota"]}},{"id":"29294","metadata":{"Taxonomy":["Root","cellular organisms","Archaea","unclassified Archaea"]}},{"id":"93506","metadata":{"Taxonomy":["Root","cellular organisms","Archaea","unclassified Archaea","unclassified Archaea (miscellaneous)"]}},{"id":"2759","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota"]}},{"id":"33630","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Alveolata"]}},{"id":"5794","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Alveolata","Apicomplexa"]}},{"id":"5878","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Alveolata","Ciliophora"]}},{"id":"27997","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Alveolata","Perkinsea"]}},{"id":"554915","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Amoebozoa"]}},{"id":"3027","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Cryptophyta"]}},{"id":"33682","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Euglenozoa"]}},{"id":"5653","metadata":{"Taxonomy":["Root","cellular 
organisms","Eukaryota","Euglenozoa","Kinetoplastida"]}},{"id":"2830","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Haptophyceae"]}},{"id":"33154","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta"]}},{"id":"28009","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Choanoflagellida"]}},{"id":"4751","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Fungi"]}},{"id":"451864","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Fungi","Dikarya"]}},{"id":"4890","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Fungi","Dikarya","Ascomycota"]}},{"id":"5204","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Fungi","Dikarya","Basidiomycota"]}},{"id":"112252","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Fungi","Fungi incertae sedis"]}},{"id":"214504","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Fungi","Glomeromycota"]}},{"id":"6029","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Fungi","Microsporidia"]}},{"id":"33208","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa"]}},{"id":"6072","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa"]}},{"id":"33213","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria"]}},{"id":"33511","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Deuterostomia"]}},{"id":"7711","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Deuterostomia","Chordata"]}},{"id":"7586","metadata":{"Taxonomy":["Root","cellular 
organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Deuterostomia","Echinodermata"]}},{"id":"10219","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Deuterostomia","Hemichordata"]}},{"id":"6157","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Platyhelminthes"]}},{"id":"33317","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Protostomia"]}},{"id":"1206794","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Protostomia","Ecdysozoa"]}},{"id":"6231","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Protostomia","Ecdysozoa","Nematoda"]}},{"id":"88770","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Protostomia","Ecdysozoa","Panarthropoda"]}},{"id":"6656","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Protostomia","Ecdysozoa","Panarthropoda","Arthropoda"]}},{"id":"1206795","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Protostomia","Lophotrochozoa"]}},{"id":"6340","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Protostomia","Lophotrochozoa","Annelida"]}},{"id":"6447","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Bilateria","Protostomia","Lophotrochozoa","Mollusca"]}},{"id":"6073","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Eumetazoa","Cnidaria"]}},{"id":"10226","metadata":{"Taxonomy":["Root","cellular 
organisms","Eukaryota","Opisthokonta","Metazoa","Placozoa"]}},{"id":"6040","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Metazoa","Porifera"]}},{"id":"42461","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Opisthokonta","Opisthokonta incertae sedis"]}},{"id":"5719","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Parabasalia"]}},{"id":"2763","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Rhodophyta"]}},{"id":"33634","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Stramenopiles"]}},{"id":"2836","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Stramenopiles","Bacillariophyta"]}},{"id":"4762","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Stramenopiles","Oomycetes"]}},{"id":"35675","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Stramenopiles","Pelagophyceae"]}},{"id":"569578","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Stramenopiles","PX clade"]}},{"id":"2870","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Stramenopiles","PX clade","Phaeophyceae"]}},{"id":"33090","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Viridiplantae"]}},{"id":"3041","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Viridiplantae","Chlorophyta"]}},{"id":"35493","metadata":{"Taxonomy":["Root","cellular organisms","Eukaryota","Viridiplantae","Streptophyta"]}},{"id":"10239","metadata":{"Taxonomy":["Root","Viruses"]}},{"id":"12908","metadata":{"Taxonomy":["Root","unclassified sequences"]}},{"id":"28384","metadata":{"Taxonomy":["Root","other 
sequences"]}}],"columns":[{"id":"4G2_r1"},{"id":"4G3_r1-1"},{"id":"4G2_r1"}],"matrix_type":"dense","matrix_element_type":"int","shape":[112,3],"data":[[1973,7459,1973],[9951,830,9951],[9572,1826,9572],[718,217,718],[67,2,67],[11,0,11],[5,0,5],[183,14,183],[6,0,6],[4,0,4],[2,0,2],[17,19,17],[3,0,3],[79,0,79],[359,1,359],[1,0,1],[1272,8,1272],[1,0,1],[20,0,20],[263,3,263],[1,0,1],[325,3,325],[1,0,1],[14,0,14],[2850,141,2850],[208,0,208],[17,0,17],[6,0,6],[129,0,129],[87,2,87],[9750,876,9750],[66,2,66],[1,0,1],[108,0,108],[2,0,2],[119,3,119],[16,0,16],[3,0,3],[2,0,2],[3,0,3],[2,0,2],[4,0,4],[3,0,3],[7,0,7],[26,0,26],[1,0,1],[1,0,1],[8,0,8],[1,0,1],[1956,9,1956],[3,0,3],[12,0,12],[24,0,24],[5,0,5],[1,0,1],[62,0,62],[627,2,627],[0,0,0],[2,0,2],[7149,16948,7149],[0,0,0],[2,463,2],[344,2,344],[2,0,2],[2,2,2],[22,0,22],[0,0,0],[1,25,1],[2,1,2],[1924,64,1924],[1,6,1],[9,0,9],[393,0,393],[244,3,244],[337,0,337],[6,0,6],[9,0,9],[27,0,27],[363,4,363],[97,5,97],[226,180,226],[19,3,19],[221,27648,221],[4,0,4],[15,3,15],[246,8,246],[44,3,44],[27,0,27],[341,30,341],[0,0,0],[299,14,299],[6,1,6],[20,6,20],[45,8,45],[224,5,224],[43,0,43],[1020,2,1020],[4,0,4],[1,0,1],[12,0,12],[1,0,1],[3,2,3],[6,0,6],[1,0,1],[0,0,0],[23,0,23],[31,0,31],[19,5,19],[1509,29,1509],[9,42,9],[21,0,21],[17,4,17]]}

joey711 commented 9 years ago

Nice, well, you've more-or-less diagnosed the issue. One of the following required keys is missing.

id format format_url type generated_by date rows columns matrix_type matrix_element_type shape data

They would be written in the file. If you can do a quick "find" search for each required key and post back here which are missing, then we can discuss whether this package should require all those keys or not. I suspect MEGAN has some kind of reason for omitting a key, but first let's figure out which ones.

Cheers

joey
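The "find" search for each required key suggested above can also be done programmatically. The following is a minimal sketch in Python (used here only because its standard library parses JSON; it is not part of the biom package). The key list is copied from the error message; the function name `missing_keys` and the trimmed `example` table are hypothetical stand-ins for the real file.

```python
import json

# Top-level keys listed in the error message from read_biom().
REQUIRED_KEYS = [
    "id", "format", "format_url", "type", "generated_by", "date",
    "rows", "columns", "matrix_type", "matrix_element_type", "shape", "data",
]

def missing_keys(table):
    """Return the required top-level keys that are absent from a parsed table."""
    return [key for key in REQUIRED_KEYS if key not in table]

# Hypothetical, heavily trimmed stand-in for the MEGAN export posted above.
example = json.loads("""
{"id": "example", "format": "Biological Observation Matrix 0.9.1-dev",
 "url": "http://biom-format.org/documentation/format_versions/biom-1.0.html",
 "type": "Taxon table", "generated_by": "MEGAN", "date": "2015-05-14",
 "rows": [{"id": "1", "metadata": {"Taxonomy": ["Root"]}}],
 "columns": [{"id": "4G2_r1"}],
 "matrix_type": "dense", "matrix_element_type": "int",
 "shape": [1, 1], "data": [[1973]]}
""")

print(missing_keys(example))  # ['format_url']
```

For a real file, the parsed table would come from `json.load()` on an open file handle instead of the inline string.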

bjpeleka commented 9 years ago

The missing key is 'format_url'. The file has 'url' instead, which I changed to 'format_url' before reading the file again. This time I got the following error message: "Error in i$id : $ operator is invalid for atomic vectors".

I also noticed that the data is arranged differently from a biom file that I have successfully read before, but I'm not sure if this matters.
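The rename of 'url' to 'format_url' described above could be scripted rather than done by hand. This is a rough sketch in Python, assuming the file parses as JSON; `rename_key` and the trimmed `example` dictionary are hypothetical names introduced for illustration, not anything provided by MEGAN or the biom package.

```python
import json

def rename_key(table, old="url", new="format_url"):
    """Return a copy of `table` with key `old` renamed to `new`, keeping key order."""
    if old not in table or new in table:
        return dict(table)  # nothing to rename, or the target key already exists
    return {(new if key == old else key): value for key, value in table.items()}

# Hypothetical, trimmed stand-in for the top of the MEGAN export.
example = {
    "id": "example",
    "format": "Biological Observation Matrix 0.9.1-dev",
    "url": "http://biom-format.org/documentation/format_versions/biom-1.0.html",
    "type": "Taxon table",
}

renamed = rename_key(example)
print(json.dumps(renamed, indent=2))
```

This only addresses the missing top-level key. It does nothing about the later "$ operator is invalid for atomic vectors" error, which is a separate problem.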

guyhorev commented 8 years ago

I encountered a similar problem.

Guy

joey711 commented 7 years ago

If this is still a problem, please post on the biomformat repo issue tracker. Release version at:

https://www.bioconductor.org/packages/3.3/bioc/html/biomformat.html

Repo at: https://github.com/joey711/biomformat