knights-lab / BugBase

Bugbase says incorrect biom file format #13

Open sandipansamaddar opened 4 years ago

sandipansamaddar commented 4 years ago

Hi, I used DADA2 to analyze a 16S dataset and am now trying to predict high-level phenotypes with BugBase, but I am having problems producing a BIOM file that BugBase accepts. I am not sure what is going wrong, and any help would be appreciated.

The BugBase documentation says it needs an OTU table in BIOM format (version 1.0, JSON) picked against the Greengenes (GG) database (https://bugbase.cs.umn.edu/documentation.html).

I used the GG database to classify the sequences and created a BIOM file that looks fine to me, but it does not work: BugBase says it is not in the correct format.

The commands used to generate the BIOM file were:

In R, using the biomformat package:

biom16S <- make_biom(count_tab16S)
biom_file <- biom16S
outfile = tempfile()
write_biom(biom_file, outfile)

where count_tab16S is the count table with ASV IDs in rows and sample names in columns.

After getting the output, I added the metadata in the terminal using the command:

biom add-metadata -i my.biom -o table-with-taxonomy.biom --observation-metadata-fp taxonomyfile.tsv --sc-separated Taxonomy

where taxonomyfile.tsv is the file with GG taxonomy information.

After getting the output from the above step I converted the file to JSON:

biom convert --table-type="OTU table" -i table-with-taxonomy.biom -o your_OTU_table.txt --to-tsv --header-key taxonomy
biom convert -i your_OTU_table.txt -o OTU_table.biom --table-type="OTU table" --to-json --process-obs-metadata taxonomy

I used the output in BugBase, but it didn't work.

But when I validated the file using: biom validate-table -i OTU_table.biom

it says it's a correctly formatted file.

On the other hand, I also processed my sequences with mothur, created a BIOM file using the make.biom command, and converted it to JSON the same way as above, and that file worked with BugBase.

So I am not sure what is going wrong.

I will also be happy to share any files if needed.

Thanks,

Sandipan

TonyaWard commented 4 years ago

Hi Sandipan,

Can you please share the files?

Thanks!


(Sent by email in reply to Sandipan Samaddar's message of Wednesday, July 8, 2020, 2:56 PM.)


sandipansamaddar commented 4 years ago

Hi Tonya, the files are attached. I am attaching two zip folders.

One has the outputs from mothur, which work fine with BugBase, and the other has the outputs from DADA2.

I would also like to ask: since I am working with soil samples, how well do you think BugBase can predict pathogenic potential? When I checked the mothur results and looked at the contributing OTUs, some that were unclassified at the family, genus, or species level according to GG were considered pathogens by BugBase.

Any help will be appreciated.

Thanks,

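Sandipan

Bugbase_mothur.zip (https://github.com/knights-lab/BugBase/files/4893310/Bugbase_mothur.zip)

Bugbase_dada2.zip (https://github.com/knights-lab/BugBase/files/4893276/Bugbase_dada2.zip)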

TonyaWard commented 4 years ago

Hi Sandipan,

When you look at the OTU identifiers from DADA2, you'll see that they are ASVs labeled in order of occurrence (ASV_1, ASV_2, etc.). The mothur outputs have the OTU identifiers (109057, 109058, etc.). BugBase uses the exact Greengenes OTU identifiers to predict traits, so it cannot be used with DADA2 outputs. I'm not sure whether OTU calling in mothur retains the exact OTU identifiers from Greengenes; if it does, then the predictions for those OTUs are accurate to the level that BugBase can predict.

For taxa in the mothur outputs that have only a limited taxonomy assigned, the pathogenicity predictions (and other traits) are actually derived from the OTU identifier and the nearest neighbors of that OTU in the tree. So the taxonomy might not be well defined, but based on the 16S sequence we expect them to be related to OTUs that are pathogenic.
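As a rough way to see this difference before uploading a table, the observation IDs in a BIOM 1.0 JSON file can be listed and checked for anything that is not a plain integer. The sketch below is only a heuristic and not part of BugBase: the function name and the two tiny tables are made up for illustration, and a numeric ID does not by itself prove that the ID exists in the Greengenes reference set. It relies only on the BIOM 1.0 JSON layout, in which observations are listed under "rows" and each has an "id".

```python
import json


def non_numeric_observation_ids(biom_json_text):
    """Return the observation IDs that are not plain integers.

    Hypothetical helper for illustration; it only inspects the "rows"
    list of a BIOM 1.0 (JSON) table.
    """
    table = json.loads(biom_json_text)
    # In BIOM 1.0 JSON, each entry of "rows" describes one observation
    # (OTU/ASV) and carries its identifier under the "id" key.
    return [row["id"] for row in table["rows"] if not str(row["id"]).isdigit()]


# Made-up miniature tables that mimic the two kinds of IDs in this thread.
dada2_like = json.dumps({"rows": [{"id": "ASV_1", "metadata": None},
                                  {"id": "ASV_2", "metadata": None}]})
mothur_like = json.dumps({"rows": [{"id": "109057", "metadata": None},
                                   {"id": "109058", "metadata": None}]})

print(non_numeric_observation_ids(dada2_like))   # ['ASV_1', 'ASV_2']
print(non_numeric_observation_ids(mothur_like))  # []
```

On a real file, the same function could be called on the file contents, for example `non_numeric_observation_ids(open("OTU_table.biom").read())`. An empty list only means the IDs look like Greengenes-style integers; a non-empty list is a strong hint that the table was not picked against Greengenes OTUs.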

I hope this helps!

Tonya


(Sent by email in reply to Sandipan Samaddar's message of Wednesday, July 8, 2020, 4:19 PM.)
