micom-dev / databases

Workflows and input data for the construction of the standard MICOM model databases.
Apache License 2.0

Detailed description of the genome-scale metabolic model database #2

Open Zhaoju-Deng opened 5 months ago

Zhaoju-Deng commented 5 months ago

Hi Diener, would it be possible to provide a detailed description of the procedures used to generate those GEM reference databases? I find them quite useful, but I am hesitant to use them in my manuscript without any description of how they were produced.

Many thanks, Zhaoju

cdiener commented 5 months ago

Agreed, that would be good. I will work on adding some documentation.

Those are all built from Nextflow pipelines that are provided in the recipes folder and contain everything needed to build them from scratch. For now, you could go through those to see exactly what is happening, for instance the one for AGORA2. After that, they are uploaded to Zenodo with the release script.

Zhaoju-Deng commented 5 months ago

Thanks very much for your quick response! Would it be possible to provide a short description of the CarveMe reference database? I am trying to use it because my microbiome data come from cows, so the AGORA2 database is not suitable for my analysis.

cdiener commented 5 months ago

Those are just the models from the original CARVEME publication, which covered all bacteria in RefSeq at that point. The only thing I did was go through the taxonomy IDs and update them to more recent versions with taxonkit (corresponding to the RefSeq release, because the NCBI taxonomy itself does not really have releases). They are pretty old by now (~5 years), so there might be more genomes available these days. Alternatively, you could build your own database using either CarveMe or gapseq. There are good genome catalogues for the rumen microbiome. I guess the medium would be another issue, though.
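The taxonomy ID update mentioned above was done with taxonkit. As a minimal, hypothetical illustration of what such an update involves (not the recipe actually used for this database), NCBI's taxdump ships a merged.dmp file that maps retired taxonomy IDs to their replacements, with fields separated by "\t|\t" and each line ending in "\t|". The helper names load_merged and update_taxids are made up for this sketch, and deleted IDs (listed in delnodes.dmp) are not handled here.

```python
def load_merged(lines):
    """Parse NCBI taxdump merged.dmp lines into {old_taxid: new_taxid}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop the trailing "|" and split on the remaining field separators.
        fields = [f.strip() for f in line.rstrip("|").split("|")]
        mapping[int(fields[0])] = int(fields[1])
    return mapping


def update_taxids(taxids, merged):
    """Replace merged taxids with their current IDs, following any chains."""
    updated = []
    for taxid in taxids:
        seen = set()
        # Guard against cycles in case the input mapping is malformed.
        while taxid in merged and taxid not in seen:
            seen.add(taxid)
            taxid = merged[taxid]
        updated.append(taxid)
    return updated
```

Taxids that were never merged pass through unchanged, so the function can be applied to a whole model list at once.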

Zhaoju-Deng commented 5 months ago

Many thanks for your instant reply! I have always wondered why the database contains only one GEM per bacterial species, while the NCBI RefSeq genome database has multiple reference (or representative) genomes for each bacterial species (https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt). Could you share how the single genome per species used to build each GEM with CarveMe was selected (or whether a non-redundant reference genome database exists)? Then I could use CarveMe to reconstruct GEMs from one genome per species. The samples are all fecal samples. The culture medium is indeed another issue. I am thinking of simulating growth in a minimal medium (and also in other media, to test whether the results differ significantly for the bacterial species of interest). Any other suggestions are also welcome!

cdiener commented 5 months ago

The list of genomes can be found in the associated Github repo. There should be only one representative genome for each species in Refseq.

Zhaoju-Deng commented 5 months ago

True. I previously contacted Dr. Machado specifically about how they mapped 16S amplicon sequences against the reference genome database in their paper "Polarization of microbial communities between competitive and cooperative metabolism" to retrieve genomes for each bacterial species in each 16S microbiota sample. That database was not non-redundant: one bacterial species could have multiple reference/representative genomes. They used DIAMOND to search the 16S amplicon sequences against the reference genome database and applied cutoffs to keep the "best" match for each 16S amplicon sequence. Dr. Machado told me this part of the analysis was done by Dr. Yongkyu Kim, but I contacted Dr. Yongkyu Kim and also Prof. Kiran R. Patil without getting a response.

I followed their method, but the results showed very low identity scores and coverage (I hardly had any samples with >97% identity and >95% coverage; they used 97% identity and 95% coverage to filter the best hits). That is why I keep asking whether there is a reference database containing only a single reference/representative genome for each bacterial species. The GitHub repo you mentioned only contains 5,587 models, while there are roughly 50k bacterial species in the NCBI reference database. So I am trying to reconstruct GEMs myself using CarveMe, but I am stuck on choosing the best genome to represent each bacterial species in the 16S amplicon microbiota data. Do you have any suggestions? Many thanks!
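For anyone trying to reproduce the filtering step described above (best hit per 16S sequence at ≥97% identity and ≥95% coverage), here is a minimal Python sketch. It is a hypothetical helper, not the code used in the paper. It assumes the default 12-column BLAST/DIAMOND tabular output (outfmt 6) and that query lengths are supplied separately, since that default format has no qlen column. "Coverage" is taken to mean query coverage, and "best" means the passing hit with the highest bitscore.

```python
import csv
import io

# Default 12-column BLAST/DIAMOND tabular (outfmt 6) layout (assumption).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text, query_lengths, min_identity=97.0, min_coverage=95.0):
    """Return {query_id: (subject_id, pident, coverage)} for the best passing hit."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        pident = float(hit["pident"])
        qstart, qend = int(hit["qstart"]), int(hit["qend"])
        # Query coverage in percent: aligned query span over full query length.
        coverage = 100.0 * (abs(qend - qstart) + 1) / query_lengths[hit["qseqid"]]
        if pident < min_identity or coverage < min_coverage:
            continue
        bitscore = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        # Keep the highest-bitscore hit among those passing both cutoffs.
        if current is None or bitscore > current[0]:
            best[hit["qseqid"]] = (bitscore, hit["sseqid"], pident, coverage)
    return {q: (s, p, c) for q, (_, s, p, c) in best.items()}
```

If almost no queries survive, printing the unfiltered identity and coverage distributions is a quick way to tell whether the database or the cutoffs are the limiting factor.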

cdiener commented 5 months ago

Oh yeah, what I meant is that there is usually (with few exceptions) only one reference genome per species in RefSeq (see the refseq_category column in the assembly summary). So if you filter by this, you would get something very close to a single-reference database.
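Following that suggestion, a minimal Python sketch of filtering assembly_summary_refseq.txt by refseq_category and keeping one assembly per species_taxid. Assumptions: the column header is the "#"-prefixed line that names assembly_accession; the category labels are "reference genome" and "representative genome" (NCBI has revised these labels over time, so check the values in your release); and a reference genome is preferred over a representative one when a species has both. The name one_genome_per_species is a hypothetical helper, not part of any tool.

```python
# Lower rank wins; categories not listed here (e.g. "na") are skipped.
# These labels are an assumption; verify them against your download.
RANK = {"reference genome": 0, "representative genome": 1}


def one_genome_per_species(lines):
    """Map species_taxid -> assembly_accession from assembly_summary lines."""
    header = None
    best = {}  # species_taxid -> (rank, assembly_accession)
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The column header is the comment line naming assembly_accession.
            if "assembly_accession" in line:
                header = line.lstrip("#").strip().split("\t")
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        rank = RANK.get(row.get("refseq_category"))
        if rank is None:
            continue
        species = row["species_taxid"]
        if species not in best or rank < best[species][0]:
            best[species] = (rank, row["assembly_accession"])
    return {sp: acc for sp, (_, acc) in best.items()}
```

Because the function accepts any iterable of lines, an open file handle works directly, e.g. passing `open("assembly_summary_refseq.txt")`. The resulting accessions could then be downloaded and passed to CarveMe one genome per species.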