knights-lab / SHOGUN

SHallow shOtGUN profiler
GNU Affero General Public License v3.0

Shogun function by using kraken & Bracken output #45

Open luzhang321 opened 9 months ago

luzhang321 commented 9 months ago

Hi:)

Hello, I am very interested in SHOGUN and would like to use it for functional annotation. However, I have run into some problems because my taxonomy was generated with Kraken2 and Bracken.

1) Can SHOGUN perform functional annotation from a species-level taxonomy table produced by Kraken2 and Bracken?
2) I found that SHOGUN has two functional annotation commands, functional and summarize_functional. Both appear to predict functions from taxonomy. What specifically is the difference between them?
3) I ran shogun functional on my taxonomy table, but it reported that not all of my species are recorded; only some of them overlap with SHOGUN/ko-species2ko.txt. When I kept only the overlapping species and reran the function, it produced the following output:

kneaddata_custom_species_mpa_bacteria_modified.kegg.modules.coverage.txt
kneaddata_custom_species_mpa_bacteria_modified.kegg.modules.txt
kneaddata_custom_species_mpa_bacteria_modified.kegg.pathways.coverage.txt
kneaddata_custom_species_mpa_bacteria_modified.kegg.pathways.txt
kneaddata_custom_species_mpa_bacteria_modified.kegg.txt
kneaddata_custom_species_mpa_bacteria_modified.normalized.txt

However, when I tried shogun summarize_functional, the outputs were all empty.

Could you please give me some suggestions on these questions? Thank you very much for your help and I look forward to hearing from you.

Best Regards, Lu

bhillmann commented 9 months ago

Hello, thank you for your questions. You will find that a lot of the SHOGUN code is outdated compared to recent advances in the field. Most of the code is written specifically for older versions of the NCBI databases (rep82) and the KEGG databases. Furthermore, the code is integrated with older versions of the tools it wraps. With that said, SHOGUN should work with newer tools as long as you update the databases to newer versions and the newer tools output tables in the same format.

  1. In theory, it should work for functional annotation from Kraken2 and Bracken taxonomy tables, as long as they are output as standard BIOM taxonomy tables. You will also need to update the mapping table from taxonomy to KEGG ids for the functional annotation to work.
  2. functional creates a KEGG table whose features are KEGG ids, or kos. These kos, usually at the gene level, can be summarized at the pathway and module level. That summarization is done by summarize_functional.
  3. Yes, the code will only work properly if you have an updated ko-species2ko.txt file. I'm not sure why shogun summarize_functional didn't work, but I am assuming that there isn't enough overlap between the kos in your table and the kos in these files:
    http://metagenome.cs.umn.edu/public/shogun-db/function/ko-enzyme-annotations.txt
    http://metagenome.cs.umn.edu/public/shogun-db/function/ko-module-annotations.txt
    http://metagenome.cs.umn.edu/public/shogun-db/function/ko-pathway-annotations.txt
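
To make the two steps in points 2 and 3 concrete, here is a minimal sketch of the idea in plain Python. It is not SHOGUN's actual code: the function names, species names, KO ids, and module names are all made up for illustration, and the real database files may be laid out differently.

```python
# Sketch only, NOT SHOGUN's implementation. All names and toy values are hypothetical.

def predict_kos(taxa_counts, species_to_ko):
    """Step 1 (what "functional" does conceptually):
    weight each species' KO copy numbers by that species' abundance."""
    ko_table = {}
    unmatched = []  # species that have no entry in the species-to-KO mapping
    for species, count in taxa_counts.items():
        ko_copies = species_to_ko.get(species)
        if ko_copies is None:
            unmatched.append(species)
            continue
        for ko, copies in ko_copies.items():
            ko_table[ko] = ko_table.get(ko, 0) + count * copies
    return ko_table, unmatched


def summarize_kos(ko_table, ko_to_groups):
    """Step 2 (what "summarize_functional" does conceptually):
    sum KO abundances into modules or pathways."""
    summary = {}
    for ko, abundance in ko_table.items():
        for group in ko_to_groups.get(ko, []):
            summary[group] = summary.get(group, 0) + abundance
    return summary


# Toy inputs (hypothetical).
taxa_counts = {"Species A": 10, "Species B": 5, "Species C": 2}
species_to_ko = {
    "Species A": {"K00001": 1, "K00002": 2},
    "Species B": {"K00002": 1},
}
ko_to_modules = {
    "K00001": ["M_example_1"],
    "K00002": ["M_example_1", "M_example_2"],
}

ko_table, unmatched = predict_kos(taxa_counts, species_to_ko)
module_table = summarize_kos(ko_table, ko_to_modules)

print(ko_table)      # KO abundances
print(unmatched)     # species missing from the mapping
print(module_table)  # module-level summary
```

In this sketch, species missing from the mapping are simply skipped in step 1, and step 2 returns an empty table whenever none of the KO ids appear in the annotation mapping. That would be consistent with the empty summarize_functional output described above, but it is only a guess at the cause.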

luzhang321 commented 9 months ago

Dear Hillmann,

Thanks so much for the quick reply. There are still a few things I am not entirely sure about.

Currently, I use the keggLink("ko", bacteria_id$T.number) function from the R package KEGGREST to obtain the KO and KEGG species T number information. I then use the taxid and name recorded in KEGG for each T number to get the species name. Is this the correct way to obtain the ko-species information? Could I build the updated database the same way you built the original one?

The reason I want to use SHOGUN is that my data is not deeply sequenced, so it is important for me to figure out these questions. I appreciate your help!

bhillmann commented 9 months ago

You'd have to build the ko database yourself; I'm unfamiliar with the package KEGGREST, but it should theoretically work. All you need is a mapping from your lowest level taxonomic annotation, perhaps species here, to the ko and copy number count.
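
As a rough illustration of such a mapping, the sketch below counts how many annotated genes per species carry each KO and writes the result as a tab-separated table. The record layout, the function names, and the output format are assumptions made for this example; they are not necessarily what SHOGUN expects, so compare against an existing ko-species2ko.txt before relying on it.

```python
from collections import Counter, defaultdict

# Sketch only: the (species, gene_id, ko_id) record layout and the TSV layout
# below are hypothetical, not SHOGUN's documented format.

def build_species_ko_counts(records):
    """Count, for each species, how many annotated genes map to each KO.

    records: iterable of (species, gene_id, ko_id) tuples, one per annotated gene.
    """
    counts = defaultdict(Counter)
    for species, _gene_id, ko_id in records:
        counts[species][ko_id] += 1
    return counts


def to_tsv(counts):
    """Render the counts as a species-by-KO table, with 0 for absent KOs."""
    kos = sorted({ko for per_species in counts.values() for ko in per_species})
    lines = ["\t".join(["#species"] + kos)]
    for species in sorted(counts):
        row = [str(counts[species][ko]) for ko in kos]
        lines.append("\t".join([species] + row))
    return "\n".join(lines)


# Toy records (hypothetical).
records = [
    ("Species A", "gene1", "K00001"),
    ("Species A", "gene2", "K00002"),
    ("Species A", "gene3", "K00002"),
    ("Species B", "gene4", "K00002"),
]

counts = build_species_ko_counts(records)
print(to_tsv(counts))
```

The same counting works at any taxonomic level, as long as the names in the first column match the names in the taxonomy table exactly.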

I see that our database links are no longer working. Let me investigate that.

luzhang321 commented 9 months ago

Hi :) Thanks so much for your help. Yes, I am aware that I need to build the database myself. I just want to make sure that the species and KO information should come from the KEGG database, because I noticed that the SHOGUN paper says: 'We identified genes using UniProt (Bateman et al., 2017) annotations obtained by running Prokka (Seemann, 2014) on all the bacterial genomes and mapping them to Kyoto Encyclopedia of Genes and Genomes (Kanehisa et al., 2012) annotations.' So I am wondering whether you downloaded the genomes and annotated them yourselves, or acquired the information directly from the KEGG database.

I am looking forward to your feedback on the database links. Thanks again!