How to acquire the KEGG KO information file?

zx0223winner / HSDFinder

a tool to predict highly similar duplicates (HSDs) in eukaryotes

MIT License

How to acquire the KEGG KO information file? #3

Open zx0223winner opened 2 years ago

zx0223winner commented 2 years ago

Here is an enquiry email sent by a current user, which might help those with similar questions.

Hi, I tested one file and generated the ##.species.txt from HSDFinder.py.

How do I generate the ##.species_ko.txt?

How do I create the heatmap?

zx0223winner commented 2 years ago

Hi, glad you have worked it out. The KEGG KO file is acquired from the KEGG KO BLAST engine (https://www.kegg.jp/ghostkoala/), which is pretty straightforward to use. Simply submit your protein data and an email address, and you will receive the KO file. I also described and demonstrated the steps in the HSDFinder tutorial; please see Step 6 at this link: https://github.com/zx0223winner/HSDFinder/blob/master/Tutorial/Tutorial%20for%20HSDFinder.pdf

g10.t1	K07566
g11.t1
g12.t1
g13.t1
g14.t1
g15.t1	K09481
g16.t1	K00472
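
For anyone who wants to inspect the KO file before making the heatmap, here is a minimal Python sketch that reads the gene and K number pairs shown above. It assumes one gene per line, with the gene ID first and the K number second when one was assigned; the function name `parse_ko_lines` is hypothetical and is not part of HSDFinder.

```python
def parse_ko_lines(lines):
    """Map each gene ID to its K number, or None when no KO was assigned.

    Assumption: one gene per line, whitespace- or tab-separated,
    gene ID first and an optional K number second.
    """
    gene_to_ko = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        gene = fields[0]
        ko = fields[1] if len(fields) > 1 else None
        gene_to_ko[gene] = ko
    return gene_to_ko


# The example rows from this thread, written one gene per line.
example = """g10.t1\tK07566
g11.t1
g12.t1
g13.t1
g14.t1
g15.t1\tK09481
g16.t1\tK00472
"""

gene_to_ko = parse_ko_lines(example.splitlines())

# Keep only the genes that received a KO assignment.
annotated = {gene: ko for gene, ko in gene_to_ko.items() if ko is not None}
print(len(gene_to_ko), "genes,", len(annotated), "with a KO assignment")
```

To read a real file, pass an open file handle instead of `example.splitlines()`, for example `parse_ko_lines(open("species_ko.txt"))`, where the file name is only a placeholder.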

Once you have the KO file, you can use a heatmap to compare either HSDs at different thresholds within one species or HSDs from different species (if you have the respective HSD result files and KO files). Examples are attached.

~Xi

zx0223winner commented 2 years ago

Can't we do this with a command-line version of KEGG? I have a minimum of 50,000 protein sequences in each genome. Don't we have access to a command-line version of KEGG KO BLAST to speed up the process?

Did you mean the heatmap or the KO file? It seems the KO file can only be acquired from KEGG. If you are worried about the speed of the online heatmap option on the HSDFinder web server, it is actually not that slow once you try it (e.g., 10 minutes for the human genome). I can send you the heatmap script, but you would need to be comfortable with a command-line environment. ~Xi

zx0223winner commented 2 years ago

I can only submit one job at a time to KEGG KO BLAST. It's very slow, perhaps because of the large data set.

You can submit KEGG jobs using different email addresses. It is slow but definitely worth it, since the KO file is a required input. I have not found an easier way so far.
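
If you do submit several jobs, one possible way to prepare the input is to split the protein FASTA into smaller batches first. The thread does not describe this step, so treat the following as a sketch under assumptions: `split_fasta` and the batch size are hypothetical, the input is assumed to be standard FASTA with `>` header lines, and it has not been checked against KEGG's submission limits.

```python
def split_fasta(lines, records_per_chunk):
    """Yield lists of FASTA lines, each holding at most records_per_chunk records.

    Assumption: every record starts with a '>' header line.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk, count = [], 0
    for line in lines:
        if line.startswith(">"):
            if count == records_per_chunk:
                yield chunk  # current batch is full, start a new one
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield chunk  # final, possibly smaller, batch


# Toy input: five short records (made-up sequences, for illustration only).
fasta_lines = [
    ">g10.t1", "MKV",
    ">g11.t1", "MLA", "GGT",
    ">g12.t1", "MST",
    ">g13.t1", "MPE",
    ">g14.t1", "MQR",
]

chunks = list(split_fasta(fasta_lines, 2))
for i, chunk in enumerate(chunks, start=1):
    n_records = sum(1 for line in chunk if line.startswith(">"))
    print(f"batch {i}: {n_records} records")
```

Each batch can then be written to its own file and submitted as a separate job. Assuming every job returns one line per gene as in the example above, the resulting KO files could be concatenated into a single file afterwards.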