Closed: zzalzzu closed this issue 2 years ago.
Hey @zzalzzu,
The KEGG mapping files cannot be distributed except for the last open-source version, but you can get them from the KEGG API using commands like this:
```
wget http://rest.kegg.jp/list/pathway
mv pathway KEGG_pathway_descrip.tsv
wget http://rest.kegg.jp/link/pathway/genome
mv genome KEGG_genome_pathway_links.tsv
wget http://rest.kegg.jp/list/genome
mv genome KEGG_genome_descrip.tsv
wget http://rest.kegg.jp/list/module
mv module KEGG_module_descrip.tsv
wget http://rest.kegg.jp/link/module/genome
mv genome KEGG_genome_module_links.tsv
wget http://rest.kegg.jp/list/ko
mv ko KEGG_ko_descrip.tsv
wget http://rest.kegg.jp/link/module/ko
mv ko KEGG_ko_module_links.tsv
wget http://rest.kegg.jp/link/pathway/ko
mv ko KEGG_ko_pathway_links.tsv
```
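As a minimal sketch, the commands above can be wrapped in one shell function that writes each table straight to its final name with `wget -O`, so the separate `mv` step is no longer needed. The function name `download_kegg_tables`, the `DRY_RUN` switch, and the optional output-directory argument are hypothetical conveniences, not part of PICRUSt2.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PICRUSt2): fetch the same KEGG REST
# tables as the commands above, naming each output file directly.

KEGG_REST="http://rest.kegg.jp"

# Each entry is "<REST path> <output file name>", matching the commands above.
KEGG_TABLES=(
  "list/pathway KEGG_pathway_descrip.tsv"
  "link/pathway/genome KEGG_genome_pathway_links.tsv"
  "list/genome KEGG_genome_descrip.tsv"
  "list/module KEGG_module_descrip.tsv"
  "link/module/genome KEGG_genome_module_links.tsv"
  "list/ko KEGG_ko_descrip.tsv"
  "link/module/ko KEGG_ko_module_links.tsv"
  "link/pathway/ko KEGG_ko_pathway_links.tsv"
)

# download_kegg_tables [outdir]
# With DRY_RUN=1 the wget commands are printed instead of executed.
download_kegg_tables() {
  local outdir="${1:-.}"
  local entry path file
  if [[ "${DRY_RUN:-0}" != "1" ]]; then
    mkdir -p "$outdir"
  fi
  for entry in "${KEGG_TABLES[@]}"; do
    # Split the entry into the REST path and the output file name.
    read -r path file <<< "$entry"
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "wget -O $outdir/$file $KEGG_REST/$path"
    else
      wget -O "$outdir/$file" "$KEGG_REST/$path" || return 1
    fi
  done
}
```

After sourcing the file, `DRY_RUN=1 download_kegg_tables kegg_latest` previews the commands and `download_kegg_tables kegg_latest` runs them. Keeping the endpoint list in one array makes it easy to add or drop tables later.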
You would need to download EC number information from here, I believe: https://enzyme.expasy.org (I don't know if there's an easier place to get it).
Last, the MetaCyc information was taken from the parsed files created for HUMAnN2. I'm not sure exactly what workflow they used to create the reaction-to-pathway mapping files, which makes it harder to recreate these files from newer MetaCyc versions. However, you could check the latest version of HUMAnN3 for these files and/or look on the MetaCyc website (where you can definitely find pathway descriptions, at least).
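If you go the ExPASy route, the ENZYME database is distributed as a flat file (commonly `enzyme.dat`) where each record has an `ID` line with the EC number, one or more `DE` lines with the enzyme name, and ends with a `//` line. The sketch below turns that into a two-column EC description table. It assumes this record layout, and it assumes you want `EC:`-prefixed IDs like the ones PICRUSt2 reports; the function name and the download URL in the note afterwards are assumptions to double-check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert ENZYME flat-file records (stdin) into
# "EC:<number><TAB><description>" lines (stdout).
# Assumes "ID   <EC>" and "DE   <text>" lines, with "//" closing each record.
enzyme_dat_to_tsv() {
  awk '
    /^ID   / { ec = substr($0, 6); desc = ""; next }
    /^DE   / {
      # Multi-line descriptions are joined with a single space.
      d = substr($0, 6)
      desc = (desc == "" ? d : desc " " d)
      next
    }
    /^\/\// {
      # Print only real records; the file header has no ID line.
      if (ec != "") { sub(/\.$/, "", desc); print "EC:" ec "\t" desc }
      ec = ""; desc = ""
    }
  '
}
```

Usage might look like `wget -O enzyme.dat https://ftp.expasy.org/databases/enzyme/enzyme.dat` followed by `enzyme_dat_to_tsv < enzyme.dat > EC_descrip.tsv` (the output file name is hypothetical). Transferred and deleted entries keep their `DE` text as-is, so you may want to filter those out.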
Cheers,
Gavin
Thank you so much for your help!
I updated to the latest description and mapping files, but when I ran PICRUSt2, this error occurred:
" Stopping, because no pathways were identified. This can especially happen when either a test input file with few gene families is input or when gene family regrouping is not done properly. "
I suspect the problem is caused by these three files not matching the current version:
prokaryotic/16S.txt.gz prokaryotic/ko.txt.gz prokaryotic/ec.txt.gz
Is there any way to get the latest version of these three files, or another way to fix the error above?
Sorry for asking again; I'd appreciate one more reply.
Hi @zzalzzu,
Just to clarify - you updated the MetaCyc pathway mapfiles?
Could you paste the first few lines of the new mapfile if so?
You don't want to replace those three files you indicated unless you have a different genome database that you want to use, which would require changing all of the files, including the 16S alignment and tree file.
Cheers,
Gavin
Hi @gavinmdouglas, thanks for replying to my question!
I'm still looking for MetaCyc mapping and description files, so I haven't tried applying new MetaCyc files yet.
I obtained the KEGG module and pathway mapfiles through the commands you provided. However, the downloaded mapfiles were not sorted, so I sorted them following the format of the default PICRUSt2 file before applying them. The sorted file looks like the attached picture.
After sorting and applying the mapfile, the same error occurred:
" Stopping, because no pathways were identified. This can especially happen when either a test input file with few gene families is input or when gene family regrouping is not done properly. "
How can I solve this?
Sorting the file shouldn't matter.
It looks like that mapfile should work. What command did you run? Make sure it matches the command in this FAQ post, including the --no_regroup option: https://github.com/picrust/picrust2/wiki/Frequently-Asked-Questions#how-can-i-determine-kegg-pathway-abundances-from-the-predicted-ko-abundances
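Besides adding --no_regroup, another thing worth ruling out is an ID mismatch between the predicted KO table and the mapfile (for example, leftover `ko:` prefixes in the mapfile), since no matching gene families would also mean no pathways. A minimal sketch that counts the shared IDs is below. It assumes the predicted table has a header line with the IDs in the first column, and that the mapfile has one pathway per line followed by its KOs; both files may be gzipped. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count IDs present in both a predicted function table
# and a pathway mapfile. A result of 0 points to an ID mismatch.
# gzip -dcf decompresses .gz input and passes plain text through unchanged.
count_shared_ids() {
  local table="$1" mapfile="$2"
  LC_ALL=C comm -12 \
    <(gzip -dcf "$table" | tail -n +2 | cut -f1 | LC_ALL=C sort -u) \
    <(gzip -dcf "$mapfile" | cut -f2- | tr '\t' '\n' | sed '/^$/d' | LC_ALL=C sort -u) \
    | awk 'END { print NR }'
}
```

For example, `count_shared_ids KO_metagenome_out/pred_metagenome_unstrat.tsv.gz your_KEGG_mapfile.tsv`. If the count looks reasonable, the pathway step would then be run along the lines of `pathway_pipeline.py -i KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -o KEGG_pathways_out --no_regroup --map your_KEGG_mapfile.tsv` (the paths here are placeholders; check the FAQ for the exact command).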
Gavin
Dear Gavin,
I wanted to follow up on this discussion and ask something related. I would like to use the newest KEGG version because the latest update includes several KOs that might be important for my study system. I was able to download the newest files from the KEGG API following your instructions above. However, I noticed that in the prokaryotic folder, the file "ko.txt" is a table linking species to KOs, and the newest KOs are not included there. Can I do anything to update it? If not, I don't see how updating the KEGG files would allow PICRUSt2 to generate output different from the output with the default files.
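To see which of the newly added KOs are missing from the reference table, one option is to compare the KO IDs in the header of `prokaryotic/ko.txt.gz` with the IDs in the new `list/ko` download. This minimal sketch assumes the first row of `ko.txt.gz` is an ID column followed by one column per KO, and that the `list/ko` file has the KO ID in its first column, with or without a `ko:` prefix. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print KO IDs listed in a new KEGG "list/ko" download
# that have no column in the PICRUSt2 reference table (ko.txt.gz).
missing_kos() {
  local trait_table="$1" ko_list="$2"
  LC_ALL=C comm -13 \
    <(gzip -dcf "$trait_table" | head -n 1 | tr '\t' '\n' | tail -n +2 | LC_ALL=C sort -u) \
    <(cut -f1 "$ko_list" | sed 's/^ko://' | LC_ALL=C sort -u)
}
```

For example, `missing_kos prokaryotic/ko.txt.gz KEGG_ko_descrip.tsv | wc -l` gives a rough count of KOs that PICRUSt2 cannot predict with the bundled genome annotations.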
Thanks, Yakshi
Hey @yakshiUPR,
There's no easy way to update those links without re-annotating the genomes yourself and producing a new file. Some pathways are completely missing from the older version of KEGG, though, so those would be picked up even without adding new KOs. But you're right that the missing KOs could definitely make certain pathways less likely to be called as present. It's also important to realize that some KOs change definition between versions. This should only be a small minority, but mixing KEGG versions will definitely add some noise.
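One rough way to spot KOs whose definitions shifted between versions is to join the old and new KO description tables on their IDs and print the rows where the text differs. This is a minimal sketch assuming both files are two-column and tab-separated (ID, description), optionally gzipped, with or without a `ko:` prefix; the function name is hypothetical. A plain string comparison also flags cosmetic rewording, so treat the output as a list to inspect by hand rather than a count of real definition changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<KO><TAB><old description><TAB><new description>"
# for KOs present in both files whose description text differs.
changed_ko_descriptions() {
  local old="$1" new="$2"
  LC_ALL=C join -t $'\t' \
    <(gzip -dcf "$old" | sed 's/^ko://' | LC_ALL=C sort -t $'\t' -k1,1) \
    <(gzip -dcf "$new" | sed 's/^ko://' | LC_ALL=C sort -t $'\t' -k1,1) \
    | awk -F'\t' '$2 != $3'
}
```

KOs present in only one of the two files are dropped by `join`, so this only covers IDs shared by both versions.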
Sorry I can't be of more help!
All the best,
Gavin
Hello @gavinmdouglas, thanks for providing PICRUSt2.
I found that the default files in PICRUSt2 (descriptions, mapping files, etc.) are not the latest versions. I want to check pathways (KO, KEGG module, KEGG pathway, EC, and MetaCyc) against our data, so I'd like to know how to get the latest versions of these files and use them with PICRUSt2.
Thanks for reading; I'd appreciate an answer.