picrust / picrust2

Code, unit tests, and tutorials for running PICRUSt2
GNU General Public License v3.0

KEGG/EC Database Question #361

Closed: raeshrode closed this issue 2 months ago

raeshrode commented 2 months ago

Hello,

This is likely a naïve question; however, I'm not finding a straightforward answer on the PICRUSt2 GitHub or wiki pages, so I thought I would just ask.

I've had PICRUSt2 installed for a couple of years (version 2.2.0_b), and it runs amazingly well. Thank you for this tool! My question is: does PICRUSt2 reach out directly to the KEGG/EC databases to identify pathways, or is a database downloaded at the time of install? If the latter is the case, would simply updating to the newest version of PICRUSt2 also update those databases? And further, once v2.5.3 has been out for a couple of years, would there be a way to update the KEGG/EC database that was initially downloaded?

Thank you! Rachel

R-Wright-1 commented 2 months ago

Hi Rachel,

I love questions that I can easily answer like this! PICRUSt2 relies on genomes that were annotated by JGI/IMG, and as far as I am aware, they don't provide updates to these annotations. For a while I looked into whether there was a straightforward way to obtain new annotations (NCBI, for example, provides annotations), but it appears that most online repositories annotate genomes at the time of upload and don't seem to re-annotate them afterwards. So the only way to update the annotations is to re-annotate the genomes yourself. I am currently (and have been for a little while) working on annotating the GTDB genomes as a way to potentially update the database more often. These genomes have the benefit that a phylogenetic tree is already built for each release, and that the tree is based on more than just the 16S sequences. I currently have all of the genomes annotated and just need to re-run some of the analyses that Gavin did for the PICRUSt2 paper to see how well these work. If everything looks good there, I'd probably release them to the community for further testing.

However, if you wanted to use the existing pipeline and just update the annotations, you could re-annotate the genomes (I sourced them and made them available on Figshare here a couple of years ago). And if there is a particular function that you are interested in, you could make an HMM for that function. I did make some instructions for that here. I haven't looked at these for a while, so if this is something you're interested in and something doesn't make sense, please let me know!

Robyn
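
For readers considering the re-annotation route described above, here is a minimal sketch of one possible last step: turning HMM search hits into a per-genome count table. It is not taken from the linked instructions. It assumes the hits are in HMMER's `--tblout` text format (target name in column 1, query/HMM name in column 3, full-sequence E-value in column 5), that you already have a mapping from protein IDs to genome IDs, and that the trait table should be tab-delimited with a genome ID column named `assembly`. The function names, the E-value cutoff, and all IDs below are hypothetical and only for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def count_hits(tblout_text, genome_of, max_evalue=1e-5):
    """Count HMM hits per genome from HMMER --tblout style text.

    genome_of maps a protein ID (the tblout target name) to a genome ID.
    max_evalue is an arbitrary illustrative cutoff, not a recommended value.
    """
    counts = defaultdict(Counter)
    for line in tblout_text.splitlines():
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue and target in genome_of:
            counts[genome_of[target]][query] += 1
    return counts


def to_trait_table(counts, genomes, id_column="assembly"):
    """Render counts as tab-delimited text, one row per genome.

    Genomes with no hits get explicit zeros. The id_column name is an
    assumption about the expected trait table layout.
    """
    functions = sorted({f for c in counts.values() for f in c})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column] + functions)
    for genome in genomes:
        row = [counts.get(genome, Counter())[f] for f in functions]
        writer.writerow([genome] + row)
    return out.getvalue()


# Hypothetical example data: three proteins from two genomes.
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
prot_A1  -  my_function  -  1.2e-30  105.3  0.1
prot_A2  -  my_function  -  3.0e-12  45.0  0.0
prot_B1  -  my_function  -  0.5  8.1  0.0
"""
GENOME_OF = {"prot_A1": "genome_A", "prot_A2": "genome_A", "prot_B1": "genome_B"}

if __name__ == "__main__":
    hits = count_hits(EXAMPLE_TBLOUT, GENOME_OF)
    print(to_trait_table(hits, ["genome_A", "genome_B"]), end="")
```

In this toy input, the weak hit for `prot_B1` falls above the cutoff, so `genome_B` is written with a count of 0 rather than being dropped. Whether a table like this can be used directly with the existing pipeline should be checked against the PICRUSt2 documentation for your version.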

raeshrode commented 2 months ago

Hi Robyn,

Thank you for the quick and helpful reply! It's great that you are working on something to help with re-annotation, though it's unfortunate that NCBI does not re-annotate. I hope your re-run of the analyses goes well, and that we can then use your annotated genomes.

I'll check out Figshare, as that seems to be the best current option. Thanks for that as well!

Rachel

R-Wright-1 commented 2 months ago

No problem! I've also added this to the FAQ now, as it has come up a few times.

Robyn