soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

Request to update the PDB70 database #362

Status: Open. Fabian-Bastiaanssen opened this issue 7 months ago.

Fabian-Bastiaanssen commented 7 months ago

I tried to build an updated version of the PDB70 database myself to include the proteins added since March. However, because the uniclust30 database hasn't been updated either, I would have to rebuild that as well, and in attempting to do so I realized I don't have the 20 TB of storage needed even to pre-filter the database. I'd greatly appreciate it if you could find some time to update everything.

hrp1000 commented 7 months ago

Hi

As far as I know (but I could be wrong), there is no funding for this project and no one is working on it; the last update to the code appears to have been early last year (2022). As a result, I think you could be waiting a long time for the databases to be updated, which is a huge problem for all of us who use HH-suite.

One thing occurs to me: the folks at DeepMind say that they used HHblits as part of their work on AlphaFold, and they might have resources that could be used to fund this work. Just a thought (maybe they read things here...).

milot-mirdita commented 7 months ago

The MPI Toolkit team is building new databases. They are available here: http://ftp.tuebingen.mpg.de/pub/protevo/toolkit/databases/hhsuite_dbs/

They are also linked in the README and in the download server page.

Additionally, we built a new PDB100 for ColabFold using Foldseek instead of HHblits, but it should work without issues with HHblits/HHsearch: https://colabfold.mmseqs.com

We will continue to maintain the PDB100 HHblits database for ColabFold for now.

hrp1000 commented 7 months ago

This is excellent news, but it shouldn't have taken a prod from me to get a reply to the OP!