steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Is it possible to update a DB? #283

Closed: ekiefl closed this issue 2 weeks ago

ekiefl commented 3 weeks ago

Hi team,

I am interested in periodically updating a DB.

Let's say one week I make a DB with 2 proteins:

foldseek createdb example/1tim.pdb.gz example/8tim.pdb.gz DB

The next week, I have a third protein I want to add:

foldseek updatedb d1asha_ DB

Does a command like this exist? Obviously it is a trivial task to recreate the DB with foldseek createdb example/1tim.pdb.gz example/8tim.pdb.gz d1asha_ DB, but what if I have a DB with tens or hundreds of millions of proteins? I don't want to re-compute the DB each time.

Thanks in advance for your time :)

milot-mirdita commented 3 weeks ago

You can use concatdbs. However, you have to run it four times, once for each of the Foldseek DB components.
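
Before merging, it can help to confirm that both databases actually have all four components. The check below is a sketch. The helper name check_foldseek_db is hypothetical (it is not a Foldseek command), and it assumes each component uses the suffixes from the commands in this answer ("", _h, _ca, _ss) and is stored as a data file plus matching .index and .dbtype files.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Foldseek.
# Assumption: every component <db><suffix> consists of a data file
# plus <db><suffix>.index and <db><suffix>.dbtype.
check_foldseek_db() {
  local db="$1" suffix f missing=0
  # "" = amino-acid sequences, _h = headers, _ca = C-alpha coordinates, _ss = 3Di sequences
  for suffix in "" _h _ca _ss; do
    for f in "${db}${suffix}" "${db}${suffix}.index" "${db}${suffix}.dbtype"; do
      if [ ! -e "$f" ]; then
        echo "missing: $f" >&2
        missing=1
      fi
    done
  done
  # 0 if every expected file exists, 1 otherwise
  return "$missing"
}
```

For example, `check_foldseek_db base_db && check_foldseek_db new_db` returns success only if both databases look complete under these assumptions.
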

# basedb
foldseek createdb directory-with-pdb-files/ base_db

# later
foldseek createdb new-pdb-files/ new_db

# concat
foldseek concatdbs base_db new_db merged_db
foldseek concatdbs base_db_h new_db_h merged_db_h
foldseek concatdbs base_db_ca new_db_ca merged_db_ca
foldseek concatdbs base_db_ss new_db_ss merged_db_ss
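
The four concatdbs calls above differ only in the component suffix, so they can be wrapped in a loop. This is a sketch, not an official Foldseek feature: merge_foldseek_dbs is a hypothetical helper name, and the FOLDSEEK variable is an assumption added here so the binary can be swapped (for example, FOLDSEEK=echo prints the commands instead of running them).

```shell
#!/usr/bin/env bash
# Sketch: run `foldseek concatdbs` once per Foldseek DB component.
# merge_foldseek_dbs is a hypothetical helper, not a Foldseek command.
# Set FOLDSEEK to another binary (e.g. FOLDSEEK=echo) for a dry run.
FOLDSEEK="${FOLDSEEK:-foldseek}"

merge_foldseek_dbs() {
  local base="$1" new="$2" merged="$3" suffix
  # "" = amino-acid sequences, _h = headers, _ca = C-alpha coordinates, _ss = 3Di sequences
  for suffix in "" _h _ca _ss; do
    "$FOLDSEEK" concatdbs "${base}${suffix}" "${new}${suffix}" "${merged}${suffix}" || return 1
  done
}
```

Usage: `merge_foldseek_dbs base_db new_db merged_db`. For periodic updates, merged_db can serve as the base for the next round. The loop relies on concatdbs treating each component the same way, as in the commands above; trying it on a small DB (such as the two-protein example) is a cheap way to confirm the result before running it on a large one.
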

ekiefl commented 2 weeks ago

This is wonderful. Thanks for your help.