steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

foldseek easy-cluster iteratively with different batches at different times #249

Open: josephhughes opened this issue 4 months ago

josephhughes commented 4 months ago

Hi,

Is it possible to run foldseek easy-cluster at different points in time on different batches without reprocessing everything? For example, I have 10,000 PDB files that I clustered today. In three weeks' time, I will add another 10,000 structures to the same folder of PDB files. When I then run foldseek easy-cluster, is there a way to tell it to reuse the results from the first 10,000 files to minimise compute?
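The thread does not say whether Foldseek supports this directly. As a conceptual illustration only, here is a minimal Python sketch of the general idea behind reusing an earlier clustering. Each new structure is compared only against the existing cluster representatives, so pairs already compared in the first run are never revisited. This is a generic greedy scheme, not Foldseek's clustering algorithm. `incremental_cluster`, `identity`, and the 0.7 threshold are all hypothetical, and the toy string identity stands in for a real structural similarity score.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical sketch (not Foldseek's algorithm): greedy incremental clustering
# that reuses a previous result. Only new items are compared, and only against
# current representatives; old-vs-old comparisons are not repeated.


def incremental_cluster(
    clusters: Dict[str, List[str]],
    new_items: List[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Dict[str, List[str]]:
    # Copy the previous clustering so the earlier result is left untouched.
    updated = {rep: list(members) for rep, members in clusters.items()}
    for item in new_items:
        best_rep: Optional[str] = None
        best_score = float("-inf")
        # Compare the new item against every current representative.
        for rep in updated:
            score = similarity(item, rep)
            if score >= threshold and score > best_score:
                best_rep, best_score = rep, score
        if best_rep is None:
            # No representative is similar enough: the item founds a new cluster.
            updated[item] = [item]
        else:
            updated[best_rep].append(item)
    return updated


def identity(a: str, b: str) -> float:
    # Toy stand-in for a structural similarity score: fraction of matching positions.
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


previous = {"AAAA": ["AAAA", "AAAT"]}
result = incremental_cluster(previous, ["AAAC", "GGGG", "GGGC"], identity, 0.7)
print(result)  # → {'AAAA': ['AAAA', 'AAAT', 'AAAC'], 'GGGG': ['GGGG', 'GGGC']}
```

Because the assignment is greedy and depends on input order, the result can differ from clustering all 20,000 structures from scratch.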

CRC63 commented 3 months ago

Hi, I am also interested in this possibility. Thanks in advance.