steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

How is the PDB100 database prepared? #217

Closed BinhongLiu closed 4 months ago

BinhongLiu commented 6 months ago

If I understand correctly, the PDB100 database was built from the PDB clustered at 100% sequence identity. I checked the pdb.lookup file, which supposedly contains all the pdb_chain IDs, and found that it includes some strange chain IDs, such as 1a0n_MODEL_1_B, 1a0n_MODEL_2_B and 1a0n_MODEL_3_B. I could not find chains with these names in the PDB entry 1a0n. What is the difference between these 1a0n_MODEL_*_B chains?

I would much appreciate your help with this problem. Many thanks.

milot-mirdita commented 6 months ago

The script that creates the PDB100 is here: https://github.com/steineggerlab/foldseek/blob/master/util/update_webserver_pdb/single-script.sh. I'm not sure what the weird chains are; I have to look into it more closely. These might be weird NMR structures that we don't handle correctly.
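
To see how many such entries there are, the lookup file can be scanned for identifiers of the form reported above. The following is only a minimal sketch: it assumes that pdb.lookup is tab-separated with the identifier in the second column, and that the affected identifiers follow the `<entry>_MODEL_<number>_<chain>` shape seen in 1a0n_MODEL_1_B. The function name `group_model_entries` and the sample lines are made up for illustration and are not part of Foldseek.

```python
import re
from collections import defaultdict

# Assumed identifier shape, based on the IDs quoted in this thread,
# e.g. "1a0n_MODEL_1_B" -> entry "1a0n", model 1, chain "B".
MODEL_ID = re.compile(r"^(?P<entry>.+?)_MODEL_(?P<model>\d+)_(?P<chain>.+)$")


def group_model_entries(lines):
    """Group model numbers by (entry, chain) for all *_MODEL_* identifiers.

    Assumes each line is tab-separated and that the identifier is the
    second field; lines that do not fit this layout are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        match = MODEL_ID.match(fields[1])
        if match:
            key = (match["entry"], match["chain"])
            groups[key].append(int(match["model"]))
    return {key: sorted(models) for key, models in groups.items()}


# Hypothetical sample lines in the assumed layout (not real file content).
sample = [
    "0\t1a0n_A\t0",
    "1\t1a0n_MODEL_1_B\t0",
    "2\t1a0n_MODEL_3_B\t0",
    "3\t1a0n_MODEL_2_B\t0",
    "",
]

print(group_model_entries(sample))
```

On a real database the same function can be given an open file handle, for example `with open("pdb.lookup") as fh: group_model_entries(fh)`. If every model number of an entry shows up for the same chain, that would fit the guess that these come from multi-model (for example NMR) entries, but it does not confirm it.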