westlake-repl / SaProt

[ICLR'24 spotlight] Saprot: Protein Language Model with Structural Alphabet
MIT License

PDB Validation set for figure 2 #61

Open alex-hh opened 5 days ago

alex-hh commented 5 days ago

Hi,

I'm curious about the comparison between performance on the PDB and AFDB validation sets. How was the PDB validation set constructed? Did you just filter the existing validation set down to the UniProt IDs that have PDB entries?

Thanks!

LTEnjoy commented 5 days ago

Hi,

Yes. Only a subset of the protein structures in AlphaFoldDB have corresponding PDB entries, so we filtered out all proteins without PDB entries.
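
For readers who want to reproduce a similar subset, the filtering described above could be sketched roughly as follows. This is only an illustration, not the authors' actual script: the function name `filter_to_pdb_subset`, the `uniprot_to_pdb` mapping, and the toy IDs are all hypothetical, and it assumes you already have a UniProt-to-PDB mapping from some external source.

```python
from typing import Dict, Iterable, List, Set


def filter_to_pdb_subset(
    validation_ids: Iterable[str],
    uniprot_to_pdb: Dict[str, Set[str]],
) -> List[str]:
    """Keep only the UniProt IDs that map to at least one PDB entry.

    Hypothetical helper: `uniprot_to_pdb` is assumed to map a UniProt
    accession to the set of PDB IDs associated with it.
    """
    # IDs missing from the mapping, or mapped to an empty set, are dropped.
    return [uid for uid in validation_ids if uniprot_to_pdb.get(uid)]


# Toy data, invented for illustration only.
validation_ids = ["P00001", "P00002", "P00003"]
uniprot_to_pdb = {"P00001": {"1ABC"}, "P00003": set()}

pdb_subset = filter_to_pdb_subset(validation_ids, uniprot_to_pdb)
print(pdb_subset)
```

The order of the original validation set is preserved, so the PDB subset can be evaluated with the same pipeline as the full AFDB validation set.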