microsoft / protein-frame-flow

Fast protein backbone generation with SE(3) flow matching.
MIT License

Question about novelty calculation #14

Closed Wangchentong closed 8 months ago

Wangchentong commented 9 months ago

Thank you, authors, for sharing this great work. I wonder how the PDB dataset used to calculate novelty was curated. Do you split each chain in the PDB database, use only the single chains from training, or something else?

Wangchentong commented 9 months ago

I guess you use the default Foldseek PDB database?

jasonkyuyim commented 9 months ago

Hi, we take the whole PDB dataset as instructed in Foldseek's documentation. I believe this contains all the single sequences. We use the following flags to run the novelty calculations:

--alignment-type 1 --format-output query,target,alntmscore,lddt --tmscore-threshold 0.0 --exhaustive-search --max-seqs 10000000000

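As a rough sketch (not the authors' pipeline), the flags above could be wrapped in Python as follows. The helper names `build_easy_search_cmd` and `max_tm_per_query`, the example paths, and the choice to score novelty as each sample's highest `alntmscore` against the PDB (lower meaning more novel) are all assumptions for illustration. The target `pdb_db` is assumed to be a database fetched with Foldseek's `databases` module, e.g. `foldseek databases PDB pdb tmp` as shown in the Foldseek README.

```python
import csv
import io
from typing import Dict, List

# Flags as quoted in the thread; the surrounding helpers are hypothetical.
NOVELTY_FLAGS = [
    "--alignment-type", "1",                            # TM-align based alignment
    "--format-output", "query,target,alntmscore,lddt",  # 4 tab-separated columns
    "--tmscore-threshold", "0.0",                       # keep every hit, however weak
    "--exhaustive-search",                              # skip the prefilter, align against all targets
    "--max-seqs", "10000000000",
]


def build_easy_search_cmd(query_dir: str, pdb_db: str,
                          out_tsv: str, tmp_dir: str) -> List[str]:
    """Hypothetical helper: argv for `foldseek easy-search` with the flags above."""
    return ["foldseek", "easy-search", query_dir, pdb_db, out_tsv, tmp_dir,
            *NOVELTY_FLAGS]


def max_tm_per_query(tsv_text: str) -> Dict[str, float]:
    """Hypothetical helper: highest alntmscore per query from the 4-column output.

    Assumes novelty is summarized by the best TM-score hit in the PDB,
    so a lower value means a more novel sample.
    """
    best: Dict[str, float] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        query, tm = row[0], float(row[2])  # columns: query, target, alntmscore, lddt
        best[query] = max(tm, best.get(query, tm))
    return best
```

If Foldseek is installed, passing the list to `subprocess.run(..., check=True)` would run the search; the resulting TSV can then be fed to `max_tm_per_query`, and averaging its values gives a single summary number.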
Wangchentong commented 8 months ago

Thanks, Jason!