qiyifei1 closed this issue 2 weeks ago
Hello,
Did you download the Thermostability dataset from here and use the LMDB data from the foldseek dir? We have provided datasets for both ESM-2 and SaProt, named normal and foldseek respectively.
If you want to fine-tune SaProt, you need to load data from the foldseek dir.
Got it, thank you.
Hello, the Thermostability dataset seems to contain only protein sequences, not 3Di sequences. Here is one entry:
So the fine-tuning command 'python scripts/training.py -c config/Thermostability/saprot.yaml' does not use 3Di token information, right?
But in the "AlphaFold2 vs. ESMFold" table, the results apparently use structural information. Is it possible to provide the Thermostability dataset with 3Di tokens?
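For anyone checking their own copy of the data, here is a minimal sketch of a sanity check for whether an entry carries 3Di tokens. It assumes that a structure-aware sequence alternates an uppercase amino-acid letter with a lowercase 3Di letter (or '#' where the structure token is masked). The function names and the example strings are hypothetical, not taken from the repository or the dataset.

```python
# Heuristic check for structure-aware (amino acid + 3Di) sequences.
# Assumption: residues are uppercase letters from the 20 standard amino acids,
# and each one is followed by a lowercase 3Di letter or '#' (masked structure).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
THREE_DI = set("acdefghiklmnpqrstvwy#")


def looks_structure_aware(seq: str) -> bool:
    """Return True if seq alternates amino-acid and 3Di/'#' characters."""
    if len(seq) == 0 or len(seq) % 2 != 0:
        return False
    residues = seq[0::2]   # characters at even positions
    structure = seq[1::2]  # characters at odd positions
    return (all(c in AMINO_ACIDS for c in residues)
            and all(c in THREE_DI for c in structure))


def split_structure_aware(seq: str) -> tuple[str, str]:
    """Split an interleaved sequence into (amino-acid string, 3Di string)."""
    if not looks_structure_aware(seq):
        raise ValueError("sequence does not look like an interleaved AA/3Di string")
    return seq[0::2], seq[1::2]


if __name__ == "__main__":
    # Made-up examples for illustration only.
    print(looks_structure_aware("MdEvVpQp"))  # interleaved
    print(looks_structure_aware("MEVQ"))      # plain amino-acid sequence
    print(split_structure_aware("MdEvVpQp"))
```

If an entry from the foldseek dir fails this check, it most likely holds a plain amino-acid sequence. The check is only a heuristic and would reject sequences that contain non-standard residue letters.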