Closed: chao1224 closed this issue 1 year ago.
Hi, thanks for your question:
When running downstream tasks, please keep the parameters consistent with Table 6 in the paper. I should explain that the `batch_size` in Table 6 of our paper equals `per_device_batch_size * gradient_accumulation_steps`. This is because GPU memory is sometimes insufficient, so we adjust `per_device_batch_size` and `gradient_accumulation_steps` to achieve the same batch size. As for `eval_step`, I don't think it will affect the training process; just follow the defaults in the scripts. ProtBert uses the same hyperparameters as our model.
Hi, thank you for the prompt reply. It's very helpful!
Meanwhile, I carefully checked `per_device_batch_size` and `gradient_accumulation_steps` (listed below), and it seems that contact and ss3 have some minor mismatch issues when using the default hyperparameters under the script folder. Can you help check this when available?
| task | per_device_batch_size | gradient_accumulation_steps | batch-size in Table 6 |
| --- | --- | --- | --- |
| contact | 1 | 1 | 8 |
| fluorescence | 4 | 16 | 64 |
| homology | 1 | 64 | 64 |
| ss3 | 2 | 8 | 32 |
| ss8 | 2 | 16 | 32 |
| stability | 2 | 16 | 32 |
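A quick sketch that applies the formula from the reply above to these rows (values copied from the table; the check itself is illustrative, not from the repo):

```python
# Flag rows where per_device_batch_size * gradient_accumulation_steps
# does not reproduce the batch size reported in Table 6.
rows = {
    "contact":      (1, 1,  8),
    "fluorescence": (4, 16, 64),
    "homology":     (1, 64, 64),
    "ss3":          (2, 8,  32),
    "ss8":          (2, 16, 32),
    "stability":    (2, 16, 32),
}
for task, (per_device, grad_accum, table6) in rows.items():
    if per_device * grad_accum != table6:
        print(f"{task}: {per_device} * {grad_accum} = {per_device * grad_accum}, Table 6 says {table6}")
# -> contact: 1 * 1 = 1, Table 6 says 8
# -> ss3: 2 * 8 = 16, Table 6 says 32
```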
Hello: Thank you for your correction. First of all, I apologize for my carelessness yesterday. I forgot to mention that the batch size in Table 6 is also related to the number of GPUs we use. I have corrected and updated our scripts.
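In other words, the formula gains a GPU factor. A sketch under assumed GPU counts (the counts below are illustrative, since the thread does not state them):

```python
# Table 6 batch size = per-device batch size * accumulation steps * number of GPUs.
def table6_batch_size(per_device, grad_accum, n_gpus):
    return per_device * grad_accum * n_gpus

# The two mismatched tasks become consistent once a GPU factor is included;
# the GPU counts here are assumptions, not values confirmed in the thread.
print(table6_batch_size(1, 1, 8))  # contact: 8 if trained on 8 GPUs
print(table6_batch_size(2, 8, 2))  # ss3:     32 if trained on 2 GPUs
```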
Thanks. I can roughly reproduce the results using the latest scripts (except contact, which is still running).
Hi @zxlzr , I got the following result for OntoProtein on contact, which seems to be too high. The paper reports 0.40 for l2. I'm not sure if I missed something; can you help double-check it?
'accuracy_l5': 0.6621209383010864, 'accuracy_l2': 0.5874732732772827, 'accuracy_l': 0.4826046824455261
Hi, can you please provide more detailed information? For example, your hyperparameters and the sequence length between amino acids.
Hi there,
I just followed the hyperparameters in this link.
Can you help explain what the `sequence length between amino acids` is?
The `sequence length between amino acids` is the short-, medium-, or long-range setting selected for the contact test. Specific details can be found in TAPE.
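For reference, a minimal sketch of how TAPE-style contact metrics restrict residue pairs by sequence separation (cutoffs follow the TAPE convention: short 6 ≤ |i-j| < 12, medium 12 ≤ |i-j| < 24, long |i-j| ≥ 24; this function is illustrative, not the repo's evaluation code):

```python
import numpy as np

def precision_at_L_over_k(probs, contacts, min_sep=12, max_sep=None, k=2):
    """P@L/k over residue pairs whose separation lies in [min_sep, max_sep)."""
    L = probs.shape[0]
    sep = np.abs(np.arange(L)[:, None] - np.arange(L)[None, :])
    mask = sep >= min_sep                          # e.g. 12 keeps medium + long range
    if max_sep is not None:
        mask &= sep < max_sep
    mask &= np.triu(np.ones((L, L), dtype=bool))   # count each pair once
    top = np.argsort(probs[mask])[::-1][: L // k]  # top-L/k scored pairs
    return contacts[mask][top].mean()

# Toy usage: 'accuracy_l2' above presumably corresponds to k=2 (precision at L/2).
rng = np.random.default_rng(0)
p = rng.random((100, 100)); p = (p + p.T) / 2      # symmetric contact scores
c = rng.random((100, 100)) < 0.05; c = c | c.T     # symmetric binary contacts
print(precision_at_L_over_k(p, c, min_sep=12, k=2))
```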
Thanks. I'm reporting the one for medium & long range prediction, just as in the paper and #8.
Hi, may I ask if there are any follow-ups?
I don't know precisely what the cause of the error is, but we can provide a model for the contact task fine-tuned with our hyperparameters. This model was also retrained by us, so there may be a fluctuation of 2~4 points, but the difference will not be large.
Sounds good! That would be very helpful!
You can download the checkpoint here
@cheng-siyuan Thanks. So using the model you provided, I got the following output:
'accuracy_l5': 0.6149854063987732, 'accuracy_l2': 0.5102487802505493, 'accuracy_l': 0.4060076177120209
Not sure if I missed something.
Yes, this checkpoint was obtained later, after we updated the hyperparameters, so its performance is 3 to 4 points higher than our initial result. The result in the paper is not the best result, and I apologize for the trouble caused to your experiment.
No problem at all, and you have already been very helpful in replying to the messages :)
So just to double-check: you mean that in Table 1 of your paper, the contact result for OntoProtein should be updated to 0.51 (with your checkpoint), right?
Yes, we recommend that you use the results of this checkpoint as a reference.
Sounds good! Appreciate your help and being responsible for your work!
You're welcome:)
Hi there,
Thanks for providing the nice codebase. I'm trying to reproduce the results for downstream tasks, and I have some questions about `gradient_accumulation_steps` and `eval_step`. Can you help clarify these? Any help is appreciated.