WEIYanbin1999 opened 1 year ago
Maybe it is because the hyperparameters are not the best ones. Could you please release the detailed hyperparameters? Thanks again.
Here is my reproduction workflow. Starting from your GitHub repo, I ran the pretrain command on an A100:

```
python run_pretrain.py --pretrain_dataset BIG --dataset_name BIG --num_hidden_layers 4 --train_bs 16 --lr 1e-4 --epochs 10
```

Some logs for pretrain:
Then I used your `get_pretrained_KGTransformer_parameters.py` to get a file with the suffix `ep9_delWE`, and ran the downstream triple classification:

```
python run_down_triplecls.py --dataset_name WN18RR --pretrain_dataset BIG --down_task down_triplecls --train_bs 16 --test_bs 128 --epochs 50 --fixedT 1
```
Some logs for triple cls:

```
05-17 16:04 INFO KGModel.encoder.embeddings.word_embeddings.weight.requires_grad == True
05-17 16:04 INFO load pretrained parameters from pretrain_models/BIG/model_layer-4_hidden-768_heads-12_seq-126_textE-cls_t0-1.0_t1-1.0_t2-1.0.ep9_delWE.
05-17 16:04 INFO freeze parameters of encoder.encoder.layer.
05-17 16:04 INFO Creating BERT Trainer
05-17 16:04 INFO Total Parameters:104389647
05-17 16:04 INFO Total transformer Parameters:28351488
```
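For anyone comparing against their own run: judging from the log lines above, the downstream script loads the `ep9_delWE` checkpoint and then freezes the transformer layers (`--fixedT 1`). Below is only my rough PyTorch sketch of that behaviour, with the checkpoint path and the `encoder.encoder.layer` prefix taken from the log; the actual code in `run_down_triplecls.py` may of course differ:

```python
import torch
from torch import nn

CKPT = ("pretrain_models/BIG/model_layer-4_hidden-768_heads-12_"
        "seq-126_textE-cls_t0-1.0_t1-1.0_t2-1.0.ep9_delWE")

def load_and_freeze(model: nn.Module, ckpt_path: str = CKPT) -> None:
    # Load the pretrained KGTransformer weights. Non-strict, since the
    # downstream model also has parameters that are not in the checkpoint.
    state_dict = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state_dict, strict=False)

    # "freeze parameters of encoder.encoder.layer." in the log: keep the
    # transformer layers fixed and train only the remaining parameters.
    for name, param in model.named_parameters():
        if name.startswith("encoder.encoder.layer"):
            param.requires_grad = False
```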
However, after many epochs the accuracy is poorer than expected: precision is higher but recall is very low, whereas from the description in your paper we would expect recall to be high.
Could you please give me some advice? Thank you, and have a good day!
Besides, I noticed there are some mismatches between the commands and the paper,
I am sincerely looking forward to your guidance. Many thanks for your patience and open academic sharing.
Finally, could you please share the original pretrained model with me? I'd like to start a new project, something like a plugin attached to other KG models, and I want to use your model as a starting point~
Last week, I retried the pretraining process and changed the batch size to 4 as stated in the paper (the git repo uses batch size 16). The performance of my reproduction experiments: previous F1 = 0.81 with bs = 16, current F1 = 0.83 with bs = 4.
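Concretely, the rerun used the same pretrain command as in my first post with (as far as I recall) only the batch size changed:

```
python run_pretrain.py --pretrain_dataset BIG --dataset_name BIG --num_hidden_layers 4 --train_bs 4 --lr 1e-4 --epochs 10
```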
Although F1 improved by 2%, I note that recall is now higher than precision and accuracy, which matches the paper. However, it is not always this good: when I rerun the pretraining, the result may differ (that's OK, since the parameters are initialized randomly).
However, tuning is time-consuming and my work does not focus on the pretrained model; to me it is a preliminary model that can be used directly. So could you share or send me your pretrained model parameters, ideally together with the downstream model parameters? Thanks a lot.
Hello, I am also reproducing this work but my results are not as expected. Have you made any further adjustments to the reproduction hyperparameters, and what final results did you get? I would be very grateful for your answer!
I ran the code following the instructions: pretraining with 2-hop for 10 epochs and choosing the best pretrained model, then triple classification for 50 epochs, all following the instructions. But I got F1 = 0.81 while the paper reports 0.89.
I am looking for advice; the hyperparameters were the defaults and I made no modifications. Could you give me some insights? Thanks a lot.
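In case it helps with debugging, here is a small scikit-learn snippet to double-check precision / recall / F1 from the saved test predictions (not part of the repo; the function and argument names are my own):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report_triplecls_metrics(y_true, y_pred):
    # y_true / y_pred: 0-1 labels for the triple classification test set.
    acc = accuracy_score(y_true, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
    print(f"acc={acc:.4f}  precision={p:.4f}  recall={r:.4f}  f1={f1:.4f}")
```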