nanguoshun / LSR

PyTorch implementation of our ACL 2020 paper "Reasoning with Latent Structure Refinement for Document-Level Relation Extraction"

Question about running the code #9

Closed: xwjim closed this issue 3 years ago

xwjim commented 4 years ago

When I run the code, the following problem occurs. Do you have any idea about it?

Environment: PyTorch 1.5.1, CUDA 10.1

(screenshot of the error: Snipaste_2020-07-07_10-31-37)

nanguoshun commented 4 years ago

Hi @xwjim, thanks for your feedback. It may be caused by a dimension mismatch (most probably when running on multiple GPUs). You can temporarily avoid the issue by using a different batch size. I will fix it in the future.
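For anyone else hitting this: one common source of shape mismatches under `nn.DataParallel` is an uneven final batch being split across GPUs, so this is only a guess at the cause. Besides changing the batch size, dropping the last incomplete batch can also sidestep it. A minimal sketch with a toy dataset standing in for the repo's actual data pipeline:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real DocRED dataset; only the drop_last flag matters here.
train_dataset = TensorDataset(torch.randn(101, 8), torch.randint(0, 2, (101,)))

train_loader = DataLoader(
    train_dataset,
    batch_size=20,
    shuffle=True,
    drop_last=True,  # drop the ragged final batch (101 % 20 = 1 sample) so that
                     # nn.DataParallel never has to scatter an uneven batch
)

for x, y in train_loader:
    print(x.shape)  # every batch is exactly [20, 8]
```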

xwjim commented 4 years ago

> Hi @xwjim, thanks for your feedback. It may be caused by a dimension mismatch (most probably when running on multiple GPUs). You can temporarily avoid the issue by using a different batch size. I will fix it in the future.

Thank you for your reply. Is there a BERT version of the code?

nanguoshun commented 4 years ago

@xwjim we will try to release the BERT version in August.

xwjim commented 4 years ago

> @xwjim we will try to release the BERT version in August.

Thank you for sharing. May I ask whether you freeze the BERT model's parameters or fine-tune them during training?

longlongman commented 4 years ago

Hi @xwjim, could you tell me the batch size you used to avoid this issue, and the final dev F1 score you get by running this code?

xwjim commented 4 years ago

> Hi @xwjim, could you tell me the batch size you used to avoid this issue, and the final dev F1 score you get by running this code?

I prepend CUDA_VISIBLE_DEVICES=0 to the training command to avoid using multiple GPUs. Could you share your result once you have one?
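If you would rather pin the run to one GPU inside the script instead of on the command line, the same effect can be had by setting the variable before torch initializes CUDA. A small sketch (not part of the repo's code):

```python
import os

# Must be set before torch touches CUDA, so keep it above the torch import.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

# Expect 1 on a multi-GPU machine, so nn.DataParallel effectively becomes a no-op.
print(torch.cuda.device_count())
```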

longlongman commented 4 years ago

@xwjim Thanks a lot. I set the batch size to 64 to avoid the issue, but the result is quite poor: the dev F1 score is only 0.507, even worse than the DocRED Bi-LSTM baseline.

xwjim commented 4 years ago

> @xwjim Thanks a lot. I set the batch size to 64 to avoid the issue, but the result is quite poor: the dev F1 score is only 0.507, even worse than the DocRED Bi-LSTM baseline.

I set the batch size to 10 and trained on a single GPU. I got about 0.545.

nanguoshun commented 4 years ago

Hi @longlongman @xwjim, here is a piece of the log for your reference. I achieve 55.32 F1 at the 88th epoch with the default configuration.

```
| epoch  88 | step 13600 | ms/b 2736.56 | train loss 0.104 | NA acc: 1.00 | not NA acc: 0.54 | tot acc: 0.98
-----------------------------------------------------------------------------------------
ALL  : Theta 0.4153 | F1 0.5532 | Precision 0.5196 | Recall 0.5915 | AUC 0.5537
Ignore ma_f1 0.4873 | input_theta 0.4153 test_result F1 0.4871 | Precision 0.4376 | Recall 0.5494 | AUC 0.4490
| epoch  88 | time: 435.45s
-----------------------------------------------------------------------------------------
best f1 is: 0.5532227754592896, epoch is: 88, save path is: ./checkpoint/Struct_DIM_100_HIDDEN_120_docred_LR_0.001_DECAY_0.98_BATCHSIZE_20_SEED_0_SPLIT_True_REFINE_True
| epoch  89 | step 13650 | ms/b 11513.15 | train loss 0.100 | NA acc: 1.00 | not NA acc: 0.55 | tot acc: 0.98
| epoch  89 | step 13700 | ms/b 2742.59 | train loss 0.102 | NA acc: 1.00 | not NA acc: 0.55 | tot acc: 0.98
| epoch  89 | step 13750 | ms/b 2781.04 | train loss 0.101 | NA acc: 1.00 | not NA acc: 0.54 | tot acc: 0.98
-----------------------------------------------------------------------------------------
```

nanguoshun commented 4 years ago

Hi @xwjim, we trained BERT+LSR directly for the reported best results. We also tried training BERT and LSR separately, freezing one of them for a certain number of epochs at the beginning. Fine-tuning BERT is very tricky, and you may need to try different approaches to achieve the best performance.
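To make the staged option above concrete, here is a minimal sketch of freezing one sub-module at a time. The attribute names `model.bert` and `model.lsr` (and the placeholder layers) are assumptions for illustration, not necessarily what the released code uses:

```python
import torch
import torch.nn as nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a sub-module."""
    for p in module.parameters():
        p.requires_grad = trainable


# Assumed structure: a wrapper holding a BERT encoder and the LSR reasoning module.
class BertLSR(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = nn.Linear(768, 120)  # placeholder for the real BERT encoder
        self.lsr = nn.Linear(120, 97)    # placeholder for the real LSR module


model = BertLSR()

# Stage 1: freeze BERT, train LSR only.
set_trainable(model.bert, False)

# Stage 2: unfreeze BERT, freeze LSR.
set_trainable(model.bert, True)
set_trainable(model.lsr, False)

# Stage 3: train both jointly.
set_trainable(model.lsr, True)

# Rebuild the optimizer over trainable parameters whenever the freeze pattern changes.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Rebuilding (or at least refreshing the parameter groups of) the optimizer after each stage matters, since a frozen parameter left in an Adam group can still drift through its accumulated moments.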

xwjim commented 4 years ago

> Hi @xwjim, we trained BERT+LSR directly for the reported best results. We also tried training BERT and LSR separately, freezing one of them for a certain number of epochs at the beginning. Fine-tuning BERT is very tricky, and you may need to try different approaches to achieve the best performance.

Thank you for your reply. This is good work. I wonder whether the code lets me choose to train the model on the inter-sentence and intra-sentence cases separately.

xwjim commented 3 years ago

> Hi @xwjim, we trained BERT+LSR directly for the reported best results. We also tried training BERT and LSR separately, freezing one of them for a certain number of epochs at the beginning. Fine-tuning BERT is very tricky, and you may need to try different approaches to achieve the best performance.

I have tried to train the BERT model. I first freeze BERT for 20 epochs, then freeze the LSR model for 20 epochs, and then train BERT and LSR at the same time. However, I get an F1 lower than 59. I wonder if I have missed something.

nanguoshun commented 3 years ago

Hi @xwjim, have you tried training the BERT+LSR model simultaneously?

xwjim commented 3 years ago

> Hi @xwjim, have you tried training the BERT+LSR model simultaneously?

Thank you very much for your reply. I first freeze BERT for 20 epochs, then freeze the LSR model for 20 epochs, and then train the BERT+LSR model simultaneously for about 20 epochs. The F1 does not change much, so I did not train for longer. Should I train longer, or have I missed some steps?

xwjim commented 3 years ago

> Hi @xwjim, have you tried training the BERT+LSR model simultaneously?