redreamality / RERE-relation-extraction

Code for the paper "Revisiting the Negative Data of Distantly Supervised Relation Extraction"

Results on NYT11-HRL are lower than in the paper #5

Closed Carl-Ro closed 2 years ago

Carl-Ro commented 2 years ago

All I did was download bert_uncased, modify the path, and run `python extraction.py NYT11-HRL train`. Same as @hqp-hub, when I reproduced the model on NYT11-HRL without modifying any params, I got a much lower F1 (0.3973) than the scores in the paper. Is there any preprocessing or step I missed? If it's convenient, could you please give me the specific params of your previous experiments? Trying params blindly seems like it could take me forever. Here are the scores I got:

Prec: 0.5043 235/466, Reca: 0.6369 235/369, F1: 0.5629 for the RC model
Prec: 0.3947 148/375, Reca: 0.4000 148/370, F1: 0.3973 for the EE model
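A quick sanity check of how these scores are computed (assuming Prec = correct/predicted and Reca = correct/gold, which matches the printed counts):

```python
# Using the RC numbers above: 235 correct out of 466 predicted, 369 gold triples.
correct, predicted, gold = 235, 466, 369
prec = correct / predicted               # 0.5043
reca = correct / gold                    # 0.6369
f1 = 2 * prec * reca / (prec + reca)     # 0.5629
print(f"Prec: {prec:.4f}, Reca: {reca:.4f}, F1: {f1:.4f}")
```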

redreamality commented 2 years ago

There's no need to change the default parameters; they should take you to the result in the paper. I can't work out the problem from the information provided so far; you may need to provide:

  1. your detailed environment;
  2. the training process, i.e. the results printed after every epoch;
  3. whether you have tried other datasets, and what the results were;
  4. any other information.

Carl-Ro commented 2 years ago

Here's my env:

GPU: GeForce RTX 3090
NVIDIA-SMI 455.23.05
Driver Version: 455.23.05
CUDA Version: 11.1
python 3.7.9
tensorboard 2.9.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.5.0
tensorflow-estimator 2.5.0
tensorflow-gpu 2.3.0

Here are some params: max_len = 128, batch_size = 32, thre_rc = 0.56, thre_ee = 0.55, LOSS = 'BCE'
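For reference, a minimal sketch of how these settings look on my side; the constant names are assumptions mirroring the listing above, not a verbatim copy of the repo's config.py:

```python
# Hypothetical config.py excerpt (names assumed from the listing above).
max_len = 128     # maximum input sequence length fed to BERT
batch_size = 32
thre_rc = 0.56    # decision threshold for the relation classification (RC) model
thre_ee = 0.55    # decision threshold for the entity extraction (EE) model
LOSS = 'BCE'      # binary cross-entropy
```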

None of the params involving the network were modified, and I'm rerunning and recording with the env and params given above. If it gives the same low result, I'll run the model on NYT10-HRL. Thanks for your reply, have a nice day :)

redreamality commented 2 years ago

thre_rc=0.2 thre_ee=0.4

Have you modified them?

Carl-Ro commented 2 years ago

Here's my train log:

RC_model:

```
Epoch 1/20 1929/1929 [==============================] - 290s 134ms/step - loss: 0.2292 - accuracy: 0.4576 - val_loss: 0.1742 - val_accuracy: 0.5865
Epoch 2/20 1929/1929 [==============================] - 255s 132ms/step - loss: 0.1511 - accuracy: 0.6313 - val_loss: 0.1341 - val_accuracy: 0.6571
Epoch 3/20 1929/1929 [==============================] - 255s 132ms/step - loss: 0.1199 - accuracy: 0.6934 - val_loss: 0.1142 - val_accuracy: 0.6939
Epoch 4/20 1929/1929 [==============================] - 257s 133ms/step - loss: 0.0605 - accuracy: 0.8412 - val_loss: 0.0980 - val_accuracy: 0.7740
Epoch 5/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0910 - accuracy: 0.7654 - val_loss: 0.1034 - val_accuracy: 0.7324
Epoch 6/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0818 - accuracy: 0.7903 - val_loss: 0.0997 - val_accuracy: 0.7612
Epoch 7/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0734 - accuracy: 0.8100 - val_loss: 0.1049 - val_accuracy: 0.7356
Epoch 8/20 1929/1929 [==============================] - 255s 132ms/step - loss: 0.0665 - accuracy: 0.8262 - val_loss: 0.1020 - val_accuracy: 0.7564
Epoch 9/20 1929/1929 [==============================] - 257s 133ms/step - loss: 0.0605 - accuracy: 0.8412 - val_loss: 0.0980 - val_accuracy: 0.7740
Epoch 10/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0548 - accuracy: 0.8554 - val_loss: 0.1119 - val_accuracy: 0.7596
Epoch 11/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0495 - accuracy: 0.8663 - val_loss: 0.1081 - val_accuracy: 0.7788
Epoch 12/20 1929/1929 [==============================] - 255s 132ms/step - loss: 0.0447 - accuracy: 0.8769 - val_loss: 0.1110 - val_accuracy: 0.7740
Epoch 13/20 1929/1929 [==============================] - 257s 133ms/step - loss: 0.0404 - accuracy: 0.8879 - val_loss: 0.1090 - val_accuracy: 0.7708
Epoch 14/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0361 - accuracy: 0.9006 - val_loss: 0.1150 - val_accuracy: 0.7869
Epoch 15/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0326 - accuracy: 0.9061 - val_loss: 0.1185 - val_accuracy: 0.7756
Epoch 16/20 1929/1929 [==============================] - 257s 133ms/step - loss: 0.0290 - accuracy: 0.9162 - val_loss: 0.1188 - val_accuracy: 0.7821
Epoch 17/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0261 - accuracy: 0.9213 - val_loss: 0.1213 - val_accuracy: 0.7885
Epoch 18/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0235 - accuracy: 0.9274 - val_loss: 0.1258 - val_accuracy: 0.7853
Epoch 19/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0211 - accuracy: 0.9338 - val_loss: 0.1271 - val_accuracy: 0.7869
Epoch 20/20 1929/1929 [==============================] - 256s 133ms/step - loss: 0.0196 - accuracy: 0.9358 - val_loss: 0.1318 - val_accuracy: 0.7933
```

EE_model:

```
Epoch 1/20 2230/2230 [==============================] - 509s 203ms/step - loss: 0.0312 - accuracy: 0.2497 - val_loss: 0.0209 - val_accuracy: 0.3505
Epoch 2/20 2230/2230 [==============================] - 454s 204ms/step - loss: 0.0161 - accuracy: 0.2813 - val_loss: 0.0113 - val_accuracy: 0.3562
Epoch 3/20 2230/2230 [==============================] - 452s 203ms/step - loss: 0.0105 - accuracy: 0.2846 - val_loss: 0.0084 - val_accuracy: 0.2211
Epoch 4/20 2230/2230 [==============================] - 451s 202ms/step - loss: 0.0079 - accuracy: 0.2808 - val_loss: 0.0072 - val_accuracy: 0.3002
Epoch 5/20 2230/2230 [==============================] - 449s 201ms/step - loss: 0.0063 - accuracy: 0.3060 - val_loss: 0.0063 - val_accuracy: 0.4009
Epoch 6/20 2230/2230 [==============================] - 452s 203ms/step - loss: 0.0051 - accuracy: 0.3110 - val_loss: 0.0057 - val_accuracy: 0.2885
Epoch 7/20 2230/2230 [==============================] - 450s 202ms/step - loss: 0.0041 - accuracy: 0.3396 - val_loss: 0.0053 - val_accuracy: 0.3372
Epoch 8/20 2230/2230 [==============================] - 448s 201ms/step - loss: 0.0033 - accuracy: 0.3699 - val_loss: 0.0053 - val_accuracy: 0.4162
Epoch 9/20 2230/2230 [==============================] - 450s 202ms/step - loss: 0.0027 - accuracy: 0.4164 - val_loss: 0.0053 - val_accuracy: 0.3940
Epoch 10/20 2230/2230 [==============================] - 449s 201ms/step - loss: 0.0021 - accuracy: 0.3739 - val_loss: 0.0052 - val_accuracy: 0.3244
Epoch 11/20 2230/2230 [==============================] - 452s 203ms/step - loss: 0.0017 - accuracy: 0.3881 - val_loss: 0.0058 - val_accuracy: 0.4618
Epoch 12/20 2230/2230 [==============================] - 446s 200ms/step - loss: 0.0011 - accuracy: 0.4248 - val_loss: 0.0060 - val_accuracy: 0.3428
Epoch 14/20 2230/2230 [==============================] - 447s 200ms/step - loss: 8.2989e-04 - accuracy: 0.4378 - val_loss: 0.0065 - val_accuracy: 0.5074
Epoch 15/20 2230/2230 [==============================] - 449s 201ms/step - loss: 6.7782e-04 - accuracy: 0.4465 - val_loss: 0.0066 - val_accuracy: 0.5091
Epoch 16/20 2230/2230 [==============================] - 450s 202ms/step - loss: 5.3422e-04 - accuracy: 0.4665 - val_loss: 0.0069 - val_accuracy: 0.4934
Epoch 17/20 2230/2230 [==============================] - 442s 198ms/step - loss: 4.2993e-04 - accuracy: 0.4966 - val_loss: 0.0070 - val_accuracy: 0.4978
Epoch 18/20 2230/2230 [==============================] - 415s 186ms/step - loss: 3.4576e-04 - accuracy: 0.4905 - val_loss: 0.0072 - val_accuracy: 0.4653
Epoch 19/20 2230/2230 [==============================] - 414s 186ms/step - loss: 3.1074e-04 - accuracy: 0.4903 - val_loss: 0.0076 - val_accuracy: 0.4660
Epoch 20/20 2230/2230 [==============================] - 390s 175ms/step - loss: 2.9238e-04 - accuracy: 0.5042 - val_loss: 0.0077 - val_accuracy: 0.5361
```

Test with thre_rc=0.2, thre_ee=0.4:
RC_model: Prec: 0.5011 237/473, Reca: 0.6423 237/369, F1: 0.5629
EE_model: Prec: 0.4138 144/348, Reca: 0.3892 144/370, F1: 0.4011

Test with thre_rc=0.56, thre_ee=0.55:
RC_model: Prec: 0.5383 225/418, Reca: 0.6098 225/369, F1: 0.5718
EE_model: Prec: 0.4399 139/316, Reca: 0.3757 139/370, F1: 0.4052
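(The thresholds just binarize each model's per-triple probabilities before scoring, so lowering them trades precision for recall; a minimal illustration with made-up probabilities, not the repo's actual decoding code:)

```python
import numpy as np

def decode(probs, threshold):
    """Binarize per-label probabilities with a decision threshold."""
    return probs > threshold

# Illustrative probabilities only: a lower threshold admits more
# predictions, which typically raises recall and lowers precision.
probs = np.array([0.15, 0.35, 0.58, 0.90])
print(decode(probs, 0.2))   # [False  True  True  True]
print(decode(probs, 0.56))  # [False False  True  True]
```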

It seems the model improves continuously on the train and validation sets, but eventually gets a quite low score on the test set. The number of sentences is: train: 62335, val: 313, test: 369, and I've checked the dataset at https://github.com/redreamality/-RERE-data, so I don't think the data went wrong.

Carl-Ro commented 2 years ago

> thre_rc=0.2 thre_ee=0.4
>
> Have you modified them?

Yes, I modified them because with thre_rc=0.2 and thre_ee=0.4 the model doesn't work well, so I tried the params from config.py.

Carl-Ro commented 2 years ago

I noticed that the train method in class EEModel uses binary cross-entropy as the loss function, instead of the positive-unlabeled learning loss mentioned in the paper:

```python
self.model.compile(self.optimizer, 'binary_crossentropy', metrics=['accuracy'])
```

So maybe this code is not the final version from the paper and doesn't use the paper's loss function? Or is BCE actually better?

redreamality commented 2 years ago

> RC_model: Prec: 0.5383 225/418, Reca: 0.6098 225/369, F1: 0.5718
> EE_model: Prec: 0.4399 139/316, Reca: 0.3757 139/370, F1: 0.4052

It looks like a problem with the pre-trained model. Do other datasets have the problem, or just NYT11? I can reproduce every result with the trained models (I can provide the pre-trained models via netdisk through email).

> Here are some params: max_len = 128, batch_size = 32, thre_rc = 0.56, thre_ee = 0.55, LOSS = 'BCE'

LOSS = 'CPU' is used in our final paper version for RC, but BCE should also work fine.

> I noticed that the train method in class EEModel uses binary cross-entropy as the loss function, instead of the positive-unlabeled learning loss mentioned in the paper:
>
> `self.model.compile(self.optimizer, 'binary_crossentropy', metrics=['accuracy'])`
>
> So maybe this code is not the final version from the paper and doesn't use the paper's loss function? Or is BCE actually better?

The CPU loss mentioned in the paper is a framework: if μ_ee = 0, it approximately degenerates into a standard loss function, so it is feasible to replace it with cross-entropy when the false-negative problem is not serious.

Our training experiments still use CPU, but we compared the BCE and CPU results for EE and they are similar (in the same setting, the RC ablation shows CPU brings about a 0.3-2 point improvement), so we chose cross-entropy, which requires no extra parameter tuning, for the open-source release.
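For anyone who wants to experiment, here is a minimal sketch of a generic non-negative PU risk estimator in Keras, in the spirit of Kiryo et al. (2017); it illustrates the general PU idea with an assumed class prior and is not the exact CPU formulation from the paper:

```python
import tensorflow as tf

def make_nnpu_loss(prior=0.1):
    """Generic non-negative PU risk estimator (Kiryo et al., 2017).

    `prior` is the assumed fraction of true positives among the
    unlabeled examples and must be tuned per dataset; this sketch is
    illustrative, not the paper's CPU loss.
    """
    def loss(y_true, y_pred):
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        loss_pos = -tf.math.log(y_pred)        # penalty when a positive gets low probability
        loss_neg = -tf.math.log(1.0 - y_pred)  # penalty when an example is scored as positive
        pos = tf.cast(y_true, y_pred.dtype)
        unl = 1.0 - pos                        # treat everything unlabeled as possibly positive
        n_pos = tf.maximum(tf.reduce_sum(pos), 1.0)
        n_unl = tf.maximum(tf.reduce_sum(unl), 1.0)
        risk_pos = prior * tf.reduce_sum(pos * loss_pos) / n_pos
        # Negative-class risk estimated from the unlabeled data,
        # corrected for the positives hiding inside it.
        risk_neg = (tf.reduce_sum(unl * loss_neg) / n_unl
                    - prior * tf.reduce_sum(pos * loss_neg) / n_pos)
        # Clip the estimated negative risk at zero (the "non-negative" trick).
        return risk_pos + tf.maximum(risk_neg, 0.0)
    return loss

# Hypothetical usage, swapping out the built-in BCE string:
# model.compile(optimizer, make_nnpu_loss(prior=0.1), metrics=['accuracy'])
```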

Carl-Ro commented 2 years ago

I just tested on NYT10-HRL:
RC_model: Prec: 0.8012 3985/4974, Reca: 0.7082 3985/5627, F1: 0.7518
EE_model: Prec: 0.6927 3357/4846, Reca: 0.5730 3357/5859, F1: 0.6272

The results are also much lower than in the paper (about 0.13 lower in F1), so it's likely that something is wrong with my pre-trained model. Could you please send me the pre-trained model by email? Either the folder or the .h5 file is OK.

carlro7777@gmail.com. Thank you :)

hqp-hub commented 2 years ago

Could you please send one to me as well? My email is 1737598043@qq.com. Thanks a lot.

hqp-hub commented 2 years ago

Hi, could you please check your email? It's been 3 working days since I replied to your email.

redreamality commented 2 years ago

All requests so far have been processed.
BTW, the author is not responsible for other personal requests, such as "please send me the original bert model", "please help me debug my code", etc. Requests of this kind will not get a reply.