Some questions about the EOG model in the paper

crystal-xu commented 4 years ago

Hi,

I am also trying to adapt the EoG model to the DocRED dataset. I converted the DocRED data format and set batch_size to 1, the number of epochs to 200, and the learning rate to 0.01. I used the PubMed word embeddings from the original model. However, after 200 epochs of training, the result on the dev set is as follows:

TEST | LOSS = 0.20529, ACC = 0.9715 , MICRO P/R/F1 = 0.6678 0.0356 0.0676 | TP/ACTUAL/PRED = 410 /11518 /614 , TOTAL 396866 | 0h 00m 28s

It seems that the recall is very low. I guess the root cause may be the small batch_size (plain SGD in this case). However, setting the batch size to 2 leads to a "CUDA out of memory" error, as the original model only supports a single GPU.
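For reference, the micro P/R/F1 values in the log line can be cross-checked from the TP/ACTUAL/PRED counts alone. The sketch below assumes the usual definitions (precision = TP / PRED, recall = TP / ACTUAL, F1 as their harmonic mean); the function name `micro_prf` is only illustrative and is not taken from the EoG code.

```python
def micro_prf(tp, actual, pred):
    """Micro precision, recall and F1 from raw counts.

    tp     -- correctly predicted relation instances
    actual -- gold relation instances
    pred   -- predicted relation instances
    """
    precision = tp / pred if pred else 0.0
    recall = tp / actual if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the dev-set log line above.
p, r, f1 = micro_prf(tp=410, actual=11518, pred=614)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

With only 614 predictions against 11518 gold instances, recall cannot exceed 614 / 11518 (about 0.053) even if every prediction were correct, so the model is predicting far too few relations overall.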

I am considering adapting the original model to run on multiple GPUs, but I am not sure whether that would work. Would you mind telling me whether you made any special modifications to the original model? Or did you just tune the hyperparameters, or use different pre-trained word embeddings, e.g. GloVe?

Thanks very much.

nanguoshun commented 4 years ago

Hi @crystal-xu, it also took us a lot of time to understand the EoG code, and we only referred to its data preprocessing part. I suggest you ask Fenia for details of the EoG model. For the "CUDA out of memory" issue, you can use gradient accumulation if you want a larger effective batch size on a small GPU.
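To illustrate the gradient accumulation idea, here is a minimal, framework-free sketch on a toy one-parameter model. It is not the EoG or LSR training loop, and all names in it are hypothetical. It shows that summing scaled gradients over several batch_size = 1 passes and updating once gives the same step as a single larger batch.

```python
# Toy model: y_hat = w * x with squared-error loss (illustrative only).

def grad(w, x, y):
    """Gradient of (w * x - y) ** 2 with respect to w, for one example."""
    return 2.0 * (w * x - y) * x


def full_batch_step(w, batch, lr):
    """One SGD step using the mean gradient over the whole batch."""
    g = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * g


def accumulated_step(w, batch, lr, accum_steps):
    """Same update, but only one example is processed at a time."""
    acc = 0.0
    for x, y in batch:
        # One forward/backward pass with batch_size = 1; the gradient is
        # scaled by 1 / accum_steps so the sum equals the batch mean.
        acc += grad(w, x, y) / accum_steps
    # A single parameter update after accum_steps micro-batches.
    return w - lr * acc


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.5)]
w_full = full_batch_step(0.5, data, lr=0.01)
w_acc = accumulated_step(0.5, data, lr=0.01, accum_steps=len(data))
print(abs(w_full - w_acc) < 1e-9)
```

In PyTorch the same pattern is usually written by dividing the loss by the number of accumulation steps, calling `loss.backward()` on every micro-batch, and calling `optimizer.step()` followed by `optimizer.zero_grad()` only once every `accum_steps` iterations. Peak memory stays at the batch_size = 1 level, at the cost of more forward/backward passes per update.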

We tried both PubMed and GloVe embeddings and got very similar F1 scores on CDR.

crystal-xu commented 4 years ago

Hi @nanguoshun, thanks for your reply.

What I intend to do is to try EoG on DocRED directly. I noticed that in your paper you report an F1 score of about 0.52 for the EoG model. So did you implement the same model yourselves as your baseline, rather than just tuning the hyperparameters? I have reached an F1 score of about 0.47 so far, so I am wondering how you achieved such a good score.

nanguoshun commented 4 years ago

Hi @crystal-xu. Yes, we adapted the EoG model to the DocRED dataset. We may release the code in the future, but it will take some time to clean it up.
