Closed lshowway closed 2 years ago
I think that your result is within a reasonable range of performance fluctuation.
Although you changed `max_seq_length` from 512 to 256, that does not make a large difference on these datasets, since typical input sequences are not very long.
Note that the paper reports the best performance of the model, so if you take the average over runs, it will naturally be lower than the highest score.
Thanks for your solid work.
Are the results reported in the paper based on allennlp or huggingface?
I fine-tuned LUKE on the OpenEntity and TACRED datasets with the commands and code in `legacy`, but reduced `max_seq_length` from 512 to 256 to avoid OOM. I repeated each run more than five times with different seeds and got average results of 77.6 and 71.7, respectively, while the reported results are 78.2 and 72.7. Is this normal?
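For what it's worth, the seed-averaging procedure described above can be summarized with a small script. This is just a sketch: the score values below are placeholders, not the actual run results.

```python
# Sketch: summarize F1 scores collected from fine-tuning runs with different seeds.
# The scores here are hypothetical placeholders, not real LUKE results.
from statistics import mean, stdev

open_entity_f1 = [77.2, 77.9, 77.4, 78.0, 77.5]  # one score per random seed

avg = mean(open_entity_f1)
sd = stdev(open_entity_f1)
print(f"OpenEntity F1: {avg:.1f} +/- {sd:.1f} over {len(open_entity_f1)} seeds")
```

Reporting the mean with a standard deviation over seeds makes it easier to judge whether a gap like 77.6 vs. 78.2 falls within normal run-to-run fluctuation.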