wireless911 / span-aste

A sample PyTorch implementation of the ACL 2021 paper "Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction".
Apache License 2.0

About the results #2

Closed EvanSong77 closed 2 years ago

EvanSong77 commented 2 years ago

On the 15res dataset, with the BiLSTM version (your code) the best F1 I got on the eval set is 0.6382, while the paper reports 0.6426; with the BERT version the best F1 I got on the eval set is 0.6564, while the paper reports 0.7075. I can't get close to the results of the paper, can you give me some suggestions? Thank you very much!

wireless911 commented 2 years ago

The result you get with the BERT version is slightly low. My parameter selection is batch_size=4, lr=3e-5; you can try modifying your hyperparameters and see how the result changes.
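A minimal, self-contained sketch of plugging those suggested values (batch_size=4, lr=3e-5) into a generic PyTorch training loop; the model and dataset below are dummy stand-ins for illustration only, not the repo's actual Span-ASTE classes:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

batch_size = 4        # suggested batch size for the BERT version
learning_rate = 3e-5  # suggested learning rate for the BERT version

# Dummy stand-ins so the snippet runs on its own; in practice use the
# repo's Span-ASTE model and dataset instead.
model = nn.Linear(768, 2)
dataset = TensorDataset(torch.randn(16, 768), torch.randint(0, 2, (16,)))

loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

for features, labels in loader:
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(features), labels)
    loss.backward()
    optimizer.step()
```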

EvanSong77 commented 2 years ago

Thank you! Now the 15res dataset can reach 0.695 (best relation F1) on the eval set. Can you reproduce the results of the paper on the 15res dataset?

wireless911 commented 2 years ago

After the BERT version I implemented was trained on the 15res dataset, the F1 value on the validation set was 69.1, which is still slightly below the result in the paper (70.75). I am still exploring.

Zyuting1 commented 2 years ago

Hi, I tried the BiLSTM version on 14res.txt, but I got a low score, around 0.527 (relation F1). I have tried adjusting the parameters but it didn't help; the parameters now are batch size 8 and lr 3e-4. Have you tried datasets other than 15res.txt and gotten a score close to the one in the paper? Looking forward to your reply, thank you so much!

Resist4263 commented 2 years ago

I think there may be a problem with the F1 calculation, because the F1 of the whole dataset is not obtained by averaging the F1 of each batch.
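Below is a minimal sketch of the distinction being raised, assuming triplets are represented as hashable tuples and matching is simplified to exact set membership (the helper names are illustrative, not the repo's code): the correct micro F1 accumulates counts over every batch before computing a single precision/recall, whereas averaging per-batch F1 generally gives a different, misleading number.

```python
def batch_counts(pred_triplets, gold_triplets):
    """Return (num_correct, num_predicted, num_gold) for one batch."""
    pred, gold = set(pred_triplets), set(gold_triplets)
    return len(pred & gold), len(pred), len(gold)

def dataset_f1(batches):
    """Correct: accumulate counts over all batches, then compute a single F1."""
    correct = predicted = gold = 0
    for pred_triplets, gold_triplets in batches:
        c, p, g = batch_counts(pred_triplets, gold_triplets)
        correct, predicted, gold = correct + c, predicted + p, gold + g
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def averaged_batch_f1(batches):
    """Problematic: averaging per-batch F1 weights every batch equally,
    regardless of how many gold triplets it contains."""
    f1s = []
    for pred_triplets, gold_triplets in batches:
        c, p, g = batch_counts(pred_triplets, gold_triplets)
        prec = c / p if p else 0.0
        rec = c / g if g else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0
```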

EvanSong77 commented 2 years ago

> I think there may be a problem with the F1 calculation, because the F1 of the whole dataset is not obtained by averaging the F1 of each batch.

Yeah, you are right, there is a problem with the metrics.

wireless911 commented 2 years ago

As for the issue you mentioned, I fixed it in the latest version and also updated the BERT version. Thank you for your valuable feedback.