Exploiting BERT for End-to-End Aspect-Based Sentiment Analysis
The preprocessed data files are located in ./data. Note that rest_total is a dataset built by ourselves; more details can be found in Updated Results below.

The valid tagging strategies/schemes (i.e., the ways of representing text or entity spans) in this project are BIEOS (also called BIOES or BMES), BIO (also called IOB2) and OT (also called IO). If you are not familiar with these terms, I strongly recommend reading the following materials before running the program (a brief illustration follows the list):
a. Inside–outside–beginning (tagging).
b. The paper associated with this project.
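To make the schemes concrete, here is a minimal sketch (not the repo's own preprocessing code) that converts a BIO tag sequence into BIEOS. The example sentence is invented, and the -POS sentiment suffix follows the convention of the paper:

```python
# Convert a BIO (IOB2) tag sequence to BIEOS (BIOES/BMES): a minimal sketch.
def bio_to_bieos(tags):
    out = []
    for i, tag in enumerate(tags):
        if tag.startswith("B") or tag.startswith("I"):
            # a span token is "last" if the next tag does not continue the span
            last = i + 1 == len(tags) or not tags[i + 1].startswith("I")
            if tag.startswith("B"):
                out.append(("S" if last else "B") + tag[1:])
            else:
                out.append(("E" if last else "I") + tag[1:])
        else:
            out.append(tag)  # "O" stays as-is
    return out

# "sound quality" is an aspect with positive sentiment (invented example):
bio = ["O", "B-POS", "I-POS", "O"]   # BIO tags for: great sound quality !
print(bio_to_bieos(bio))             # BIEOS: ['O', 'B-POS', 'E-POS', 'O']
# Under OT (IO), both aspect tokens would simply carry the same T-POS-style tag.
```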
Reproduce the results on the Restaurant and Laptop datasets:

```sh
# train the model with 5 different seed numbers
python fast_run.py
```
Train the model on another ABSA dataset:

1. Place the data files in the directory ./data/[YOUR_DATASET_NAME] (please note that you need to re-organize your data files so that they can be directly adapted to this project; following the input format of ./data/laptop14/train.txt should be OK, see the sketch after this list).
2. Set TASK_NAME in train.sh as [YOUR_DATASET_NAME].
3. Train the model: sh train.sh
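For concreteness, here is a hedged sketch of steps 1 and 2. The dataset name and the example sentence are placeholders, and the token=tag line format is inferred from ./data/laptop14/train.txt; if your copy of the repo differs, mirror that file exactly.

```sh
# Step 1 (assumed format): each line of ./data/my_reviews/train.txt holds the raw
# sentence and one token=tag pair per token, e.g. (invented example):
#
#   The screen is great .####The=O screen=T-POS is=O great=O .=O
#
# Step 2: in train.sh, point TASK_NAME at that folder ("my_reviews" is hypothetical):
TASK_NAME=my_reviews
```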
(New feature) Perform pure inference/direct transfer over test/unseen data using the trained ABSA model:

1. Place the data file in the directory ./data/[YOUR_EVAL_DATASET_NAME].
2. Set TASK_NAME in work.sh as [YOUR_EVAL_DATASET_NAME].
3. Set ABSA_HOME in work.sh as [HOME_DIRECTORY_OF_PRETRAINED_ABSA_MODEL] (see the sketch after this list).
4. Run: sh work.sh
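A minimal sketch of the two work.sh settings; the variable names come from the steps above, while both values below are hypothetical placeholders:

```sh
# work.sh (excerpt): the dataset to run inference on and the trained model to load.
TASK_NAME=my_eval_dataset        # folder name under ./data
ABSA_HOME=./my-absa-checkpoint   # directory of the pretrained ABSA model
```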
Updated Results

The rest_total dataset was created by concatenating the train/test counterparts of rest14, rest15 and rest16; our motivation was to build a larger dataset that stabilizes training and faithfully reflects the capability of the ABSA model. However, we recently found that the SemEval organizers directly treat the union of rest15.train and rest15.test as the training set of rest16 (i.e., rest16.train). There is therefore overlap between rest_total.train and rest_total.test, which makes this dataset invalid. When you follow our work on the E2E-ABSA task, we hope you DO NOT use the rest_total dataset any more, but switch to the officially released rest14, rest15 and rest16.
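The overlap is straightforward to check. Below is a minimal Python sketch; the file paths follow the layout of ./data in this repo, and the sentence####token=tag line format is an assumption inferred from the other data files:

```python
# Count sentences that occur in both rest_total.train and rest_total.test.
# Assumes each line looks like "raw sentence####token=tag token=tag ...".
def load_sentences(path):
    with open(path, encoding="utf-8") as f:
        # keep only the raw-sentence part before the "####" separator
        return {line.split("####")[0].strip() for line in f if line.strip()}

train = load_sentences("./data/rest_total/train.txt")
test = load_sentences("./data/rest_total/test.txt")
print(f"{len(train & test)} test sentences also appear in the training set")
```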
To facilitate comparison in the future, we re-ran our models under the settings mentioned above and report the results (micro-averaged F1) on rest14, rest15 and rest16:
Model | rest14 | rest15 | rest16 |
---|---|---|---|
E2E-ABSA (OURS) | 67.10 | 57.27 | 64.31 |
(He et al., 2019) | 69.54 | 59.18 | n/a |
(Liu et al., 2020) | 68.91 | 58.37 | n/a |
BERT-Linear (OURS) | 72.61 | 60.29 | 69.67 |
BERT-GRU (OURS) | 73.17 | 59.60 | 70.21 |
BERT-SAN (OURS) | 73.68 | 59.90 | 70.51 |
BERT-TFM (OURS) | 73.98 | 60.24 | 70.25 |
BERT-CRF (OURS) | 73.17 | 60.70 | 70.37 |
(Chen and Qian, 2020) | 75.42 | 66.05 | n/a |
(Liang et al., 2020) | 72.60 | 62.37 | n/a |
If the code is used in your research, please star our repo and cite our paper as follows:
```
@inproceedings{li-etal-2019-exploiting,
    title = "Exploiting {BERT} for End-to-End Aspect-based Sentiment Analysis",
    author = "Li, Xin and
      Bing, Lidong and
      Zhang, Wenxuan and
      Lam, Wai",
    booktitle = "Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)",
    year = "2019",
    url = "https://www.aclweb.org/anthology/D19-5505",
    pages = "34--41"
}
```