Open yassmine-lam opened 3 years ago

Hi,
Thank you for sharing your code with us. I have a question about the predictions. As I read in your paper, you report a single result covering these two tasks. However, is it possible to return evaluation scores for each task separately, as you did in this code: https://github.com/lixin4ever/E2E-TBSA? That way, we could compare this model against single-task models.
Thank you
In this repo, we propose to handle the E2E-ABSA problem with a single sequence tagging model. Since ATE can itself be formulated as a sequence tagging task, you can evaluate the ATE performance by simply degrading the predicted tags and the gold-standard tags of the E2E-ABSA task, i.e., reducing each tag to its boundary part. However, since ASC is a typical classification task, evaluating its performance within our sequence tagging model is not that straightforward.
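Concretely, "degrading" here just means stripping the sentiment part of each unified tag. A small illustration (the tag vocabulary below assumes a BIO-style scheme with sentiment suffixes such as B-POS/I-NEG, which may differ from the repo's actual scheme):

```python
# Unified E2E-ABSA tags encode both the aspect boundary and its sentiment.
unified = ['O', 'B-POS', 'I-POS', 'O', 'B-NEG']

# Keeping only the part before the '-' yields plain ATE (boundary) tags.
boundary = [t.split('-')[0] for t in unified]
print(boundary)  # ['O', 'B', 'I', 'O', 'B']
```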
Thank you for your reply. Yes, you are right, but I was wondering how you managed to report two results, one for ATE and the other for ASC, in this repo: https://github.com/lixin4ever/E2E-TBSA. As I understand it, you separated the tags of each task and evaluated their results separately? Is that true? If so, it should be possible to do the same here, shouldn't it?
Sorry, I am confused, so if you could explain this to me, I would be grateful.
Thank you
First of all, I want to clarify that ASC, a typical classification task, is different from E2E-ABSA (also called "targeted sentiment analysis" or "E2E-TBSA"), which is formulated as a sequence tagging task in our paper. In this issue, I mistakenly told you that "targeted sentiment analysis" is equivalent to "aspect sentiment classification", and I think that is what caused part of your confusion (sorry for the misinformation).
Returning to your question: in another work, namely https://github.com/lixin4ever/E2E-TBSA, the reason we can report results for both ATE and E2E-ABSA is that it is a multi-task learning framework in which ATE predictions are explicitly produced. To report the ATE performance in this repo, you would need to degrade the predicted/gold tags of E2E-ABSA, i.e., keep only the boundary tag and discard the sentiment tag, and then run the evaluation.
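For example, here is a minimal sketch of this degrade-then-evaluate procedure (the helper names are mine, and the BIO/sentiment tag strings are illustrative; adapt them to the tagging scheme the repo actually emits):

```python
def collapse_to_boundary(tags):
    """Strip the sentiment suffix from each unified tag: B-POS -> B, I-NEG -> I, O -> O."""
    return [t.split('-')[0] for t in tags]

def extract_spans(tags):
    """Collect (start, end) aspect spans from boundary-only BIO tags.
    Orphan 'I' tags (with no preceding 'B') are conservatively ignored."""
    spans, start = [], None
    for i, t in enumerate(tags):
        if t == 'B':
            if start is not None:
                spans.append((start, i))
            start = i
        elif t == 'O' and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

def ate_prf1(gold_sequences, pred_sequences):
    """Exact-span precision/recall/F1 for ATE over a corpus of tag sequences."""
    n_gold = n_pred = n_hit = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        g = set(extract_spans(collapse_to_boundary(gold)))
        p = set(extract_spans(collapse_to_boundary(pred)))
        n_gold, n_pred, n_hit = n_gold + len(g), n_pred + len(p), n_hit + len(g & p)
    precision = n_hit / n_pred if n_pred else 0.0
    recall = n_hit / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# The sentiment labels disagree, but the span is right, so the ATE scores are perfect.
gold = [['O', 'B-POS', 'I-POS', 'O']]
pred = [['O', 'B-NEG', 'I-NEG', 'O']]
print(ate_prf1(gold, pred))  # (1.0, 1.0, 1.0)
```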
Thank you very much for the detailed answer; it is much clearer to me now. I really appreciate the effort and time you put into answering our questions. One more question, please: for sequence labeling models, people generally use seqeval for evaluation, but in your code you used sklearn metrics. What is the difference between these two frameworks, and which is more suitable for this task (E2E-ABSA)?
Thank you in advance
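For what it's worth, the practical difference is that seqeval scores at the entity level (exact span and type must match), whereas sklearn's metrics score each token label independently. A minimal sketch of the contrast, assuming BIO-style tags:

```python
from seqeval.metrics import f1_score as span_f1   # entity-level
from sklearn.metrics import f1_score as token_f1  # token-level

gold = [['B-POS', 'I-POS', 'O']]
pred = [['B-POS', 'O', 'O']]  # right start, wrong span end

# seqeval: the predicted span does not exactly match the gold span -> F1 = 0.0
print(span_f1(gold, pred))

# sklearn: 2 of 3 token labels are correct -> micro F1 of about 0.67
print(token_f1(sum(gold, []), sum(pred, []), average='micro'))
```

Published ATE/E2E-ABSA results are generally exact-span (entity-level) F1, so whichever library is used, the evaluation should ultimately match spans rather than individual tokens.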