kevinscaria / InstructABSA

Instructional learning for Aspect Based Sentiment Analysis [NAACL-2024]
https://aclanthology.org/2024.naacl-short.63/
MIT License

Limiting labels #10

Closed. KatherineKing closed this issue 1 year ago

KatherineKing commented 1 year ago

This package is great, thanks!

  1. For production, it would be desirable to limit the number of topic-sentiment pairs which can be output for each sample.
  2. The maximum length of the output string is too short to handle long topic labels well.
  3. It would be desirable to limit the total number of topics that can be output to avoid having the model generate rare labels which complicate post-processing.
  4. There could be a setting to enforce that only labels in the training data could be assigned.
kevinscaria commented 1 year ago

Hi Katherine, I am glad you liked our work.

  1. Since this is a generative model, there is no reliable way to terminate the generation early without affecting generation quality. What you can do is post-process the outputs based on your business/research use case.
  2. I have added a max_token_length CLI argument that can be used to increase the output generation length. Alternatively, for your custom dataset you can try increasing the output label length from 64 to 512 (at the cost of speed). The default of 64 was chosen based on the analysis conducted for the SemEval datasets, where 64 tokens was long enough for the outputs.
  3. This is not a topic modelling approach, so there is no reliable way to limit the topics. Again, post-processing is the way to go.
  4. Seems like a good idea, let me see what I can do. But for now, post-processing is the best way to go.
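
The post-processing suggested in points 1, 3, and 4 could be sketched roughly as below. This is only an illustration and not code from the repository: it assumes each generated string looks like "aspect:sentiment, aspect:sentiment", and the helper names (parse_pairs, allowed_aspects, postprocess), the separators, and the sentiment set are all hypothetical and should be adapted to the actual output format.

```python
from collections import Counter


def parse_pairs(output, pair_sep=",", label_sep=":"):
    """Split one generated string into (aspect, sentiment) tuples.

    ASSUMPTION: outputs look like "aspect:sentiment, aspect:sentiment".
    """
    pairs = []
    for chunk in output.split(pair_sep):
        # Split on the last separator so aspects containing ":" stay intact.
        aspect, sep, sentiment = chunk.rpartition(label_sep)
        if not sep:
            continue  # malformed chunk without a separator
        aspect, sentiment = aspect.strip().lower(), sentiment.strip().lower()
        if aspect and sentiment:
            pairs.append((aspect, sentiment))
    return pairs


def allowed_aspects(training_outputs, top_k=None):
    """Collect aspect labels seen in training targets (points 3 and 4).

    With top_k set, only the top_k most frequent labels are kept.
    """
    counts = Counter(
        aspect for out in training_outputs for aspect, _ in parse_pairs(out)
    )
    return {aspect for aspect, _ in counts.most_common(top_k)}


def postprocess(output, allowed=None, max_pairs=None,
                sentiments=("positive", "negative", "neutral")):
    """Filter a generated string down to at most max_pairs valid pairs."""
    kept, seen = [], set()
    for aspect, sentiment in parse_pairs(output):
        if sentiment not in sentiments:
            continue  # drop unknown sentiment labels
        if allowed is not None and aspect not in allowed:
            continue  # drop labels never seen in training (point 4)
        if (aspect, sentiment) in seen:
            continue  # drop exact duplicates
        seen.add((aspect, sentiment))
        kept.append((aspect, sentiment))
    # Slicing with None keeps everything; otherwise cap the count (point 1).
    return kept[:max_pairs]
```

For example, one might build the label set once with allowed_aspects(train_targets, top_k=50) and then call postprocess(generated, allowed=labels, max_pairs=3) on each prediction. Truncating to the first max_pairs pairs simply trusts the generation order; ranking by some confidence score would be a different design choice and would need information this sketch does not have.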

I am busy with other research work, so I am not able to maintain this repository or respond to requests.

Best, KJS