How to get targets from sentences?

yangheng95 / PyABSA

Sentiment Analysis, Text Classification, Text Augmentation, Text Adversarial defense, etc.;

https://pyabsa.readthedocs.io

MIT License


How to get targets from sentences? #172

Closed umarbeknasimov closed 2 years ago

umarbeknasimov commented 2 years ago

Hi,

I am using the "deberta-v3-base-absa-v1.1" model/tokenizer from Hugging Face to run the model on my data, but my data only has sentences; it doesn't have targets specified. I see that the example shown already passes "manager" as the target. Does your model also support automatic extraction of targets from sentences? If not, can you point me to implementations/APIs for target extraction?

Thank you!
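
For reference, the setup described above (a sentence plus an explicitly supplied target) can be sketched roughly as follows. This is only a minimal sketch: it assumes the Hugging Face model id is yangheng/deberta-v3-base-absa-v1.1 and that the transformers text-classification pipeline is used; the helper names make_pairs and classify_targets and the example sentence are made up for illustration.

```python
from typing import Dict, List

# Assumption: the full Hugging Face model id for the model named in the question.
MODEL_NAME = "yangheng/deberta-v3-base-absa-v1.1"


def make_pairs(sentence: str, targets: List[str]) -> List[Dict[str, str]]:
    """Build one (sentence, target) input per target, since the classifier
    scores sentiment for a target that the caller must supply."""
    return [{"text": sentence, "text_pair": target} for target in targets]


def classify_targets(sentence: str, targets: List[str]):
    """Hypothetical helper: run the classifier once per supplied target.

    The import is kept local so that make_pairs can be used without
    transformers installed; the model is downloaded on first use.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_NAME)
    return classifier(make_pairs(sentence, targets))


if __name__ == "__main__":
    # Made-up example sentence; the targets have to be known in advance.
    sentence = "The manager was rude but the food was great"
    for pair in make_pairs(sentence, ["manager", "food"]):
        print(pair)
```

The point of the sketch is that nothing here discovers "manager" or "food"; they are inputs, which is why a separate target extraction step is needed.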

pepi99 commented 2 years ago

There is this repository you can use: https://github.com/yangheng95/ABSADatasets (it is made by the author of PyABSA). You can label your own data and assign a sentiment to each aspect that you want.

If you don't want to label the data yourself, you can try an entity recognition tool; spaCy has this functionality, as far as I remember.

But to get data in the format that PyABSA accepts, you need to label the sentences manually using ABSADatasets.

pepi99 commented 2 years ago

I can help you do it, just let me know.

umarbeknasimov commented 2 years ago

I looked into spaCy's entity recognition, but it seems to pick up only named entities, not regular entities such as ordinary noun phrases. For reference, I checked the spaCy Token's ent_iob attribute, which does not mark regular entities. I want something that does this:

input: "the tech support is bad but the battery life is good"
output: ["tech support", "battery life"]

Is there any entity recognition tool I can use?

The demo API (https://huggingface.co/spaces/yangheng/PyABSA-ATEPC) seems to do this. I wonder what tool is used on the backend to get the targets.

--- edit

I was able to use the term extractor from the demo file: https://github.com/yangheng95/PyABSA/blob/release/demos/aspect_term_extraction/extract_aspects.py.
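
For anyone landing here later, a rough sketch of what term extraction looks like in code. Two parts, both hedged: terms_from_iob is a plain-Python illustration of how per-token IOB tags (assumed here to be B-ASP / I-ASP / O) are grouped into multi-word targets, and extract_targets is a hypothetical wrapper that assumes the PyABSA v2 interface (AspectTermExtraction.AspectExtractor with the "multilingual" checkpoint, and an "aspect" key in each result). Older PyABSA releases, including the one the linked demo file was written for, may expose a different API, so check the demo for your installed version.

```python
from typing import Iterable, List


def terms_from_iob(tokens: List[str], tags: List[str]) -> List[str]:
    """Group tokens into aspect terms using per-token IOB tags.

    Assumption: tags look like "B-ASP" (term start), "I-ASP" (term
    continuation) and "O" (outside any term).
    """
    terms: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B"):
            # A new term starts; close the previous one if it is still open.
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag.startswith("I") and current:
            current.append(token)
        else:
            # "O" (or a stray "I" with no open term) ends the current term.
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


def extract_targets(sentences: Iterable[str]) -> List[List[str]]:
    """Hypothetical wrapper, assuming the PyABSA v2 API.

    The import is local because pyabsa is a heavy optional dependency and
    the checkpoint is downloaded on first use.
    """
    from pyabsa import AspectTermExtraction as ATEPC

    extractor = ATEPC.AspectExtractor("multilingual", auto_device=True)
    results = extractor.predict(list(sentences), print_result=False, save_result=False)
    # Assumption: each result is a dict with an "aspect" list of extracted terms.
    return [result.get("aspect", []) for result in results]


if __name__ == "__main__":
    tokens = "the tech support is bad but the battery life is good".split()
    tags = ["O", "B-ASP", "I-ASP", "O", "O", "O", "O", "B-ASP", "I-ASP", "O", "O"]
    print(terms_from_iob(tokens, tags))  # ['tech support', 'battery life']
```

The tags in the usage example are written by hand to match the sentence from this thread; in practice they would come from the extraction model.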