undertheseanlp / underthesea

Underthesea - Vietnamese NLP Toolkit
http://undertheseanlp.com
GNU General Public License v3.0

Can I have a guide on how to add a new domain for sentiment analysis? #631

Closed: coolcorexix closed this issue 1 year ago

coolcorexix commented 1 year ago

First of all, huge thanks for your work.

Currently I am trying to use Underthesea to determine the sentiment of reviews on Shopee, but the library gives fairly weak results.

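So what can I do to improve it? I tried reading contributing.rst, but I was not able to find:

- What is the suggested data structure to feed the NLP model?
- How do I feed it and get an updated version of the NLP model?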

Can you help me answer that?

rain1024 commented 1 year ago

@coolcorexix

What is the suggested data structure to feed the NLP model?

The training file should contain multiple lines of text, each line representing a single document or text sample, in the following format:

__label__<class_name> <text>

For example:

__label__sports I love playing soccer in my free time.
__label__politics The new proposed tax plan is causing quite a stir among citizens.

The "__label__" prefix tells the library that the word attached to it is the class label; the rest of the line after the first space is the text to be classified.
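To check that a training file follows this format before training, a small pair of helpers can build and parse such lines. This is a minimal Python sketch that assumes exactly one class name per line with no spaces in it (the fastText-style format can also carry several "__label__" prefixes on one line, which this sketch does not handle); `format_line` and `parse_line` are hypothetical names, not part of Underthesea's API.

```python
LABEL_PREFIX = "__label__"


def format_line(label: str, text: str) -> str:
    """Build one '__label__<class_name> <text>' training line."""
    if not label or any(ch.isspace() for ch in label):
        raise ValueError(f"class name must be non-empty and contain no spaces: {label!r}")
    # Collapse newlines, tabs and repeated spaces so each sample stays on one line.
    clean_text = " ".join(text.split())
    return f"{LABEL_PREFIX}{label} {clean_text}"


def parse_line(line: str) -> tuple[str, str]:
    """Split a training line back into (class_name, text)."""
    line = line.strip()
    if not line.startswith(LABEL_PREFIX):
        raise ValueError(f"line does not start with {LABEL_PREFIX!r}: {line!r}")
    # The class name ends at the first space; everything after it is the text.
    head, _, text = line.partition(" ")
    return head[len(LABEL_PREFIX):], text


print(parse_line("__label__sports I love playing soccer in my free time."))
# prints: ('sports', 'I love playing soccer in my free time.')
```

Round-tripping every line of a file through `parse_line` is a cheap way to catch malformed rows (a missing prefix, or a label accidentally split by a space) before they reach the model.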

How do I feed it and get an updated version of the NLP model?

If you want to explore the training pipeline, take a look at the one provided in the GitHub repository: https://github.com/undertheseanlp/underthesea/tree/main/examples/sentiment. The current version of the pipeline is based on the GPT-2 model, but we will be expanding it to include FastText and SVM models as well. This will give us more options to choose from and let us select the model that best suits our needs.
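Whichever model the pipeline ends up using, the Shopee reviews first need to be converted into the "__label__" format described above and split into training and evaluation sets. The following is a minimal standard-library sketch, assuming the reviews were exported to a CSV file with hypothetical "text" and "label" columns and that an 80/20 split is acceptable. The function and file names are illustrative and not part of Underthesea; whether the linked example pipeline expects exactly this file layout is not confirmed here, so check its README.

```python
import csv
import random

LABEL_PREFIX = "__label__"


def csv_to_training_lines(csv_lines, text_col="text", label_col="label"):
    """Convert CSV rows into '__label__<class_name> <text>' lines.

    csv_lines may be an open file or any iterable of CSV-formatted strings.
    The column names are hypothetical defaults; adjust them to your export.
    """
    lines = []
    for row in csv.DictReader(csv_lines):
        # Join words with '_' so a multi-word label stays a single token.
        label = "_".join((row.get(label_col) or "").split())
        # Collapse whitespace so each review fits on one line.
        text = " ".join((row.get(text_col) or "").split())
        if label and text:  # skip rows with a missing label or empty text
            lines.append(f"{LABEL_PREFIX}{label} {text}")
    return lines


def split_train_test(lines, test_ratio=0.2, seed=42):
    """Shuffle with a fixed seed and hold out a fraction of lines for evaluation."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def write_lines(path, lines):
    """Write one training line per row, UTF-8 encoded (needed for Vietnamese text)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

In use, open the export with `open("reviews.csv", encoding="utf-8", newline="")`, pass it to `csv_to_training_lines`, split the result with `split_train_test`, and write the two parts to, say, `train.txt` and `test.txt` with `write_lines` (all placeholder file names). The fixed seed keeps the split reproducible, so results from different models can be compared on the same held-out reviews.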

If you are unable to understand or complete the pipeline on your own, I am available to assist you and help you get it done.

Please let me know if you need any help or have any questions about this.