Can I have a guide on how to add new domain for sentiment?

@coolcorexix

What is the suggested data structure to feed the NLP

The training file is plain text with one document (text sample) per line. Each line typically has the following format:

__label__<class_name> <text>

For example

__label__sports I love playing soccer in my free time.
__label__politics The new proposed tax plan is causing quite a stir among citizens.

The "__label__" prefix tells the library that the token immediately following it is the class label; the rest of the line is the text to be classified.
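As a rough illustration of the format above, here is a minimal Python sketch that writes and reads such lines. The helper names (format_line, parse_line, write_training_file) and the file name are hypothetical and are not part of underthesea; the sketch also assumes a label is a single token with no spaces.

```python
from typing import Iterable, Tuple

LABEL_PREFIX = "__label__"


def format_line(label: str, text: str) -> str:
    """Build one training line: __label__<class_name> <text>."""
    # Assumption: a label must be a single token, so join any inner whitespace.
    clean_label = "_".join(label.split())
    # Each sample occupies exactly one line, so collapse newlines and extra spaces.
    clean_text = " ".join(text.split())
    return f"{LABEL_PREFIX}{clean_label} {clean_text}"


def parse_line(line: str) -> Tuple[str, str]:
    """Split one training line back into (label, text)."""
    line = line.strip()
    if not line.startswith(LABEL_PREFIX):
        raise ValueError(f"line does not start with {LABEL_PREFIX!r}: {line!r}")
    # The first space separates the label token from the text.
    head, _, text = line.partition(" ")
    return head[len(LABEL_PREFIX):], text


def write_training_file(samples: Iterable[Tuple[str, str]], path: str) -> None:
    """Write (label, text) pairs to a file, one sample per line."""
    with open(path, "w", encoding="utf-8") as f:
        for label, text in samples:
            f.write(format_line(label, text) + "\n")


if __name__ == "__main__":
    samples = [
        ("sports", "I love playing soccer in my free time."),
        ("politics", "The new proposed tax plan is causing quite a stir among citizens."),
    ]
    # "train.txt" is a placeholder file name.
    write_training_file(samples, "train.txt")
```

For a new sentiment domain, the class names would be your own sentiment labels in place of "sports" and "politics".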

How to feed it and get the updated version of NLP

If you want to explore the training pipeline, I suggest you take a look at the sentiment example in the GitHub repository: https://github.com/undertheseanlp/underthesea/tree/main/examples/sentiment. The current version of the training pipeline is based on a GPT-2 model, but we will be expanding it to include FastText and SVM models as well. This will give us more options and allow us to select the model that best suits our needs.

If you are unable to understand or complete the pipeline on your own, I am available to assist you and help you get it done.

Please let me know if you need any help or have any questions about this.
