Closed coolcorexix closed 1 year ago
@coolcorexix
What is the suggested data structure to feed the NLP
The training file would contain multiple lines of text, with each line representing a single document or text sample. Each line would typically be in the following format:
__label__<class_name> <text>
For example:
__label__sports I love playing soccer in my free time.
__label__politics The new proposed tax plan is causing quite a stir among citizens.
The `__label__` prefix tells the library that the token immediately following it is the class label, and the rest of the line is the text to be classified.
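To make the format concrete, here is a minimal sketch that builds and parses lines in this `__label__<class_name> <text>` convention. The helper names and the `train.txt` file name are illustrative, not part of any specific library API:

```python
# Sketch of building a training file in the __label__ format described above.
# Labels and texts reuse the examples from this thread; "train.txt" is an
# assumed file name.

samples = [
    ("sports", "I love playing soccer in my free time."),
    ("politics", "The new proposed tax plan is causing quite a stir among citizens."),
]

def to_line(label, text):
    """Format one document as '__label__<class_name> <text>'."""
    return f"__label__{label} {text}"

def parse_line(line):
    """Split a training line back into (label, text)."""
    prefix, _, rest = line.partition(" ")
    assert prefix.startswith("__label__"), "every line must start with a label"
    return prefix[len("__label__"):], rest

lines = [to_line(label, text) for label, text in samples]
with open("train.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

print(parse_line(lines[0]))  # ('sports', 'I love playing soccer in my free time.')
```

Each document occupies exactly one line, so multi-line texts should have their newlines stripped or replaced before writing.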
How to feed it and get the updated version of NLP
If you want to explore the training pipeline, I suggest you take a look at the one provided in the GitHub repository: https://github.com/undertheseanlp/underthesea/tree/main/examples/sentiment. The current version of the training pipeline is based on the GPT-2 model, but we will be expanding it to include FastText and SVM models as well. This will give us more options to choose from and allow us to select the model that best suits our needs.
If you are unable to understand or complete the pipeline on your own, I am available to assist you and help you get it done.
Please let me know if you need any help or have any questions about this.
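Whichever model the pipeline ends up using (GPT-2, FastText, or SVM), you would normally hold out part of the labeled file as a validation set before training, so you can measure whether a retrained model actually improves. A minimal standard-library sketch, where the split ratio and seed are just assumptions:

```python
# Hedged sketch: deterministically shuffle labeled lines and hold out a
# validation split. This is generic preprocessing, not part of the
# underthesea pipeline itself.
import random

def train_val_split(lines, val_ratio=0.2, seed=42):
    """Shuffle with a fixed seed and split into (train, val) lists."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    cut = int(len(lines) * (1 - val_ratio))
    return lines[:cut], lines[cut:]

labeled = [f"__label__sports sample {i}" for i in range(10)]
train, val = train_val_split(labeled)
print(len(train), len(val))  # 8 2
```

Fixing the seed keeps the split reproducible across runs, which makes before/after comparisons of retrained models meaningful.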
First of all, huge thanks for your work.
Currently I am trying to use Underthesea to figure out the attitude of reviews on Shopee, but the library gives pretty weak results.
So what can I do to improve it? I tried reading contributing.rst, but I was not able to find:
Can you help me answer that?