mubaris opened this issue 6 years ago
v2 is a single CSV file. Could I write a Python function to convert that file into the format learn.datasets.load_files expects?
For example, for the following data points:
Corpus,Label,ID,Quote
GEN,sarc,GEN_sarc_0000,First off, That's grade A USDA approved Liberalism in a nutshell.
GEN,notsarc,GEN_notsarc_1136,First
Programmatically:
1) I can create a file GEN_sarc_0000.txt containing "First off, That's grade A USDA approved Liberalism in a nutshell." and a file GEN_notsarc_1136.txt containing "First".
2) Then I can put these files into the container/sarc and container/notsarc folders, respectively.
This way, the current data loading code can work as it is.
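The two steps above can be sketched as a small standard-library script. This is a minimal illustration, not code from the project: the helper name `csv_to_folders` is hypothetical, it assumes the header contains the columns Label, ID and Quote (as in the example rows), and it assumes any field containing a comma is quoted, as standard CSV requires. The folder-per-label output matches the layout scikit-learn's `sklearn.datasets.load_files` reads, assuming that is the loader meant above.

```python
import csv
import tempfile
from pathlib import Path


def csv_to_folders(csv_path, container):
    """Hypothetical helper: write each CSV row to container/<Label>/<ID>.txt.

    Assumes a header with the columns Label, ID and Quote, and that fields
    containing commas are quoted (standard CSV).
    """
    container = Path(container)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # One sub-folder per label, e.g. container/sarc and container/notsarc
            label_dir = container / row["Label"]
            label_dir.mkdir(parents=True, exist_ok=True)
            # One .txt file per data point, named after its ID
            out_path = label_dir / f"{row['ID']}.txt"
            out_path.write_text(row["Quote"], encoding="utf-8")
            written.append(out_path)
    return written


if __name__ == "__main__":
    # Small demo on the two example rows, in a throwaway directory
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "v2.csv"
        src.write_text(
            "Corpus,Label,ID,Quote\n"
            'GEN,sarc,GEN_sarc_0000,"First off, That\'s grade A USDA approved Liberalism in a nutshell."\n'
            "GEN,notsarc,GEN_notsarc_1136,First\n",
            encoding="utf-8",
        )
        for path in csv_to_folders(src, Path(tmp) / "container"):
            print(path.relative_to(tmp), "->", path.read_text(encoding="utf-8"))
```

Note that this keeps only the Quote column, so any second text field in the CSV is dropped by this conversion.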
What do you think about this approach?
The v2 dataset has the columns Quote and Reply. That's why it's better than v1: with both the parent comment and the reply, I think our bot will be more accurate.
Please don't go with the method you proposed.
It sounds like you are describing a more substantial change. What are the steps to achieve what you propose? Since you labeled this as hacktoberfest, could you provide some more direction?
Can I work on this issue? What exactly are the problems or concerns regarding this issue at the moment?
@bhanu1911
Current method: we generate features from a single text field to train the models.
Desired method: the v2 dataset provides two text fields, a question and the reply to it. We want to build new models based on these two inputs.
Hope this helps
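One simple way to turn two text fields into a single feature set for one model is to namespace the tokens by field. The sketch below is an illustrative assumption, not the project's current feature pipeline: `tokenize` and `pair_features` are hypothetical names, and the bag-of-words representation is a swapped-in technique. Prefixing each token with "q:" or "r:" lets the same word count as a different feature in the parent comment and in the reply.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def pair_features(quote, reply):
    """Hypothetical helper: bag-of-words features for a (quote, reply) pair.

    Tokens from the parent comment get a "q:" prefix and tokens from the
    reply get an "r:" prefix, so one model sees both the context and the
    reply without mixing the two vocabularies.
    """
    features = Counter("q:" + tok for tok in tokenize(quote))
    features.update("r:" + tok for tok in tokenize(reply))
    return features


# Illustrative input only, not taken from the dataset
print(pair_features("Great, another Monday.", "Oh great."))
```

A list of these per-example dictionaries could then be converted into a sparse matrix with scikit-learn's `DictVectorizer` and passed to any of the existing classifiers.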
Basically, this means we have to start from the ground up: if I'm not wrong, we now have to train a model for the replies too? (I'll study the code and see how you trained it the first time around.) Plan of action:
Could you guide me as to how you created the dataset?
@bhanu1911 What I was thinking is a little different.
This makes sense because sarcasm is context-based: having a comment together with its parent comment should be more accurate than a single comment alone.
I think the source gives enough background on how they created the dataset: Sarcasm v2
I meant: how did you partition the dataset?
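The thread does not say how the data was actually partitioned. As a hedged illustration of one common option, the sketch below does a stratified train/test split, so that each label keeps roughly the same proportion in both sets. The helper name `stratified_split`, the 20% test fraction and the fixed seed are assumptions; the Label column name comes from the example header. scikit-learn's `train_test_split` with its `stratify` parameter does the same job if scikit-learn is already a dependency.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="Label", test_fraction=0.2, seed=0):
    """Hypothetical helper: split rows into train/test, keeping label ratios.

    `rows` is a list of dicts (for example from csv.DictReader). Each
    label's rows are shuffled with a fixed seed, and the first
    `test_fraction` of them go to the test set.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):  # sorted so the split is reproducible
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Splitting per label rather than over the whole file avoids a test set that happens to be dominated by one class.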
Sarcasm v2 is a better dataset for this project, since it has both the parent comment and the reply. Apply this dataset to improve the predictions.