TheSidhesh closed 8 years ago
Let's decide on the number of tweets ASAP. I think 5k sarcastic and a similar number of non-sarcastic would be good @smadha @TheSidhesh @RajviM
Sounds good to me!!
Yep. Sounds right. We can ask prof on Monday and change it if necessary
On Thursday, April 14, 2016, Sidhesh Badrinarayan notifications@github.com wrote:
Sounds good to me!!
Don't we already have 10k sarcastic tweets? @TheSidhesh
There are a lot of repetitions in them, so effectively there would be far fewer distinct tweets
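A rough way to check how many distinct tweets we actually have, as a sketch only. The normalization rules (lowercasing, dropping a leading `RT @user:` and URLs) and the names `normalize` and `distinct_tweets` are assumptions for illustration, not what the collection script currently does:

```python
import re

def normalize(tweet: str) -> str:
    """Collapse trivial differences so repeated tweets compare equal."""
    text = tweet.lower()
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text)  # drop a leading retweet marker
    text = re.sub(r"https?://\S+", "", text)      # drop URLs
    return re.sub(r"\s+", " ", text).strip()      # squeeze whitespace

def distinct_tweets(tweets):
    """Keep the first occurrence of each tweet, compared after normalization."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = normalize(tweet)
        if key and key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique

# Made-up sample tweets, only to show the effect.
sample = [
    "I just LOVE Mondays #sarcasm",
    "RT @someone: I just love Mondays #sarcasm",
    "i just love mondays #sarcasm http://t.co/abc",
    "Great, another meeting #sarcasm",
]
unique = distinct_tweets(sample)
print(len(sample), "collected,", len(unique), "distinct")
```

Running something like this over the full set would tell us whether the 10k still covers the 5k target after removing repeats.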
Add stemming in preprocessing
I tested the data on different small sentences taken from the corpus and realised that a few cases need to be considered. We should discuss them when we meet next
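To make the stemming step concrete for that discussion, here is a minimal sketch. It uses a deliberately naive suffix stripper (`naive_stem`, a hypothetical helper with a made-up suffix list), not a real stemming algorithm, so it mainly shows where such a step would sit in preprocessing and what kind of odd cases come up:

```python
import re

# Illustrative suffix list only; checked in this order, one suffix per word.
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(tweet: str):
    """Lowercase, tokenize on letters and apostrophes, then stem each token."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return [naive_stem(token) for token in tokens]

print(preprocess("Loving these amazing meetings"))
# prints ['lov', 'these', 'amaz', 'meeting']
```

Note that "meetings" only loses its plural "s" here, and "loving" becomes "lov" rather than "love". For the real pipeline an established stemmer such as NLTK's `PorterStemmer` would probably be a better choice than hand-written rules.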