smadha / SarcasmDetector

CSCI-544 Final Project
Apache License 2.0

Sarcasm Detector in Hindi

CSCI-544 Final Project by Team 4 - Team MissionNLP

Introduction

Sarcasm is defined as the use of words that mean the opposite of what you really want to say, especially in order to insult someone, to show irritation, or to be funny. Its inherent ambiguity sometimes makes it hard even for humans to decide whether a text is sarcastic. Sarcasm detection can benefit many sentiment analysis applications in NLP, such as review summarization, dialogue systems, opinion mining, and review ranking systems. In this project, we formulate sarcasm detection as a binary classification task: given a text, the goal is to predict whether it is sarcastic or not. Twitter, as a micro-blogging platform, offers a diverse range of sarcastic and non-sarcastic tweets across multiple domains such as politics, sports, environment, and regional topics.

Cross Language Text Classification:

We train our classifier on tweets in Hindi, test it on both Hindi and English tweets, and evaluate its performance, commenting on aspects of language conversion. The biggest challenge of this project lies in feature engineering. We wish to exploit different language features along with contextualized Twitter features to train classifiers. Existing work in the field emphasizes Naive Bayes (NB) and Support Vector Machines (SVM) for classification with various feature formulations. To our knowledge, no previous work addresses sarcasm detection in Hindi or through cross-language learning; our aim is to achieve both. We believe that such a project will help improve the accuracy of sentiment analysis across different languages.
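To make the classification setup concrete, the following is a minimal sketch of a Naive Bayes baseline that combines word tokens with a few Twitter-specific markers. It is an illustration under stated assumptions, not the project's actual implementation: the `extract_features` function, the `NaiveBayes` class, the marker names (`HASHTAG`, `MENTION`, `URL`, `HAS_EXCLAMATION`), and the four training tweets are all hypothetical and invented for this example. Converting English tweets for the cross-language test is not shown.

```python
import math
from collections import Counter, defaultdict


def extract_features(tweet):
    """Return word tokens plus a few hypothetical Twitter-specific markers."""
    features = []
    for token in tweet.split():  # whitespace tokenization also works for Devanagari
        if token.startswith("#"):
            features.append("HASHTAG")
            features.append(token.lower())
        elif token.startswith("@"):
            features.append("MENTION")
        elif token.startswith("http"):
            features.append("URL")
        else:
            features.append(token.lower())
    if "!" in tweet:
        features.append("HAS_EXCLAMATION")
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, tweets, labels):
        for tweet, label in zip(tweets, labels):
            self.class_counts[label] += 1
            for feature in extract_features(tweet):
                self.feature_counts[label][feature] += 1
                self.vocab.add(feature)
        return self

    def predict(self, tweet):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(n_docs / total_docs)
            denom = (sum(self.feature_counts[label].values())
                     + self.alpha * len(self.vocab))
            for feature in extract_features(tweet):
                if feature in self.vocab:  # features unseen in training are skipped
                    count = self.feature_counts[label][feature]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data, for illustration only (not from the project's corpus).
train_tweets = [
    "वाह क्या ट्रैफिक है #मज़ा",
    "वाह फिर बिजली गई !",
    "आज मौसम अच्छा है",
    "भारत ने मैच जीता",
]
train_labels = ["sarcastic", "sarcastic", "not_sarcastic", "not_sarcastic"]

model = NaiveBayes().fit(train_tweets, train_labels)
print(model.predict("वाह फिर ट्रैफिक"))
```

A real system would need far more data and richer features; an SVM could be swapped in over the same feature representation.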

Contributors

  1. Madhav Sharan
  2. [Swanand Joshi](https://github.com/swanandj7)
  3. [Sidhesh Badrinarayan](https://github.com/TheSidhesh)
  4. Rajvi Mehta