Analyze the basic statistics of datasets - 2

matrixdecoded commented 3 years ago

Statistics:

Brief overview of datasets - different parts/subsets/files
Datasets size - size of training, test and dev sets
Different classes of labels and their counts
Sample data (20 records from each data file).
Save all the data files and create dataset on datastop.org
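A minimal sketch of how the size, label-count, and sample items above could be computed for one file, using only the Python standard library. The function names, the tab delimiter, and the `label` column name are assumptions for illustration; each dataset in the list below has its own file layout and will need its own values.

```python
import csv
from collections import Counter


def summarize_split(rows, label_column, sample_size=20):
    """Return size, label counts, and the first `sample_size` records of one split."""
    rows = list(rows)
    return {
        "size": len(rows),
        "label_counts": Counter(row[label_column] for row in rows),
        "sample": rows[:sample_size],
    }


def summarize_file(path, label_column, delimiter="\t", sample_size=20):
    """Read a delimited file with a header row and summarize it.

    Assumes UTF-8 text and a header line; adjust per dataset.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return summarize_split(reader, label_column, sample_size)
```

Calling `summarize_file` once per train, test, and dev file (with hypothetical paths such as `train.tsv`) gives the per-split numbers asked for above; the `sample` entry holds the 20 records to paste into the report.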

Datasets to review:

Review Sentiment Datasets - Hindi http://www.iitp.ac.in/~ai-nlp-ml/resources.html
IIT Bombay English-Hindi Corpus
HASOC 2019 Dataset: https://hasocfire.github.io/hasoc/2020/dataset.html
(NA) http://amitavadas.com/sentiwordnet.php (send a request to access the data)

himanshu125 commented 3 years ago

I have emailed Amitava Das to request the SentiWordNet (Indian Languages: Hindi, Bengali, Telugu, and Tamil) datasets, and hasocfire@gmail.com to request the key for the HASOC datasets. The IIT Patna datasets are not present on the Review Sentiment Datasets - Hindi page.

himanshu125 commented 3 years ago

The key for opening the HASOC 2019 datasets folder is hasoc@2019. I have downloaded the IIT Bombay English-Hindi Corpus datasets. For http://amitavadas.com/sentiwordnet.php (send a request to access the data), I have not received any reply from Amitava Das.
