Pipeline steps - Githubissues

sanyabt commented 5 years ago

1. Data exploration and processing: statistics, visualization, tweet cleaning (with different preprocessing options) - Raphael
2. Set up the random baseline for classification - HuiHui
3. Set up the random forest classifier, with the different parameters and metrics from the proposal - HuiHui
4. Code to get features from the random forest - Sanya
5. Comparison with the baseline and the winning model (still need to decide how to do this) - Sanya
6. Features: analyze, visualize, look up in lexicons - Sanya

7. (Possible) Similarity of emotions: find the similarity between pairs of emotions, then try grouping them and running the classifier again? Not sure about this.

Are there any other steps to add, such as adding a hypothesis to the problem statement?
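
For the random baseline step, here is a minimal sketch of one way it could work for multi-label data. It assumes the labels are stored as rows of 0/1 values (one column per emotion) and predicts each label independently with its training-set frequency. The function names `label_priors` and `random_baseline` are made up for illustration and are not from the repo.

```python
import random


def label_priors(y_train):
    """Fraction of training rows in which each label is switched on."""
    n_rows = len(y_train)
    n_labels = len(y_train[0])
    return [sum(row[j] for row in y_train) / n_rows for j in range(n_labels)]


def random_baseline(n_samples, priors, seed=0):
    """Draw each label independently with probability equal to its prior.

    Assumption: labels are treated as independent, which ignores any
    co-occurrence between emotions.
    """
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    return [
        [1 if rng.random() < p else 0 for p in priors]
        for _ in range(n_samples)
    ]


# Toy data: 4 tweets, 3 hypothetical emotion labels.
y_train = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
priors = label_priors(y_train)
y_random = random_baseline(5, priors, seed=42)
```

A uniform coin flip per label (all priors set to 0.5) would be an even simpler baseline; the prior-based version is just a slightly stronger reference point.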

JoyceXu02 commented 5 years ago

I can help you with the comparison part.

sanyabt commented 5 years ago

Cool. Metrics: Jaccard index, F1 (micro and macro), precision, recall, and ROC-AUC for each label if possible. I don't think accuracy will work for multi-label classification.
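
To make a few of these metrics concrete, here is a rough pure-Python sketch of the sample-averaged Jaccard index, micro and macro F1, and exact-match accuracy on 0/1 label rows. The helper names are hypothetical, and treating an empty true/predicted pair as a Jaccard score of 1.0 is an assumption (other conventions score it as 0). In practice a library implementation would probably be used instead.

```python
def jaccard_samples(y_true, y_pred):
    """Mean over samples of |true AND pred| / |true OR pred|."""
    scores = []
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        # Assumption: no true and no predicted labels counts as a perfect match.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_micro_macro(y_true, y_pred):
    """Return (micro F1, macro F1) computed from per-label counts."""
    counts = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t[j] and p[j])
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute one F1.
    micro = _f1(*(sum(c[i] for c in counts) for i in range(3)))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(_f1(*c) for c in counts) / len(counts)
    return micro, macro


def subset_accuracy(y_true, y_pred):
    """Exact-match accuracy: a row counts only if every label is right."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(jaccard_samples(y_true, y_pred))   # 0.5
print(subset_accuracy(y_true, y_pred))   # 0.0
```

In this toy example each prediction gets two of three labels right, yet exact-match accuracy is 0.0 while Jaccard gives partial credit, which is the usual argument against plain accuracy for multi-label output.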

JoyceXu02 commented 5 years ago

I found a mistake in my plotting and have corrected it. I've also pulled all the important features from the random forest ("important" meaning a feature has a non-zero importance value in a tree).
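
As an illustration of that definition, here is a small sketch that keeps every feature with a non-zero importance in at least one tree. It is only one reading of "important" and not necessarily the code used here. The function name and the toy inputs are hypothetical, and the per-tree importance vectors are assumed to be already extracted as plain lists (with scikit-learn they could come from `feature_importances_` on each tree in the forest's `estimators_`).

```python
def nonzero_importance_features(feature_names, per_tree_importances):
    """Return (name, mean importance) for features used by at least one tree.

    per_tree_importances holds one importance vector per tree, each aligned
    with feature_names. The result is sorted by mean importance, descending.
    """
    n_trees = len(per_tree_importances)
    selected = []
    for j, name in enumerate(feature_names):
        values = [tree[j] for tree in per_tree_importances]
        if any(v > 0 for v in values):  # non-zero in at least one tree
            selected.append((name, sum(values) / n_trees))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Toy example: 3 word features, 2 trees.
names = ["happy", "the", "sad"]
trees = [[0.7, 0.0, 0.3], [1.0, 0.0, 0.0]]
important = nonzero_importance_features(names, trees)
print([name for name, _ in important])  # ['happy', 'sad']
```

The sorted list can then be checked against the emotion lexicons or plotted directly.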

sanyabt / NLP-CS2731

Pipeline steps #1