sanyabt / NLP-CS2731

NLP Course Project

Pipeline steps #1

Open sanyabt opened 5 years ago

sanyabt commented 5 years ago
  1. Data exploration and processing: statistics, visualization, cleaning tweets (with different preprocessing options) - Raphael
  2. Set up the random baseline for classification - HuiHui
  3. Set up the random forest classifier, with different parameters and the metrics from the proposal - HuiHui
  4. Code to get features from the random forest - Sanya
  5. Comparison with the baseline and the winning model (how to do this is still open) - Sanya
  6. Features: analyze, visualize, look them up in lexicons - Sanya

  7. (Possible) Similarity of emotions: find similarity between pairs of emotions, try grouping and running classifier again? Not sure about this.
  8. Any other steps to add hypothesis to problem statement.
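
For step 2, one possible shape for the random baseline is sketched below. The issue does not say which kind of random baseline is meant, so this assumes one common choice: predict each label independently with the frequency it has in the training set. The function names, the three emotion labels, and the toy data are all hypothetical.

```python
import random


def label_frequencies(train_labels):
    """Fraction of training examples that carry each label."""
    n_examples = len(train_labels)
    n_labels = len(train_labels[0])
    return [sum(row[j] for row in train_labels) / n_examples
            for j in range(n_labels)]


def random_baseline(train_labels, n_test, seed=0):
    """Predict each label independently with its training frequency.

    This is an assumed baseline, not necessarily the one in the proposal.
    """
    rng = random.Random(seed)
    freqs = label_frequencies(train_labels)
    return [[1 if rng.random() < p else 0 for p in freqs]
            for _ in range(n_test)]


# Toy data: 4 tweets, 3 hypothetical emotion labels (anger, joy, sadness).
train = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
preds = random_baseline(train, n_test=5)
```

A label that never occurs in training (the third one here) is never predicted, which may or may not be what we want from a baseline.
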
JoyceXu02 commented 5 years ago

I can help you on the comparison part.

sanyabt commented 5 years ago

Cool. Metrics: Jaccard index, F1 (micro and macro), precision, recall, and ROC-AUC for each label if possible. Accuracy won't work for multi-label, I think: exact-match accuracy only counts a tweet as correct when every one of its labels matches.
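
To make the difference between these metrics concrete, here is a small plain-Python sketch on toy data. It is only an illustration; in practice scikit-learn's `jaccard_score`, `f1_score`, and `accuracy_score` would probably be used instead. The helper names and the toy labels are made up, and scoring a tweet with no true and no predicted labels as 1.0 for Jaccard is a convention chosen here, not something from the proposal.

```python
def confusion_counts(y_true, y_pred, label):
    """True positives, false positives, false negatives for one label."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t[label] and p[label]:
            tp += 1
        elif p[label]:
            fp += 1
        elif t[label]:
            fn += 1
    return tp, fp, fn


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    """Pool the counts over all labels, then compute a single F1."""
    counts = [confusion_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    return f1(*(sum(c[i] for c in counts) for i in range(3)))


def macro_f1(y_true, y_pred):
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    return sum(f1(*confusion_counts(y_true, y_pred, j))
               for j in range(n_labels)) / n_labels


def jaccard_samples(y_true, y_pred):
    """Mean over tweets of |true AND pred| / |true OR pred|."""
    scores = []
    for t, p in zip(y_true, y_pred):
        union = sum(1 for a, b in zip(t, p) if a or b)
        inter = sum(1 for a, b in zip(t, p) if a and b)
        scores.append(inter / union if union else 1.0)  # assumed convention
    return sum(scores) / len(scores)


def subset_accuracy(y_true, y_pred):
    """Fraction of tweets whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 4 tweets, 3 hypothetical emotion labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1]]
```

On this toy data the first and third tweets are each partly right, so Jaccard and F1 give partial credit while subset accuracy counts both as plain misses.
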

JoyceXu02 commented 5 years ago

I found a mistake in my plotting and have corrected it. I've also pulled all the important features from the random forest ("important" = has a non-zero importance value in a tree).
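
The "important" filter described above could look roughly like the sketch below. It reads "in a tree" as "in at least one tree", which is an assumption, and it takes the per-tree importance vectors as plain lists. With a fitted scikit-learn `RandomForestClassifier` these could come from `[t.feature_importances_ for t in forest.estimators_]`. The function name, feature names, and numbers are hypothetical.

```python
def important_features(per_tree_importances, feature_names):
    """Names of features with non-zero importance in at least one tree."""
    keep = []
    for j, name in enumerate(feature_names):
        if any(tree[j] > 0 for tree in per_tree_importances):
            keep.append(name)
    return keep


# Hypothetical importances for 3 trees over 4 word features.
names = ["happy", "the", "furious", "today"]
trees = [
    [0.6, 0.0, 0.4, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3],
]
selected = important_features(trees, names)
```

Here "the" is dropped because no tree ever splits on it, while "today" is kept even though only one tree uses it.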