vlandeiro / sporty-twitters

A project that tries to correlate Twitter users' sports practice with their well-being.

Implement the well-being classifier #10

Closed: vlandeiro closed this issue 10 years ago

vlandeiro commented 10 years ago

https://github.com/virgile11/sporty-twitters/issues?milestone=6

aronwc commented 10 years ago
  1. Sample ~1K tweets containing any of the anxiety/depression keywords.
  2. Label each tweet as positive or negative.
  3. Train a classifier.
  4. Report cross-validation accuracy.
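
Step 1 above could be sketched roughly as follows. This is only an illustration: the function name `sample_keyword_tweets`, the keyword list, and the example tweets are placeholders, not the project's actual code or data.

```python
import random
import re


def sample_keyword_tweets(tweets, keywords, k=1000, seed=0):
    """Return up to k tweets that contain at least one keyword as a whole word."""
    # One case-insensitive pattern matching any keyword on word boundaries.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in keywords) + r")\b",
        re.IGNORECASE,
    )
    matching = [t for t in tweets if pattern.search(t)]
    # A seeded generator keeps the sample reproducible between runs.
    rng = random.Random(seed)
    return rng.sample(matching, min(k, len(matching)))


# Placeholder keywords and tweets, for illustration only.
keywords = ["anxious", "depressed"]
tweets = [
    "Feeling anxious before the race",
    "Great run this morning",
    "so DEPRESSED today",
    "unanxiously waiting",  # not a whole-word match, so it is skipped
]
sample = sample_keyword_tweets(tweets, keywords, k=2)
```

Matching on word boundaries avoids picking up tweets where a keyword is only a substring of another word; whether that is desirable depends on the keyword list.
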
vlandeiro commented 10 years ago

Each row of the feature matrix holds the features for one tweet.

Should I use stemming? How do I weight the features: tf or tf-idf? What kind of feature filtering should I apply, e.g. keeping the top information-gain words? Evaluation metric: F1.
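
To make the tf versus tf-idf question concrete, here is a minimal sketch of the plain, unsmoothed tf-idf weighting. The function name `tf_idf` and the toy documents are made up for illustration; libraries often use smoothed variants of the idf term, so the exact numbers will differ from theirs.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {token: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts for this document
        rows.append(
            {tok: count * math.log(n_docs / df[tok]) for tok, count in tf.items()}
        )
    return rows


# Toy tokenized tweets (hypothetical).
docs = [["so", "sad", "sad"], ["so", "happy"]]
rows = tf_idf(docs)
```

With raw tf, "so" would count as much as any other word; with tf-idf, a token that appears in every document gets weight zero, which is the main practical difference between the two options.
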

list of possible features to discuss:

Just an idea: we have access to all the colors a user has chosen for their profile, so why not use the average luminosity of those colors as a feature and see whether it helps us classify depressed users?
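
A rough sketch of what such a feature could look like, assuming the profile colors arrive as `RRGGBB` hex strings. The function names and the example colors are hypothetical, and the luma weights (0.299, 0.587, 0.114) are just one common way to approximate perceived brightness.

```python
def luminance(hex_color):
    """Approximate brightness in [0, 1] of an 'RRGGBB' hex color string."""
    hex_color = hex_color.lstrip("#")
    # Split into red, green, blue and rescale each channel to [0, 1].
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def average_luminosity(colors):
    """Mean brightness over all the profile colors of one user."""
    return sum(luminance(c) for c in colors) / len(colors)


# Hypothetical profile colors for one user.
profile_colors = ["000000", "FFFFFF"]
feature = average_luminosity(profile_colors)
```

The result is a single number per user, so it would have to be repeated on every tweet row of that user if the feature matrix stays at the tweet level.
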

aronwc commented 10 years ago

tokenization:

If we have users before/after they started using the app, we can compute engagement features like:

We'll want to scale the X matrix.
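
One possible way to do that scaling, shown as a sketch: standardize each column to zero mean and unit variance. The choice of standardization (rather than, say, min-max scaling) is an assumption, and `standardize_columns` is a made-up helper name.

```python
import statistics


def standardize_columns(X):
    """Scale each column of a dense row-major matrix to zero mean, unit variance."""
    columns = list(zip(*X))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; divide by 1.0 instead to avoid errors.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Toy matrix: two tweets (rows), two features (columns).
X = [[1.0, 10.0], [3.0, 10.0]]
X_scaled = standardize_columns(X)
```

The means and deviations should be computed on the training folds only and then reused on the held-out fold, otherwise the cross-validation scores are slightly optimistic.
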

aronwc commented 10 years ago
aronwc commented 10 years ago
vlandeiro commented 10 years ago
vlandeiro commented 10 years ago

Done:

To do:

aronwc commented 10 years ago
vlandeiro commented 10 years ago

Benchmark and CLI modified to:

vlandeiro commented 10 years ago

Done:

vlandeiro commented 10 years ago

When running a benchmark, I was only returning the first ROC_AUC score. Now I return the average of all ROC_AUC scores (averaged over AH, DD, and TA for the given parameters).
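
The averaging can be illustrated with a small self-contained sketch. This is not the benchmark code itself: `roc_auc`, `mean_roc_auc`, and the labels and scores below are invented for the example, and the AUC is computed directly from its pairwise definition rather than with a library call.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_roc_auc(per_label):
    """per_label maps a label name to (y_true, scores). Returns (mean, per-label AUCs)."""
    aucs = {label: roc_auc(y, s) for label, (y, s) in per_label.items()}
    return sum(aucs.values()) / len(aucs), aucs


# Invented predictions for the three labels mentioned above.
per_label = {
    "AH": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "DD": ([0, 1], [0.2, 0.9]),
    "TA": ([0, 1], [0.7, 0.3]),
}
mean_auc, aucs = mean_roc_auc(per_label)
```

Returning the per-label scores alongside the mean keeps it visible when one label drags the average down, which a single averaged number would hide.
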

Run test script improved: