vlandeiro closed 10 years ago
One line in the feature matrix holds the features for one tweet.
Should I use stemming?
How do I weight the features: tf or tf-idf?
What kind of feature filtering: top infogain words?
Evaluation: F1
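To make the tf versus tf-idf question concrete, here is a minimal sketch of both weightings on tokenized tweets. It assumes the unsmoothed definition idf = log(N / df); libraries such as scikit-learn use a smoothed variant, so the exact numbers would differ. The function name `tf_idf` and the sample tweets are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency inside this document
        weighted.append({t: count * math.log(n_docs / df[t])
                         for t, count in tf.items()})
    return weighted

# Illustrative data only.
tweets = [["feeling", "sad", "today"], ["great", "run", "today"]]
weights = tf_idf(tweets)
```

With plain tf, "today" would count as much as "sad"; with tf-idf it gets weight 0 here because it appears in every tweet.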
list of possible features to discuss:
Just an idea: we have access to all the colors a user has defined for their profile, so why not use the average luminosity of the profile colors as a feature and see whether it helps us classify depressed users?
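A rough sketch of how such a feature could be computed, assuming the profile colors arrive as six-digit hex strings (for example "C0DEED"). The helper names are hypothetical, and the luminance formula uses the Rec. 709 weights without gamma correction, which is a simplification.

```python
def luminance(hex_color):
    """Approximate luminance in [0, 1] of a hex color such as 'C0DEED'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    # Rec. 709 weights, applied to the raw channel values (no gamma correction).
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def average_luminance(colors):
    """Mean luminance over a user's profile colors; None if there are none."""
    if not colors:
        return None
    return sum(luminance(c) for c in colors) / len(colors)

# Illustrative input: one black and one white profile color.
profile_colors = ["000000", "FFFFFF"]
feature = average_luminance(profile_colors)
```

The result is a single number per user, so it can be appended as one extra column of the feature matrix.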
tokenization:
If we have data on users from before and after they started using the app, we can compute engagement features like:
We'll want to scale the X matrix.
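One common way to do this is to standardize each column to zero mean and unit variance. The sketch below is a plain-Python illustration of that idea, not necessarily the scaling the project ended up using; `standardize` is a hypothetical helper, and in practice the means and deviations would be computed on the training split only.

```python
import math

def standardize(X):
    """Scale each column of X (a list of rows) to zero mean and unit variance."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n)
            for c, m in zip(cols, means)]
    # Constant columns have std 0; divide by 1 instead so they map to 0.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

# Illustrative matrix: two tweets, two features (the second one is constant).
X = [[1.0, 10.0], [3.0, 10.0]]
X_scaled = standardize(X)
```

Scaling matters most when features live on very different ranges, such as word counts next to a luminosity value in [0, 1].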
Done:
To do:
Benchmark and CLI modified to:
Done:
When running a benchmark, I was only returning the first ROC_AUC score. Now I return the average of all ROC_AUC scores (averaged over AH, DD, and TA for the given parameters).
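The change amounts to something like the following sketch. The function name and the score values are made up for illustration; only the label names AH, DD, and TA come from the note above.

```python
def mean_roc_auc(scores_by_label):
    """Average per-label ROC AUC scores, e.g. keyed by 'AH', 'DD', 'TA'."""
    if not scores_by_label:
        raise ValueError("no ROC AUC scores to average")
    return sum(scores_by_label.values()) / len(scores_by_label)

# Made-up scores, for illustration only.
scores = {"AH": 0.5, "DD": 0.75, "TA": 1.0}
previous_result = scores["AH"]          # old behaviour: first score only
current_result = mean_roc_auc(scores)   # new behaviour: mean over all labels
```

Returning the mean keeps one number per parameter setting while no longer ignoring two of the three labels.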
Run test script improved:
StatsTree to make it easily changeable
StatsNode for one parameter of the command line
None if there is no following node.
StatsTree.traverse(func):
The func parameter is a function taking one parameter that is executed in each leaf of the tree.
The cmd passed to the function func is the arg vector built at this moment.
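The description above could correspond to a structure like the one sketched here. This is a guess at the shape, not the actual code from the repository: the attribute names, the constructor signatures, and the flags `--weight` and `--k` are all hypothetical. It assumes each node holds one command-line parameter with its candidate values, points to the next node (or None), and that a leaf is reached once every parameter has a value.

```python
class StatsNode:
    """One command-line parameter together with the values to try for it."""
    def __init__(self, flag, values, next_node=None):
        self.flag = flag
        self.values = values
        self.next_node = next_node  # None if there is no following node

class StatsTree:
    def __init__(self, root, base_cmd):
        self.root = root
        self.base_cmd = list(base_cmd)

    def traverse(self, func):
        """Call func(cmd) once per leaf, i.e. per full combination of values."""
        self._visit(self.root, self.base_cmd, func)

    def _visit(self, node, cmd, func):
        if node is None:
            func(cmd)  # leaf: cmd is the arg vector built so far
            return
        for value in node.values:
            self._visit(node.next_node, cmd + [node.flag, str(value)], func)

# Hypothetical parameters and command name, for illustration only.
k_node = StatsNode("--k", [10, 100])
tree = StatsTree(StatsNode("--weight", ["tf", "tfidf"], k_node), ["benchmark"])
commands = []
tree.traverse(commands.append)
```

Here `commands` ends up holding one argument vector per combination of parameter values; in a real test script `func` would presumably launch the benchmark with that vector instead of collecting it.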
https://github.com/virgile11/sporty-twitters/issues?milestone=6