Error analysis

aronwc commented 10 years ago

Sort all predictions by error (truth - predicted).
For the top 10 errors:
- report the company name
- report the top 5 features, ranked by abs(feature value * model coefficient)
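A minimal sketch of this procedure, assuming a linear model and NumPy. The function `error_analysis` and its parameters are hypothetical names, not code from this repository. It assumes `X` has one row per company (a dense NumPy array or a SciPy sparse matrix) and `coef` is the model's coefficient vector. It ranks companies by the absolute value of truth - predicted so that both over- and under-predictions surface; sort by the signed error instead if only one direction matters.

```python
import numpy as np


def error_analysis(X, y_true, y_pred, coef, company_names, feature_names,
                   n_errors=10, n_features=5):
    """Hypothetical helper: list the worst predictions and their most
    influential features, ranked by abs(feature value * coefficient)."""
    coef = np.ravel(coef)  # accepts shape (C,) or (1, C)
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)

    # Indices of the n_errors largest |truth - predicted|, largest first.
    worst = np.argsort(-np.abs(errors), kind="stable")[:n_errors]

    report = []
    for i in worst:
        row = X[i]
        # Assumption: sparse rows (e.g. scipy.sparse CSR) are densified first.
        row = row.toarray().ravel() if hasattr(row, "toarray") else np.ravel(row)
        contrib = row * coef  # element-wise: feature value * coefficient
        top = np.argsort(-np.abs(contrib), kind="stable")[:n_features]
        report.append((company_names[i], errors[i],
                       [(feature_names[j], contrib[j]) for j in top]))
    return report


if __name__ == "__main__":
    # Toy data made up for illustration: 3 companies, 3 features.
    X = np.array([[1, 2, 3], [0, 1, 0], [2, 0, 1]])
    coef = np.array([5, -2, 6])
    y_true = np.array([20.0, 10.0, 16.0])
    y_pred = X @ coef  # linear model without intercept
    for name, err, feats in error_analysis(X, y_true, y_pred, coef,
                                           ["A", "B", "C"], ["a", "b", "c"],
                                           n_errors=2, n_features=2):
        print(name, err, feats)
```

With a fitted scikit-learn model, `coef` would presumably be `clf.coef_`, `y_pred` would be `clf.predict(Xd)`, and `X` would be `Xd`; whether those names match this project's code is an assumption.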

cyril94440 commented 10 years ago

I got preliminary results that reveal some strange data:

1 - With an error of 29.9: @SaraLeeDesserts, Excel sheet: 71.9% Male (strange; after a quick look on Twitter, it seems closer to 71.9% Female)
2 - With an error of 29.1: @AshworthGolf, Excel sheet: 71.4% Male (this seems correct, or maybe even more Male)
3 - With an error of 23.7: @mitsucars, Excel sheet: 34.4% Male (same as the first one; I think it is more likely 34.4% Female)

cyril94440 commented 10 years ago

In the meantime, I don't understand how to do this: report top 5 features weighted by (abs(feature value * model coefficient)). I'm not sure what the feature value is versus the model coefficient.

aronwc commented 10 years ago

If C is the number of columns (i.e., features / account names), then clf.coef_ contains the model's coefficients (a C-length array).

Each row of Xd represents one company. It is also a C-length array.

By multiplying clf.coef_ element-wise with the ith row of Xd, we can see which features have the largest influence on that company's prediction.

E.g., coef_ = [5, -2, 6]; row = [1, 2, 3]; coef_ * row = [5, -4, 18]. Element three has the largest impact.
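The worked example above can be checked with a few lines of NumPy. This is a small sketch in which `coef_` and `row` stand in for `clf.coef_` and one row of `Xd`; ranking by absolute value is what makes the -4 count as a larger influence than a small positive term would.

```python
import numpy as np

coef_ = np.array([5, -2, 6])  # model coefficients, one per feature (column)
row = np.array([1, 2, 3])     # one company's feature values (one row of Xd)

contributions = coef_ * row   # element-wise product
# Rank features by the magnitude of their contribution, largest first.
ranking = np.argsort(-np.abs(contributions), kind="stable")

print(contributions)  # [ 5 -4 18]
print(ranking)        # [2 0 1]  -> index 2, i.e. element three, comes first
```

For a linear model, the prediction (or decision score, for a classifier) is the sum of these contributions plus `intercept_`, which is why their magnitudes indicate influence. For a binary scikit-learn classifier such as `LogisticRegression`, `coef_` has shape `(1, C)`, so it may need `.ravel()` before the multiplication.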

(In reply to Cyril Trosset's comment of Thu, Apr 24, 2014, 9:40 AM: https://github.com/tapilab/ctrosset/issues/20#issuecomment-41287943)

aronwc commented 10 years ago

Include the results of the error analysis in the report.

tapilab / ctrosset

Error analysis #20