open-connectome-classes / StatConn-Spring-2015-Info


Space-time wiring gamification statistics #190

imichaelnorris opened this issue 9 years ago

imichaelnorris commented 9 years ago

I would have liked to see some statistics on how accurately a single classification is made. Does anyone know of any papers related to crowdsourcing statistical methods?

The "GrimReaper" is allowed to override consensus if it's wrong. it would have been nice to know how frequently this occurred so someone considering a similar study could gauge how much involvement is needed to ensure the players are accurately classifying things.

mrjiaruiwang commented 9 years ago

Yes, I met a professor at the Scripps Research Institute in La Jolla, CA named Andrew Su who works on exactly this kind of big-data crowdsourcing: http://sulab.org/andrew-i-su-ph-d/

Also, it is a general rule that diversity of methods matters more than quantity. If you had access to a large number of completely independent classifiers, each classifier would only need a classification accuracy of 0.51: with enough independent votes, the majority vote would be correct almost all of the time. But in reality what we see in ensemble learning is that, say, 5 classifiers with accuracies of 58, 59, 60, 61, and 62 combined might only give you an accuracy of around 65, because their errors are correlated. There is a good article somewhere on the internet that uses ensemble learning to predict League of Legends game outcomes and demonstrates this lack of independence in popular classifiers.
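A quick Monte Carlo sketch of that first claim (my own illustrative numbers, not from any paper): majority-voting n truly independent classifiers that are each right only 51% of the time pushes the vote's accuracy well above 51% as n grows.

```python
# Minimal sketch: majority vote over n independent classifiers,
# each correct with probability p = 0.51 (barely above chance).
import numpy as np

rng = np.random.default_rng(0)
p = 0.51          # per-classifier accuracy
trials = 5_000    # Monte Carlo trials

for n in (1, 11, 101, 1001):          # odd n avoids ties
    # Each row is one trial: n independent votes, True = correct.
    votes = rng.random((trials, n)) < p
    majority_correct = votes.sum(axis=1) > n / 2
    print(f"n={n:5d}  majority-vote accuracy ~ {majority_correct.mean():.3f}")
```

With independence, accuracy climbs steadily with n (roughly 0.51, 0.53, 0.58, 0.74 here); the correlated-errors case is exactly what breaks this in practice.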

ElanHR commented 9 years ago

"Also, it is a general rule that diversity, not quantity of methods work. If you had access to a large number of completely independent classifiers, each classifier would only need to have a classification accuracy of 0.51"

To expand on this: in order for boosting (the process of creating a strong learner from a number of weak learners) to work, there has to be at least some notion of independence between your weak learners.
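For concreteness, here is a minimal boosting sketch using scikit-learn's AdaBoost with decision stumps as the weak learners (an illustrative setup on synthetic data, not tied to the wiring task):

```python
# Sketch: AdaBoost re-weights training points each round so later
# stumps focus on the examples earlier stumps got wrong.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)   # a single weak learner
# Note: `estimator` is named `base_estimator` in older scikit-learn versions.
boost = AdaBoostClassifier(estimator=stump, n_estimators=200, random_state=0)

print("one stump:", stump.fit(X_tr, y_tr).score(X_te, y_te))
print("boosted  :", boost.fit(X_tr, y_tr).score(X_te, y_te))
```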

In the case of crowd classification, I think it's likely you will only be able to extract a few clusters of effectively independent classifiers (after which additional people will just fall into one of the existing clusters). This would be especially true for tasks in which participants must be taught how to classify, since then everyone is basing their classifications on the same system. In that taught case, personal experience might differentiate people somewhat, but I still believe their answers would be highly correlated (and, as the previous commenter pointed out, agreement does not imply correctness).
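A small simulation of that correlation effect (the one-shared-failure-mode model below is purely an illustrative assumption): each vote either copies a single shared draw, as if everyone trained on the same system fails together, or is drawn independently; per-vote accuracy is identical either way, yet majority-vote accuracy collapses as the copying probability rho rises.

```python
# Sketch: correlated annotators via one shared failure mode.
# Per-vote accuracy is p regardless of rho; only independence changes.
import numpy as np

rng = np.random.default_rng(1)
p, n, trials = 0.60, 101, 20_000

for rho in (0.0, 0.5, 0.9):
    shared = rng.random((trials, 1)) < p        # one shared draw per trial
    indep = rng.random((trials, n)) < p         # n independent draws
    copy = rng.random((trials, n)) < rho        # which votes follow the crowd
    votes = np.where(copy, shared, indep)
    acc = (votes.sum(axis=1) > n / 2).mean()
    print(f"rho={rho:.1f}  majority accuracy ~ {acc:.3f}")
```

At rho = 0 the majority of 101 votes is right nearly all the time; as rho grows, the crowd is barely better than a single annotator.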

As a side note, I think I wrote the League of Legends paper you mentioned (though I wouldn't be surprised if someone else used a similar approach). I ended up training a few different weak-learner systems that would each estimate different game states, and then used a log-linear model to combine/boost the results.
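Roughly, that combine step can be sketched as stacking (illustrative stand-ins below; the actual paper's learners and features are not shown here): fit a few weak learners, then feed their held-out predicted probabilities into a logistic regression, i.e. a log-linear model, that learns how much weight each one deserves.

```python
# Sketch: combine weak learners with a log-linear (logistic) model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

weak = [GaussianNB(), DecisionTreeClassifier(max_depth=2, random_state=0)]
for m in weak:
    m.fit(X_tr, y_tr)

def stack(X):
    # One column of P(y=1) per weak learner.
    return np.column_stack([m.predict_proba(X)[:, 1] for m in weak])

# Fit the combiner on held-out predictions to avoid overfitting to
# the weak learners' training error.
combiner = LogisticRegression().fit(stack(X_val), y_val)
for m in weak:
    print(type(m).__name__, m.score(X_te, y_te))
print("log-linear combo:", combiner.score(stack(X_te), y_te))
```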