ipeirotis / Get-Another-Label

Quality control code for estimating the quality of the workers in crowdsourcing environments

Create multiple versions of NoVote_Min_Cost for data quality -- Esoteric, fix last #19

Open ipeirotis opened 12 years ago

ipeirotis commented 12 years ago

NoVote_Min_Cost uses the prior probabilities to define the baseline cost of a "strategic spammer".
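The minimal Python sketch below shows how the priors enter that baseline. It assumes, and this is not taken from the Get-Another-Label code, that the strategic spammer always answers the single category with the lowest expected misclassification cost under the priors, so the baseline is the minimum over assigned labels j of Σ_i prior_i · cost(i, j). The function name, the category names, and the cost matrix are hypothetical.

```python
# Hypothetical sketch, not the Get-Another-Label implementation.
# Assumes cost[true][assigned] is a misclassification cost matrix and that the
# "strategic spammer" always answers the one label with the lowest expected cost.

def strategic_spammer_cost(priors, cost):
    """Return (best_label, expected_cost) for a spammer who always answers one label."""
    best_label, best_cost = None, float("inf")
    for assigned in priors:
        # Expected cost of always answering `assigned`, weighted by the prior of each true class.
        expected = sum(p * cost[true][assigned] for true, p in priors.items())
        if expected < best_cost:
            best_label, best_cost = assigned, expected
    return best_label, best_cost


priors = {"positive": 0.1, "negative": 0.9}
cost = {
    "positive": {"positive": 0.0, "negative": 1.0},
    "negative": {"positive": 1.0, "negative": 0.0},
}
print(strategic_spammer_cost(priors, cost))  # ('negative', 0.1)
```

With equal priors (0.5/0.5) the same cost matrix gives a baseline of 0.5 instead of 0.1, so the choice of priors directly changes the normalization.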

The key issue is that the prior probabilities can be estimated in several ways:

  1. Use fixed priors, passed by the user in the categories.txt file (preferred).
  2. Estimate the priors from the evaluation data (measure the percentage of objects in each category in the evaluation data).
  3. Estimate the priors from the training data (DS.categories.getPrior when running without fixed priors; when running with fixed priors, we need to measure the percentage of objects in each category). This causes DS to report different priors than MV.
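Option (2), and option (3) when running with fixed priors, both come down to measuring the fraction of objects in each category of some labeled set. A minimal sketch, assuming the labels are available as a mapping from object id to category; the function name and input shape are hypothetical, not part of the Get-Another-Label API.

```python
from collections import Counter

# Hypothetical helper: empirical priors = fraction of labeled objects per category.
# Labels outside `categories` are ignored; categories with no objects get prior 0.0.

def empirical_priors(labels, categories):
    """labels: dict mapping object id -> category name."""
    counts = Counter(labels.values())
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labeled objects for the given categories")
    return {c: counts[c] / total for c in categories}


evaluation_labels = {"o1": "positive", "o2": "negative", "o3": "negative", "o4": "negative"}
print(empirical_priors(evaluation_labels, ["positive", "negative"]))
# {'positive': 0.25, 'negative': 0.75}
```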

I would add an advanced command-line switch to select which type of prior is used for the normalization. By default it should be (1), falling back to (2). Option (3), which is the current implementation when we do not have fixed priors and uses the DS priors, should come with a warning.
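The proposed switch could behave roughly as in this sketch. The flag name `--prior-source`, its values, and the `choose_priors` function are hypothetical and do not exist in Get-Another-Label; the sketch only illustrates the default order (1) then (2), with a warning whenever the DS priors of option (3) are used.

```python
import argparse
import warnings

# Hypothetical option values mirroring options (1), (2), (3) above.
PRIOR_SOURCES = ("fixed", "evaluation", "ds")


def choose_priors(source, fixed=None, evaluation=None, ds=None):
    """Pick the priors used to normalize NoVote_Min_Cost.

    source=None means automatic: fixed priors (1) if given, else evaluation priors (2),
    else DS priors (3) as a last resort.
    """
    if source is None:
        if fixed is not None:
            return fixed
        if evaluation is not None:
            return evaluation
        source = "ds"
    if source == "fixed" and fixed is not None:
        return fixed
    if source == "evaluation" and evaluation is not None:
        return evaluation
    if source == "ds" and ds is not None:
        warnings.warn("normalizing with DS-estimated priors; DS and MV may report different priors")
        return ds
    raise ValueError(f"priors for source {source!r} are not available")


parser = argparse.ArgumentParser()
parser.add_argument("--prior-source", choices=PRIOR_SOURCES, default=None,
                    help="advanced: priors used for cost normalization (default: fixed, then evaluation)")
args = parser.parse_args(["--prior-source", "ds"])  # simulated command line
priors = choose_priors(args.prior_source, ds={"positive": 0.3, "negative": 0.7})  # emits a warning
```

Leaving the default as `None` keeps the automatic fallback separate from an explicit user choice, so an explicit `fixed` or `evaluation` fails loudly instead of silently falling back.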