ubriela / geocrowd-pricing

Geocrowd Pricing Strategies

Develop a metric that measures the coverage of pics across locations #2

Open ubriela opened 9 years ago

ubriela commented 9 years ago

This metric should take an array as input and return a double value. Each element of the array is the number of pics at one location, and the returned value should characterize the diversity of the pics in terms of locations. For example, the following arrays are sorted in increasing order of diversity:

10,0,0,0,0
5,5,0,0,0
5,2,1,1,1
4,2,2,1,1
2,2,2,2,2

Ideally, we want more total pics and a more uniform array. With a uniform array, as in the last example, we have approximately the same number of pics in each location and therefore get a better, unbiased understanding of the locations. In the first array, by contrast, we have no pics for the last four locations and therefore no information about the corresponding areas (assuming we use pics to examine how clean the streets in an area are, or how good their pavement is).

One suggestion is to use Shannon Entropy https://en.wikipedia.org/wiki/Entropy_(information_theory)
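
As a rough sketch of how the entropy suggestion could look, the function below normalizes the counts into proportions and computes Shannon entropy in bits. The name shannon_entropy and the choice to return 0.0 for an all-zero array are assumptions made for illustration, not part of the repository.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the distribution of pics over locations.

    counts: list of non-negative numbers, one per location.
    Returns 0.0 when there are no pics at all (an assumed convention).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:  # empty locations contribute nothing (0 * log 0 is taken as 0)
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    examples = [
        [10, 0, 0, 0, 0],
        [5, 5, 0, 0, 0],
        [5, 2, 1, 1, 1],
        [4, 2, 2, 1, 1],
        [2, 2, 2, 2, 2],
    ]
    for arr in examples:
        print(arr, round(shannon_entropy(arr), 3))
```

On the five example arrays this gives strictly increasing values, from 0 bits for 10,0,0,0,0 up to log2(5) bits for 2,2,2,2,2, which matches the ordering above. Note that entropy depends only on the proportions, so it does not reward having more total pics; that part of the requirement would need a separate term.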

ubriela commented 9 years ago

A simpler metric is variance: the lower the variance of the array, the more uniform the pics are across locations. https://en.wikipedia.org/wiki/Variance
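
A minimal sketch of the variance idea, assuming the population variance of the per-location counts is what is meant. The name count_variance is hypothetical. Unlike a diversity score, lower values here mean a more uniform array.

```python
def count_variance(counts):
    """Population variance of the number of pics per location.

    counts: non-empty list of non-negative numbers, one per location.
    Lower variance means the pics are spread more evenly.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("counts must contain at least one location")
    mean = sum(counts) / n
    return sum((c - mean) ** 2 for c in counts) / n


if __name__ == "__main__":
    examples = [
        [10, 0, 0, 0, 0],
        [5, 5, 0, 0, 0],
        [5, 2, 1, 1, 1],
        [4, 2, 2, 1, 1],
        [2, 2, 2, 2, 2],
    ]
    for arr in examples:
        print(arr, count_variance(arr))
```

For the five example arrays, which all total 10 pics, the variance falls steadily from 16 for 10,0,0,0,0 to 0 for 2,2,2,2,2. Variance scales with the counts themselves, so arrays with different totals are not directly comparable without some normalization.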