mozilla / ichnaea

Mozilla Ichnaea
http://location.services.mozilla.com
Apache License 2.0

Determine metrics and experiments for data correctness #1381

Closed by jwhitlock 2 months ago

jwhitlock commented 3 years ago

A testable definition of data correctness would allow us to evaluate MLS as a location provider and guide product development. From the original bug 1613503:

In one context, "correctness" means "a geolocation query gets the same result between MLS and GLS". Determining correctness there probably involves running an experiment on a user population that queries MLS and GLS and reports back either a "they're the same/different" or a "they're off by x amount" or something like that.

Another way of defining "correctness" is "how far off from reality is the geolocation query result?" We could run a different kind of experiment where Firefox determines the location from a geolocation query to MLS, shows a map (OpenStreetMap?), and asks the user where they actually are. Then it reports back some kind of "they're the same/different" or "off by x amount" or something like that.
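Both definitions come down to a distance between two coordinates: MLS vs. GLS in the first, MLS vs. the user-reported position in the second. A minimal sketch of the "same/different" and "off by x amount" reports, assuming results arrive as `(lat, lon)` pairs in degrees. The function names and the 100 m threshold are hypothetical, not anything specified in the bug:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def compare_results(result_a, result_b, same_within_m=100.0):
    """Return (distance_m, is_same) for two (lat, lon) results.

    same_within_m is an assumed threshold chosen for illustration only.
    """
    distance = haversine_m(result_a[0], result_a[1], result_b[0], result_b[1])
    return distance, distance <= same_within_m
```

What counts as "the same" is a product decision; a threshold tied to the accuracy radius each service reports may be more meaningful than a fixed distance.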

Are there other things we can do to determine data correctness or something that can act as a proxy for correctness?

jwhitlock commented 3 years ago

We're in round 2 of an experiment to compare MLS to GLS (bug 1669364), in the case when Firefox 83 would have used GLS anyway. The metric is a bucketed histogram of the difference between the two measurements. The benefit of doing this in Firefox is that both MLS and GLS can use the client IP address.
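As a rough illustration of the histogram metric, here is a sketch that buckets a list of MLS-to-GLS distances in metres. The bucket edges and label format are assumptions for illustration, not the ones used in the Firefox experiment:

```python
from bisect import bisect_right
from collections import Counter

# Bucket boundaries in metres; illustrative values only.
BUCKET_EDGES_M = [100, 500, 1_000, 5_000, 10_000, 50_000]


def bucket_label(distance_m, edges=BUCKET_EDGES_M):
    """Return the label of the half-open bucket [lower, upper) holding distance_m."""
    i = bisect_right(edges, distance_m)
    if i == len(edges):
        return f">={edges[-1]}m"  # overflow bucket
    lower = edges[i - 1] if i else 0
    return f"{lower}-{edges[i]}m"


def histogram(distances_m, edges=BUCKET_EDGES_M):
    """Count how many distances fall into each bucket."""
    return Counter(bucket_label(d, edges) for d in distances_m)
```

For example, `histogram([50, 99.9, 100, 750, 75000])` puts two values in `0-100m` and one each in `100-500m`, `500-1000m`, and `>=50000m`. Reporting only bucket counts keeps individual locations out of the telemetry.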

A similar experiment could be done offline by the MLS team, omitting the IP address from geolocation.

A "wrong location" feedback mechanism (https://bugzilla.mozilla.org/show_bug.cgi?id=1650371) could be used as a data point, and to improve the database. A similar feature could be used as a survey tool to measure location correctness.

The data submitted via the geosubmit API could first be fed to the geolocate API, to see how far the submitted position is from what MLS would return for the same data.
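A sketch of that check, assuming geosubmit v2 items (a `position` object plus network lists) and the geolocate response shape (`location.lat` / `location.lng`). The helper names are hypothetical, and the HTTP call is left out so the example stays offline:

```python
import math

# Network list fields that appear in both geosubmit items and geolocate requests.
NETWORK_FIELDS = ("bluetoothBeacons", "cellTowers", "wifiAccessPoints")


def geolocate_query_from_item(item):
    """Build a geolocate request body from a geosubmit item.

    Only non-empty network lists are copied; the submitted position is
    deliberately dropped so MLS has to answer from the networks alone.
    """
    return {key: item[key] for key in NETWORK_FIELDS if item.get(key)}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(min(1.0, math.sqrt(a)))


def submission_error_m(item, geolocate_response):
    """Distance between the submitted position and the geolocate result."""
    pos = item["position"]
    loc = geolocate_response["location"]
    return haversine_m(pos["latitude"], pos["longitude"], loc["lat"], loc["lng"])
```

The resulting distances could feed the same kind of bucketed histogram as the Firefox experiment. A large error may point to a bad submission or to a stale database entry, so the number on its own does not say which side is wrong.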