microg / AppleWifiNlpBackend

UnifiedNlp Backend that uses Apple's service to resolve wifi locations. Location calculation is done onboard and wifi locations are cached to minimize data usage.
Apache License 2.0

Add caching configuration #7

Open haarp opened 9 years ago

haarp commented 9 years ago

It's not entirely clear to me as a user how the cache works. I keep seeing data usage even for places that should be completely cached by now. How long is location data cached for?

I propose that the caching duration be configurable. This way someone can a) figure out how long the cache duration is and b) modify it to balance privacy and up-to-date-ness.

:)

(btw, it is impressive how much dedication and polish is flowing into microg. Thank you for making this!)

mar-v-in commented 9 years ago

Currently the cache duration is hard coded to 30 days. I'm sure I'll add an option to change this some day :smile:

The behavior you described is most likely due to how the cache works: if a visible AP is unknown, the client asks for its location, even if that AP would already be in the cache had a location been known for it. Only once an AP is known to be unknown, by having been named in a request, will the cache store the negative response. I'm not sure if there is a better way to do this, but I guess it should be possible to save some "area" of already downloaded cache and not request unknown APs inside this area. That only works if the current location is already known, though (maybe due to other visible APs?). If you have any good ideas on how caching can be improved, I'd like to hear them.
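To make the distinction between "never asked" and "asked, but unknown" concrete, here is a minimal sketch of a cache with negative entries. It is not the backend's actual code: `ApCache`, `macs_to_request` and the in-memory dictionary are hypothetical names, and expiring entries after exactly 30 days is an assumption based on the hard-coded duration mentioned above.

```python
import time

CACHE_TTL_SECONDS = 30 * 24 * 3600  # 30 days, the duration stated above


class ApCache:
    """Hypothetical in-memory stand-in for the backend's cache database."""

    def __init__(self, now=time.time):
        self._now = now      # injectable clock, makes expiry testable
        self._entries = {}   # mac -> (stored_at, (lat, lon) or None)

    def store(self, mac, location):
        # location=None records a negative entry: the AP was named in a
        # request and the server had no location for it.
        self._entries[mac] = (self._now(), location)

    def lookup(self, mac):
        """Return ("hit", loc), ("negative", None) or ("miss", None)."""
        entry = self._entries.get(mac)
        if entry is None or self._now() - entry[0] > CACHE_TTL_SECONDS:
            return ("miss", None)
        location = entry[1]
        if location is None:
            return ("negative", None)
        return ("hit", location)


def macs_to_request(cache, visible_macs):
    # Only APs with no fresh entry (positive or negative) cause traffic.
    return [m for m in visible_macs if cache.lookup(m)[0] == "miss"]
```

With this shape, an AP that was never named in a request is always a "miss", which matches the repeated data usage described in the first comment.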

haarp commented 9 years ago

Sorry, I do not understand your description. From what I understand, Apple does not provide the location of each AP, but instead locations for "sets" of APs (correct me if I'm wrong).

So what if, for each AP we can see, we pretend that only this AP is visible, send that information upstream, get location data for each AP and then determine the device location (an average based on all visible APs)? This would allow very easy caching.

n76 commented 9 years ago

@haarp How is requesting one WiFi AP at a time more efficient than requesting information for a number of APs in one request?

haarp commented 9 years ago

It isn't. But it would allow efficient caching of the location of each AP.

mar-v-in commented 9 years ago

I guess the misconception here is about how the Apple API works (and I have not documented it much yet).

Apple's server provides location data per AP; no location calculation is done on their servers. This is different from how most other location APIs (Mozilla, Google, ...) work, which do not share the real AP location, but only the location averaged from a number of APs.

With Apple's server, a request can contain any number of AP MAC addresses (no additional information like RSSI is sent). The APs in a request do not have to be near each other; they can be anywhere in the world. The response consists of the location, an accuracy value and the wifi channel number for each of the requested APs. Additionally, the 200 APs geographically nearest to the first AP in the request list will also be returned to fill up the cache (because Apple does not want us to do thousands of requests). Thus, compared to the Mozilla and Google APIs, Apple's API does not require a lot of requests and caching can be done efficiently.
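Because per-AP locations arrive on the device, the position estimate has to be computed locally. The sketch below shows one simple way this could be done, an accuracy-weighted centroid. It is an illustration only: the function name `estimate_position` and the inverse-square weighting are assumptions, not the formula the backend actually uses.

```python
def estimate_position(aps):
    """Estimate the device position from known AP locations.

    aps: list of (lat, lon, accuracy_m) tuples, one per visible AP with a
    cached location. Returns (lat, lon), or None when the list is empty.
    Averaging raw coordinates is only reasonable over short distances and
    away from the antimeridian.
    """
    # Assumed weighting: APs with a smaller accuracy radius count more.
    weighted = [(lat, lon, 1.0 / max(acc, 1.0) ** 2) for lat, lon, acc in aps]
    total = sum(w for _, _, w in weighted)
    if total == 0:
        return None
    lat = sum(la * w for la, _, w in weighted) / total
    lon = sum(lo * w for _, lo, w in weighted) / total
    return (lat, lon)
```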

The current cache implementation does not take into consideration that the answer will contain the nearest 200 APs. This would allow it to not do new requests if an unknown AP is inside the circle of APs returned in a previous request.
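One way to record "the circle of APs returned in a previous request" could look like the following sketch. It assumes the circle is centered on the first AP of the request (since the neighbors are chosen relative to it) and that its radius is the distance to the farthest returned AP. `coverage_circle` and `is_covered` are hypothetical helpers, and the haversine formula is a standard great-circle approximation.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def coverage_circle(first_ap, returned_aps):
    """Return (center, radius_m) covering every AP in one response."""
    radius = max((haversine_m(first_ap, ap) for ap in returned_aps),
                 default=0.0)
    return (first_ap, radius)


def is_covered(circle, position):
    """True if position lies inside the circle of a previous response."""
    center, radius = circle
    return haversine_m(center, position) <= radius
```

An unknown AP has no coordinates of its own, so in practice the position passed to `is_covered` would be the device location estimated from the other visible APs.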

However, if you stay at a single location, no new requests should be required. If they still happen for you, can you check if logcat contains log entries "Requesting Apple for xxx locations"?

haarp commented 9 years ago

Ah, understood. But then I wonder why the nearest 200 aren't cached. Is there a reason for it?

I'll check for requests next time I'm on the go.

alienman1 commented 7 years ago

This is still missing. I should write it, shouldn't I?

mar-v-in commented 7 years ago

Sorry for not answering @haarp's last message here: everything that is sent from Apple to the phone is stored for at least 30 days, which imho is a good value for most. The most important thing is not to make the cache duration an option (which of course won't hurt, but also does not solve any problems), but to find a way to know what area is cached (which is not that hard) and to know whether an unknown AP resides in this area (which is a lot harder). Only if we know that a certain AP would already be in the cache if it was known to the backend servers can we drastically reduce the number of requests made to the servers.

My current idea to solve this is to store the circle of cached data per request in the cache database. This way we know what area we have in the cache db. If we encounter an unknown AP we try to locate based on the surrounding APs which we have cached data for. If this does not result in any location (full cache miss) we request new data from Apple. If some of the APs are not present and the others locate us near the border of cached area (half cache miss), we try to request the missing APs, adding the known AP nearest to the border to the request (this ensures we receive a response from Apple, effectively moving the border away from the current location and thus reducing the need to do a very similar request again). And finally if the current location as guessed from the subset of known APs is not near a border, we ignore the other APs that we don't know anything about (missing negative cache entry).
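The three cases above can be sketched as a small decision function. This is only an illustration of the idea, not existing code: `plan_request` and its parameters are hypothetical, the caller is assumed to have already decided whether the estimated position is near the border, and putting the known border AP first in the request is my reading of "the 200 APs nearest to the first AP".

```python
def plan_request(known_macs, unknown_macs, near_border, border_mac=None):
    """Return the list of MACs to send to the server (empty = no request).

    known_macs:   visible APs with a cached location
    unknown_macs: visible APs without any cache entry
    near_border:  True if the position guessed from known_macs lies close
                  to the edge of the cached area
    border_mac:   known AP nearest to that edge (used on a half miss)
    """
    if not unknown_macs:
        return []                      # everything visible is cached
    if not known_macs:
        return list(unknown_macs)      # full cache miss
    if near_border:
        # Half cache miss: lead with a known AP so the response is
        # guaranteed to contain data and pushes the border outwards.
        head = [border_mac] if border_mac is not None else []
        return head + list(unknown_macs)
    # Well inside the cached area: treat unknown APs as unknown to the
    # server too and skip the request.
    return []
```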

The problematic variable here is what "near a border" means. A Wi-Fi signal can easily reach 100 meters and more, and in urban areas the 200 APs nearest to a location can fit within a radius of far less than 100 meters. Of course we can make this a variable in the settings as well, using some not too bad default (maybe 50 meters?), but that setting would require a lot of explanation to the user. Another approach would be to go relative: "near a border" could mean that the minimum distance to the center of a circle is higher than xx% (maybe ~90%) of its radius. This presumes a bit that wifis have less range in urban areas, which is not necessarily true, but is probably far better than absolute values.

I am open to input and suggestions here.
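For comparison, both definitions of "near a border" can be written down in a few lines. This is a sketch using the example values from the paragraph above (50 meters, 90%); the function names are hypothetical, and both take the distance from the estimated position to the circle's center in meters.

```python
def near_border_absolute(dist_to_center_m, radius_m, margin_m=50.0):
    """Near the border if less than margin_m remains to the circle's edge."""
    return radius_m - dist_to_center_m < margin_m


def near_border_relative(dist_to_center_m, radius_m, fraction=0.9):
    """Near the border if beyond `fraction` of the circle's radius."""
    if radius_m <= 0:
        return True  # degenerate circle: nothing is safely inside
    return dist_to_center_m / radius_m > fraction
```

In a dense urban circle of 80 meters radius, a position 40 meters from the center counts as near the border with the absolute rule but not with the relative one, which is the difference discussed above.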