saintbyte / openbmap

Automatically exported from code.google.com/p/openbmap

Blacklisting and similar filtering should be done on server rather than on client #47

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The latest versions of Radiobeacon have introduced increasingly elaborate 
filtering of WiFi networks whose geographic location is expected to change 
frequently. I am quite skeptical of doing this kind of filtering locally, 
before uploading data, as there is quite a risk of false positives:

- Driving along a motorway one might still pick up a signal from a building in 
the vicinity. Service stations come to mind... nowadays they are almost certain 
to have at least one access point, and these are perfectly stationary. Filtering 
them out on the basis that they are located close to a motorway will discard 
some perfectly valid data.
- MAC ranges associated with mobile devices: do we know for sure that chips 
with this MAC range were never built into stationary equipment?
- SSIDs: users can choose any SSID they want, thus finding strings such as 
"iPhone", "ASUS" and the like in the SSID indicates that this access point 
MIGHT change its position, but it is not a sure indicator.
- Finally, there may always be the odd owner of a cabin in the woods who has 
taped his old smartphone to the wall and uses it as a WiFi access point. 
Despite being a mobile device by all measures, its position won't change, and it 
may even be the only WiFi around, making it all the more valuable. Or consider a 
homebrew WiFi router using a USB or PC Card WiFi adapter, which would be 
classified as a mobile device due to its MAC.

On the other hand, there are cases which will never be caught by this approach:
- People or offices moving: they frequently take their equipment with them. 
That equipment is fixed in nature, and once it has moved to a new location, it 
will stay there for some time. However, after the move the database will still 
report them to be at their old location, until someone scans the new location, 
after which the database will have both locations.
- Fixed equipment used in temporary installations. I am wondering how many of 
these I have picked up as I cycled around the Oktoberfest. They're there for 
only two weeks, and who knows where these BSSIDs are going to surface next.

Conclusion: dealing with moving WiFis is a lot more complex than just comparing 
against a blacklist of SSIDs and BSSIDs. It will always be a guessing game, and 
the more data we base our guess on, the better it is. To get a maximum of data, 
we would need to run such heuristics on the server.

As a primary input I would use the actual movements of the WiFi. A conventional 
WiFi covers a range of some 100 meters around the base station, so I would 
consider a WiFi to have moved whenever two subsequent positions for that WiFi 
are significantly more than 200 meters apart. To establish the position of a 
WiFi we should then only consider the data collected after the last move.
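The rule above can be sketched in code. This is an illustrative Python sketch, not existing Radiobeacon or server code; the 200 m threshold follows the ~100 m range argument above, and the haversine formula is an assumed choice for computing distances between fixes:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

MOVE_THRESHOLD_M = 200.0  # twice the ~100 m conventional WiFi range

def observations_since_last_move(observations):
    """Given chronologically ordered (lat, lon) fixes for one BSSID,
    keep only the fixes recorded after the most recent detected move."""
    start = 0
    for i in range(1, len(observations)):
        prev, cur = observations[i - 1], observations[i]
        if haversine_m(*prev, *cur) > MOVE_THRESHOLD_M:
            start = i  # everything before this point predates the move
    return observations[start:]
```

Position estimation for a BSSID would then average only the fixes returned by `observations_since_last_move`, discarding everything from before the last jump.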

Additionally, we could introduce a score for each WiFi, which indicates how 
reliable position estimates for this WiFi are. That score could then consider 
the blacklist criteria, as well as some more:
- Partial SSID match = bad, full SSID match = very bad.
- MAC range match = bad, full MAC match = very bad.
- Mean time between moves: the longer, the better.
- Number of moves recorded: the fewer, the better.
- Time since last move: good if considerably lower than mean time between moves.
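The criteria above could be combined into a single score roughly like this. The field names, weights, and the 0-100 scale are purely illustrative assumptions for the sake of the sketch; the actual server would need tuned values:

```python
def stability_score(wifi):
    """Hypothetical stability score (higher = more trustworthy position).
    `wifi` is a dict of per-BSSID attributes; all keys and weights here
    are illustrative assumptions, not Radiobeacon code."""
    score = 100.0
    # Blacklist criteria: full matches weigh more than partial ones.
    if wifi.get("full_ssid_match"):
        score -= 50  # full SSID match = very bad
    elif wifi.get("partial_ssid_match"):
        score -= 20  # partial SSID match = bad
    if wifi.get("full_mac_match"):
        score -= 50  # full MAC match = very bad
    elif wifi.get("mac_range_match"):
        score -= 20  # MAC range match = bad
    # Movement history: the fewer recorded moves, the better.
    score -= 10 * wifi.get("move_count", 0)
    # Recent stability relative to this WiFi's own history.
    mtbm = wifi.get("mean_time_between_moves_days")
    since = wifi.get("days_since_last_move")
    if mtbm and since is not None and since < 0.5 * mtbm:
        score += 10
    return max(score, 0.0)
```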

Clients trying to determine a position based on nearby WiFis could then take 
that score into consideration and give preference to WiFis based on two 
criteria:
- Good positional stability based on the above score
- Proximity to other WiFis received at the same time: if the positions of two 
WiFis in the DB are significantly more than 200 meters apart, it is a sign that 
one of the two may have moved.
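A client could apply both criteria roughly as follows. The `db_position`, `distance_m` and `score` callables are assumed stand-ins for whatever lookups the client actually has; a WiFi whose database position lies more than 200 m from every other WiFi seen in the same scan is treated as possibly moved and skipped:

```python
def plausible_wifis(seen, db_position, distance_m, score, threshold_m=200.0):
    """From the BSSIDs received in one scan, keep those whose database
    positions are mutually consistent, ordered by stability score
    (best first). All callables are hypothetical stand-ins:
      db_position(bssid) -> position or None if unknown
      distance_m(p, q)   -> distance between two positions in meters
      score(bssid)       -> stability score, higher = better
    """
    positioned = [b for b in seen if db_position(b) is not None]
    consistent = []
    for b in positioned:
        others = [o for o in positioned if o != b]
        # Far from every simultaneously-seen WiFi: one side may have moved.
        if others and all(distance_m(db_position(b), db_position(o)) > threshold_m
                          for o in others):
            continue
        consistent.append(b)
    return sorted(consistent, key=score, reverse=True)
```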

Original issue reported on code.google.com by mich...@vonglasow.com on 27 Oct 2013 at 8:25

GoogleCodeExporter commented 9 years ago
Michael, first of all thanks for these really detailed suggestions!

The suggestions are very good and I agree on nearly all points. So to state it 
very clearly: server-side filtering is the preferred way, with only a few 
exceptions (e.g. _nomap wifis, which can be safely discarded on the client side).

The current approach is a work-around, as we lack the manpower to relaunch the 
server-side processes. I guess we're in desperate need of some helping hands 
to improve the website and the server-side processing, as Mick is very, very 
busy at the moment. Anybody experienced in server-side frameworks (or even the 
typical website technology) is very welcome! 

So to provide a short-term solution I implemented some basic client-side 
filtering, accepting that we'll get false positives for exactly the reasons you 
mentioned. I tried to judge carefully before adding wifis to the list: e.g. 
after my highway rides I analysed the tracked wifis manually to identify moving 
ones, such as the long-haul buses. 

At the moment this is pure manual work, so there is no automation in place. Your 
eagle eyes spotted the shortcomings nevertheless :-) (e.g. Asus and some 
Netgear devices made it onto the list, although they are in fact very ambiguous...)

Given all your good suggestions, let's keep this issue sticky until we 
succeed in implementing server-side filtering.

Original comment by wish7code on 31 Oct 2013 at 8:28