openbmap / radiocells-scanner-android

WLAN and cell tower scanner for Radiocells.org
https://www.radiocells.org

Brainstorm: Boost radiocell data #127

Open amilopowers opened 8 years ago

amilopowers commented 8 years ago

I am wondering how to boost the amount of data gathered. Other projects such as Mozilla's Location Service have a lot more scanned radiocells than we have. Mozilla boosted their data when they introduced scanning in mobile Firefox. So I am thinking about a way to do something similar for radiocells.org.

Do you have other suggestions to boost data in our project? Unfortunately I am light years away from coding my own app, so I can only assist with ideas.

gdt commented 8 years ago

I would like to see a way to have more scanned data while avoiding privacy issues. The uploaded scan data is basically a log of where you've been and when, and that's not ok in general. I wonder about having local scanning that runs more often and builds up a database of where the strongest signals were for each cellid, and uploads those only, without accurate timestamps.
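A minimal sketch of the client-side aggregation described above, assuming measurements are held locally as simple records. The `Measurement` type and the function name `strongest_per_cell` are hypothetical and not taken from the Radiobeacon code; the sketch keeps only the strongest sample per cell ID and drops the timestamp before anything is uploaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical local record; not the Radiobeacon data model."""
    cell_id: int
    lat: float
    lon: float
    dbm: int        # signal strength in dBm, closer to 0 is stronger
    timestamp: int  # Unix seconds, kept locally only


def strongest_per_cell(measurements):
    """Return one upload record per cell: the strongest sample, no timestamp."""
    best = {}
    for m in measurements:
        current = best.get(m.cell_id)
        if current is None or m.dbm > current.dbm:
            best[m.cell_id] = m
    # The timestamp is deliberately left out of the upload record.
    return [
        {"cell_id": m.cell_id, "lat": m.lat, "lon": m.lon, "dbm": m.dbm}
        for m in sorted(best.values(), key=lambda m: m.cell_id)
    ]
```

With this shape, a day of scanning collapses to one position per cell, which reveals far less about the route taken than the full list of samples.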

amilopowers commented 8 years ago

I absolutely agree with you. Is the uploaded data sent in plain text nowadays? So can the server see where we have been? The app says that it never uploads GPS tracks, so I thought it would be kind of privacy aware.

gdt commented 8 years ago

The upload seems to be (location, time, cellid, strength) tuples. That's different from a GPX track in that it only includes locations with cell reception, which is not that much different usually. This is what I expected to be uploaded; this isn't a failure of transparency but just more or less inherent in the mission.

wish7code commented 8 years ago

We could mitigate the issue for cells, and I sympathize with your suggestion of client-side aggregation as a privacy option, but for wifis it's really, really hard. I have no clue how we could manage that in a privacy-friendly way. Aggregation would work very well for cells, especially in rural areas, where coverage areas span several kilometers. For wifis that's unfortunately impossible, because wifi coverage areas are tiny, in the range of meters.

It's basically 'tell me which wifis you've seen, and I'll tell you exactly where you walked'. So I guess for the privacy mode we would have to disable wifi scanning altogether?

Other privacy options include randomizing the timestamp of each measurement to obfuscate the direction of travel and, of course, anonymous upload to prevent user tracking (the latter is already implemented server-side).
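A rough sketch of the timestamp randomization idea, assuming measurements are plain dictionaries with a `timestamp` key. The function name and the window parameters are hypothetical. Each measurement gets an independent random time inside a coarse window, and the list order is shuffled so that neither the timestamps nor the upload order reveal the direction of travel.

```python
import random


def randomize_timestamps(measurements, window_start, window_end, rng=None):
    """Return shuffled copies of the measurements with random timestamps.

    window_start and window_end are Unix seconds (inclusive), e.g. the
    first and last second of the day on which the session was recorded.
    """
    if rng is None:
        rng = random.Random()
    # Copy each record so the local originals keep their real timestamps.
    out = [dict(m, timestamp=rng.randint(window_start, window_end))
           for m in measurements]
    rng.shuffle(out)
    return out
```

Note that this hides ordering only. The positions themselves are unchanged, so for wifis the concern raised above still applies.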

amilopowers commented 8 years ago

What about promoting the project or different ideas to boost data? Do you know any possible ways?

wish7code commented 8 years ago

@amilopowers You're right, we got off-topic here ;-) Nevertheless we might open a separate issue on the privacy topic..

Do you know any possible ways?

Technically it's quite easy: it's just scanning wifis & cells (the code can be copied and pasted from the Radiobeacon client), XML file generation and finally an HTTP POST request for server upload.

A proof of concept has already been made by one of our friends, but his (open source) app is still in closed beta.
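The three steps mentioned in this comment (scan, generate XML, HTTP POST) can be sketched roughly as follows. Everything specific here is an assumption for illustration: the element and attribute names are made up and are not the real Radiocells upload format, and `UPLOAD_URL` is a placeholder rather than the real endpoint. The sketch only builds the request; it does not send it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder only; NOT the real Radiocells upload endpoint.
UPLOAD_URL = "https://example.org/upload"


def build_log_xml(scans):
    """Serialize scan results to XML bytes (hypothetical schema)."""
    root = ET.Element("log")
    for s in scans:
        scan = ET.SubElement(root, "scan", lat=str(s["lat"]), lon=str(s["lon"]))
        for c in s.get("cells", []):
            ET.SubElement(scan, "cell", id=str(c["id"]), dbm=str(c["dbm"]))
        for w in s.get("wifis", []):
            ET.SubElement(scan, "wifi", bssid=w["bssid"], dbm=str(w["dbm"]))
    return ET.tostring(root, encoding="utf-8")


def build_upload_request(xml_bytes, url=UPLOAD_URL):
    """Prepare (but do not send) an HTTP POST carrying the XML body."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

Sending would then be a call to `urllib.request.urlopen(request)`. A real client would need the actual schema and endpoint from the Radiobeacon source.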

amilopowers commented 8 years ago

A proof of concept has already been made by one of our friends, but his (open source) app is still in closed beta.

Do you have any hint where and how this will be implemented? Or is it a standalone app?

gdt commented 8 years ago

(I will downplay privacy issues here in favor of the new privacy issue.) I see three separate issues to increase contributions:

Finally, I wonder how this db and other databases can be combined for use, subject to the various licenses. Can we get a fused db of collaborating projects vs competing projects?

amilopowers commented 8 years ago

The app "Tower Collector" from @zamojski has an option to capture "neighboring cells". It seems that cells tell the phone which cell IDs it could hand over to next. Can our app do that as well?

wish7code commented 8 years ago

@amilopowers In theory yes, Tower Collector seems to use the default Android API getNeighboringCellInfo.

In reality getNeighboringCellInfo is a very cumbersome API call, as most hardware manufacturers/ROMs don't implement it properly (e.g. Samsung returns cell ID -1 all the time instead of the actual cell ID).

wish7code commented 8 years ago

@gdt

increased reward/comfort for contributors. Here, the most important thing is to see data on the map in a timely manner after uploading it. Within an hour would be a huge boost. I realize there is some bug that makes this hard, but it would make a big difference.

The server still has some issues processing cell updates in real time: the spatial queries that calculate the cell coverage area are very time-consuming and may cause database deadlocks. Thus we disabled real-time processing for cells. Wifis are already parsed within ~15 min. So the next steps are 1) fixing real-time processing for cells and 2) faster map update cycles, so that processed cells & wifis also appear within 15 min.

The second is to be clearer about the leaderboard. I would sort by "wifis/30000 + cells", because now people that contribute cells but not wifi (perhaps due to privacy concerns) fall off the list.

Sounds good, we might have to experiment with the weighting factor..
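To make the weighting factor easy to experiment with, the proposed "wifis/30000 + cells" score could be sketched like this. The input shape and the names `leaderboard` and `wifis_per_cell` are hypothetical; only the formula itself comes from the suggestion above.

```python
def leaderboard(users, wifis_per_cell=30000):
    """Rank users by cells + wifis / wifis_per_cell.

    users: iterable of dicts with "name", "wifis" and "cells" keys.
    Returns a list of (rank, name, score) tuples, best score first.
    Ties are broken alphabetically by name.
    """
    scored = [(u["name"], u["cells"] + u["wifis"] / wifis_per_cell)
              for u in users]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [(rank, name, score)
            for rank, (name, score) in enumerate(scored, start=1)]
```

With the default factor, a contributor with 5 cells and no wifis ranks above one with 60000 wifis and no cells, which is the effect the proposal is after. Changing `wifis_per_cell` shifts that balance.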

Finally, a logged-in user should always see their own stats in the leaderboard, even if they don't rank.

Accepted

increased ease of using the results. I haven't set up the other providers, but it would be good to document how to do this, on rooted and non-rooted devices, and to try to help other projects (e.g. osmand) integrate support even on non-rooted devices.

Accepted

increased ease of contribution while respecting privacy. Perhaps this is a version of radiobeacon that will gather in the background, only running GPS when seeing a new cell, or a better strength than before, so that it uses less battery. And also a version that is more privacy aware, only reporting cells (and especially wifis) when the unit is moving, or is at least several minutes away from stopping, and perhaps has multiple prohibited locations. Perhaps even showing all locations to be reported, and allowing filtering before uploading.

Accepted
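One piece of the privacy-aware mode, the "prohibited locations" filter, could look roughly like the sketch below. The function names, the dictionary layout and the 500 m default radius are assumptions for illustration, not part of any existing client. Distances use the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def filter_prohibited(measurements, prohibited, radius_m=500.0):
    """Drop every measurement within radius_m of any prohibited location.

    measurements: list of dicts with "lat" and "lon" keys.
    prohibited: list of (lat, lon) tuples, e.g. home and workplace.
    """
    return [
        m for m in measurements
        if all(haversine_m(m["lat"], m["lon"], lat, lon) > radius_m
               for lat, lon in prohibited)
    ]
```

The same filter could run just before the suggested "review before upload" step, so the user only sees locations that would actually be reported.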

Finally, I wonder how this db and other databases can be combined for use, subject to the various licenses. Can we get a fused db of collaborating projects vs competing projects?

Especially for the geolocation use case this would be very useful. I analysed the openwlanmap/openwifi.su dataset some time ago and found it pretty complementary to ours.

amilopowers commented 8 years ago

I think numbering the ranking would be nice, so we don't have to count ;-)

Also, I am thinking of a separate "unique cells" ranking. What moves us forward is scanning new areas rather than those that are scanned over and over again.

mvglasow commented 8 years ago

My two cents here:

A background logger (I think I've suggested that one before), which attaches to the passive location provider and starts logging as soon as it gets a location. That is, as soon as you fire up your navigation app, OSM tracker or any other GPS app, you're automatically collecting data for openBmap as well. Can be part of Radiobeacon or a separate app.

Can we get a fused db of collaborating projects vs competing projects?

OpenCellID and Mozilla have compatible licenses (IIRC), so we could easily import their cell datasets periodically. (Unfortunately that's for cells only – OpenCellID doesn't do wifis, and Mozilla doesn't release raw wifi data for privacy reasons – but even with just cell data it's definitely going to be an improvement.)

Improve map rendering on the website: Rendering an area with a lot of wifis (Munich, as well as certain areas of Milan) takes a long time, and after importing new wifis, they still take days or even weeks to show on the map. Because of this, I still rely on my own rendering to check whether I've already been to a particular area. Having that information online helps people plan their trips according to what needs surveying – but that requires a map which is accurate and loads quickly.

The second is to be clearer about the leaderboard. I would sort by "wifis/30000 + cells", because now people that contribute cells but not wifi (perhaps due to privacy concerns) fall off the list.

Sounds good, we might have to experiment with the weighting factor..

How about allowing users to sort by number of wifis (as is the case currently) or by number of cells? After all, they are two different disciplines.

Also, I am thinking of a separate "unique cells" ranking. What moves us forward is scanning new areas rather than those that are scanned over and over again.

+1 for "unique cells" (and wifis). I currently notice that after uploading a session of ~1000 wifis, my stats increase by much more (I suspect a wifi that's sampled five times will be counted five times). Having wifis sampled multiple times is good for accuracy, but we still have a lot of uncharted territory – hence I'd favor stats that answer the question "how many new cells/wifis has this user added to the database?".
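A small sketch of the "how many new cells/wifis has this user added" statistic, assuming the server can replay uploads in chronological order. The function name and the input shape are hypothetical. An ID counts for the first user who uploaded it and for nobody afterwards, so repeated samples of the same cell or wifi no longer inflate the stats.

```python
def new_ids_per_user(uploads):
    """Count first-seen IDs per user.

    uploads: iterable of (user, station_id) pairs in upload order, where
    station_id is a cell ID or a wifi BSSID.
    """
    seen = set()
    counts = {}
    for user, station_id in uploads:
        counts.setdefault(user, 0)  # users with no new IDs still appear
        if station_id not in seen:
            seen.add(station_id)
            counts[user] += 1
    return counts
```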

try to help other projects (e.g. osmand) integrate support even on non-rooted devices

How about a location library which app developers can integrate into their apps? It would work much like a location provider, but it wouldn't require messing with the location provider framework on Android. Pro: would work on any device, no rooting required. Contra: devs need to build it into their app, and only apps that have the library compiled in would be able to use it.

agilob commented 8 years ago

I think this would be a place for competition: instead of importing their dataset, create a brand new one, without cells that haven't existed since 2010.

OpenCellID and Mozilla have compatible licenses (IIRC), so we could easily import their cell datasets periodically.

MLS imports OCID data, and OCID imports MLS data; they have an agreement and copy each other periodically.

If anyone is interested in working on an AIMSICD competitor that would use openBmap, contact me.

mvglasow commented 8 years ago

MLS imports OCID data, and OCID imports MLS data; they have an agreement and copy each other periodically.

I know they've agreed on a common data format some time ago but I wasn't aware of them actually exchanging data. Then, OCID went from 8 to 24 million cells last month (MLS has about 22) – does that mean they actually imported MLS data?

I think this would be a place for competition: instead of importing their dataset, create a brand new one, without cells that haven't existed since 2010.

Importing their data would give us some 24 million cells (as opposed to 0.5 million currently in openBmap), which would help adoption of location services.

There'd still be some competition: we'd get a lot of cell coverage but no extra wifi coverage. In areas with cell but no wifi coverage, location estimates would be quite crude (3000 m accuracy as opposed to 30 m) – enough to get a location for a weather report or events in town, but not enough for navigation. For that we'd still need wifi data – having improved cell coverage might help us get more people interested in the project and turn them into contributors. And as new contributors collect wifi data, they will pick up cells along the way, which will help us to gradually replace imported data with our own.

amilopowers commented 8 years ago

does that mean they actually imported MLS data?

@mvglasow Yes, they did. That's a confirming tweet from March 30: https://twitter.com/opencellid/status/715094547630538752

amilopowers commented 8 years ago

I do agree that importing their data (MLS and OpenWLANmap) would help us actually get devs and services to use our service.

But: before we start importing/merging data we should clean up the project and improve server power.

  1. Agree on one persistent name, I would suggest "radiocells" over "openbmap".
  2. Permanently forward (301) www.openbmap.org to www.radiocells.org
  3. Rename the Radiobeacon app to "Radiocells ...".
  4. Improve server speed as suggested in #114. Currently it takes very long to render data on the map, and forever to download maps/cell data to the Android app.
  5. Update the Android app. I think a simpler view would help new users (with an option to switch to an "expert" view). Most new scanners do not care if the cell is GSM/LTE or whichever.
  6. We could develop a plugin for popular projects like OsmAnd (I have suggested that already) so people could scan for us without much knowledge. Since we have the ability to upload anonymously, they wouldn't even have to set up much (like accounts).
  7. Then: Import data from others.

I often map for openstreetmap.org and could suggest on the mailing list that mappers scan for us. But I won't do that until our project is more powerful than it is right now.

To make all that happen we should perhaps team up and open a forum or something like that. We could then form different sub-projects (or just do it on GitHub), for example a server group, a website group, an advertising group (writing blog posts et cetera)...

mvglasow commented 8 years ago

Just found something else: there are enthusiast groups out there mapping out cell towers, verifying their location on the ground and collecting all the information they can get on it – including the cell ID. A website I came across is http://www.senderliste.de/ – it's for Germany only but has links to similar sites in other countries (some of the others being databases maintained by government agencies).

That data would be superb for geolocation – rather than just making a few random measurements, they verify each cell tower down to a street address. If we could collaborate with these groups, it would be a win-win situation:

Granted, cell location is somewhat crude, but in less densely populated areas they're all we can get. And I'm looking ahead: in the future we might be able to use multiple cells for geolocation, which would increase accuracy.

Licensing would have to work out: since our cell data is under CC-BY-SA, we can only collaborate with cell spotters who are willing and unable to

That being said, we could also advertise openBmap in other communities that are already going out and surveying other things, or get around a lot for other reasons:

Though I agree with @amilopowers that we should do our housekeeping first. If we attract more people than our server infrastructure can support, we risk losing them for good.

amilopowers commented 8 years ago

I like your ideas @mvglasow but how do those people on the ground know which ID a cell tower has? I mean they probably can't read it in most cases.

Switzerland has published them publicly by law, but I don't know how the data can be used. I map for OpenStreetMap quite often; that's why my area got covered in a very short time. When I look at MLS I have to admit that they're light years ahead of us. They have all those Android devices running Firefox. So it might be a good idea to specialise and link measurements to cell tower locations.

mvglasow commented 8 years ago

@amilopowers they use surprisingly simple hardware: an old school cell phone (only condition being that monitor mode must be enabled, which reveals cell IDs and other information) to determine the serving cell and distance (based on timing advance, RSSI and signal quality) and optionally a laptop and GPS. A closer look at the antenna hardware provides hints at the operator and the frequency spectrum used. If the base station has segmented antennas, slowly walk around it once to pick up each cell once. Instructions in detail: http://www.nobbi.com/suchmich.html

About OSM – I guess a few of us are active there as well (including myself), but since OSM is a much larger community, there are still plenty of OSM mappers who haven't heard of us yet. And unlike Mozilla, we share all of our data, which might enable other applications besides geolocation.

amilopowers commented 8 years ago

Wow, you are right. I didn't know that would work.

I kind of fear asking the OSM community. Some of them can sometimes be quite weird. Advertising in a talk (e-mail) channel... I'll do it but who knows what happens.

Next week I should get my mum's old S2, so I can scan 4G/LTE and <4G at the same time.

amilopowers commented 8 years ago

@mvglasow Shouldn't we even be able to calculate cell tower location on the server side?

I mean, once we have, let's say, 3 or 5 measurements of a particular cell, we should be able to estimate the GPS location of that cell.

mvglasow commented 8 years ago

It will always just be an approximation. We have a bunch of measurements with signal strengths, from which we can somehow infer the location of the mast. In theory, the signal strength decreases with distance, but in practice there are a lot of variations such as obstacles, different devices with different antenna gains and measurement inaccuracies (GPS accuracy as well as timing if the measurements were taken while moving).

Cell signals are particularly tricky because, unlike wifi signals, cell signals are often directional so all measurements will be on the same side of the mast. To my knowledge, their location is currently approximated server-side by taking the strongest measurement and discarding all others. Therefore, accuracy will never get better than the minimum distance between the actual mast position and one of the closest measurements.
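The two estimation ideas discussed here can be compared in a short sketch. `strongest_measurement` follows the approach described above as (to the commenter's knowledge) the current server-side behaviour; `weighted_centroid` is a swapped-in alternative for illustration and not something the project is known to use. Both function names and the input shape are hypothetical. Averaging latitude and longitude directly is only a reasonable approximation over the few kilometers a single cell covers.

```python
def strongest_measurement(measurements):
    """Estimate the mast position as the position of the strongest sample."""
    best = max(measurements, key=lambda m: m["dbm"])
    return best["lat"], best["lon"]


def weighted_centroid(measurements):
    """Estimate the mast position as a power-weighted mean of all samples.

    dBm is logarithmic, so each sample is weighted by its linear power in
    milliwatts: 10 ** (dbm / 10).
    """
    weights = [10 ** (m["dbm"] / 10) for m in measurements]
    total = sum(weights)
    lat = sum(w * m["lat"] for w, m in zip(weights, measurements)) / total
    lon = sum(w * m["lon"] for w, m in zip(weights, measurements)) / total
    return lat, lon
```

Neither variant addresses the directional-antenna problem: if all samples lie on one side of the mast, both estimates are pulled to that side.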

wish7code commented 8 years ago

Just found something else: there are enthusiast groups out there mapping out cell towers, verifying their location on the ground and collecting all the information they can get on it – including the cell ID.

@mvglasow That's indeed an interesting idea! Besides Germany I think France has a very strong community (e.g. http://rncmobile.free.fr/map/).

Cell signals are particularly tricky because, unlike wifi signals, cell signals are often directional so all measurements will be on the same side of the mast.

That's my general observation too. I've seen very few omnidirectional towers in my region. I guess in the near future cell-strength-based location will become even more difficult due to dynamic multibeam forming (e.g. https://www.telekom.com/media/company/301824 , section How it works). Nevertheless, currently if you see a measurement of > -60 dBm, you're probably standing right in front of the mast...

mvglasow commented 8 years ago

Guess in the near future cell strength based location will become even more difficult due to dynamic multibeam forming

If you go through http://www.senderlistemuc.de/, you will find that there are a few repeaters (mostly installed in tunnels) which announce themselves with the cell ID of a nearby base station. (Unfortunately the list does not have any PSCs – would be interesting to know if the repeater and its base also share a PSC – if not, the PSC would help us distinguish the two.)

Apart from that I had a more generic idea about promoting openBmap: For geolocation we might not be the number one choice, as there are other providers with more data than we have. But we're the only ones who share both their cell and wifi data, including raw measurements. Therefore, we might want to focus on other ways in which our data could be used and which is not possible with any competing solutions.

amilopowers commented 8 years ago

I think the fact that we share raw data is an issue rather than a feature. We should address that in #129 and stop sharing it. @wish7code can you please find a way to stop sharing raw data? No one should be able to determine where I went or go.

gdt commented 8 years ago

I think the raw data issue is worthy of some discussion. Overall, I tend to think that sharing it means that the privacy issues one perceives are the actual correct ones. If it isn't available to all, then the question becomes who has access, either site admins or those that have compromised the site.

For data privacy, there are two issues. One is decoupling points from a user ID; anonymous upload, coupled with the server not keeping IP address logs, more or less solves that. The second issue is far harder: if the number of contributors in an area is not high enough, the reported data can be correlated.

If raw data isn't published, then what distinguishes this project from opencellid and MLS? One remaining difference is that radiobeacon is straightforward and transparent about what it uploads, and opencellid seems to endorse keypadmapper-3.