openbmap / radiocells-scanner-android

WLAN and cell tower scanner for Radiocells.org
https://www.radiocells.org
Other
55 stars 26 forks source link

I would like to re-organise the application and how it stores sessions #166

Open agilob opened 8 years ago

agilob commented 8 years ago
  1. Remove idea of 'session'
  2. Remove the first screen, tab view with wifis and cell will be initial screen and scanning will start immediately after application launch.
  3. Reorganize database
  4. Use PhoneListener to listen for cell info changes instead re-scanning, should improve battery and some device-specific issues
  5. Use XML serializer to write xml instead StringBuffers, will improve performance, xml will always be valid. I have from 97 xml sessions I have 5 are invalid xmls. Current implementation abuses JVM GC on session export with StringBuilders.

Database would just store information in tables, no sessions. Single boolean flag to indicate if network was uploaded. Possibility to remove uploaded networks.

Benefits:

Drawbacks:

I have some parts of that rewrite ready in my side project and from https://github.com/CellularPrivacy/Android-IMSI-Catcher-Detector (hello @secupwn from the other side!) and should be pretty easy to move that code here.

gdt commented 8 years ago

I object to scanning starting without an explicit request to start. Scanning (really uploading, but still) has serious privacy implications since it logs location (I know you know, but being complete for others). I also want to be able to push the 'end session' button and be sure it no longer logs data after that point. But if that's "stop scanning", and there is only "data that has been scanned" vs separate sessions, that's fine.

Most of the rest of your changes sound good. I do wonder if separable changes should be landed separately, though.

agilob commented 8 years ago

That's fine. Rewrite would eliminate top bar "RadioBeacon [save]" and this could be replaced with scan/pause/stop buttons.

Most of the rest of your changes sound good. I do wonder if separable changes should be landed separately, though.

Yes. Not everything at the same time. Starting with DB rewrite -> XML serialisation -> UI.

gdt commented 8 years ago

I would also not want it to upload without an explicit request. But I think it's fine to have that upload all non-uploaded data.

Another mechanism present currently is that after finishing a session, one can choose to delete rather than upload. I tend to upload soon usually, and only delete when I've forgotten to disable scanning well before arriving. So as long as "I forgot to stop scanning, so delete non-uploaded data" is possible, I think it's ok.

I think radiobeacon is different from most of the other radio db contribution apps because it respects privacy choices of users. In contrast, keypadmapper-3 is very invasive.

In the UI, it would be nice to have a count of cells/wifis that need uploading.

agilob commented 8 years ago

I would also not want it to upload without an explicit request. But I think it's fine to have that upload all non-uploaded data.

This can be easily automated. Can do it same way as MozStumlber does, upload on WiFi connect or only manually.

Another mechanism present currently is that after finishing a session, one can choose to delete rather than upload. I tend to upload soon usually, and only delete when I've forgotten to disable scanning well before arriving. So as long as "I forgot to stop scanning, so delete non-uploaded data" is possible, I think it's ok.

All not-uploaded data would be sent or deleted. I don't want to write database manager and query for every possible thing a user can think of. I would rather make application faster and less bloated with UI things that are non-obvious.

In the UI, it would be nice to have a count of cells/wifis that need uploading.

UI inside session would not change too much, only top-bar, it currently show wifis and cells and it would stay this way.

gdt commented 8 years ago

There's bloat, and there are key security properties. What I am saying is important is that by default: 1) Scanning should only start when a user asks to scan, and the user should be easily able to stop scanning. 2) upload should only happen when the user asks to upload. 3) There should be a reasonably easy way to discard all data that hasn't been uploaded, to handle a user not stopping scanning when they should have.

I don't think any partial delete is needed. All of the above can be done easily with the current UI. This leads to a single pane (when not scanning) that has "start scanning", "upload all data", and "delete all non-uploaded data". And the scan pane has the same "stop scanning" button as now (or it looks different, whatever). What I meant by showing how much would be uploaded is that if there is a big upload button then it could say "upload all data \n (34 cells 1019 wifis)". That would tell the user how much they are uploading, for leaderboard types, and let paranoid types know there is no lurking data that they don't remember how it got there.

If you want to have checkboxes in preferences to scan on opening the app, to upload when stopping scanning, or to upload on wifi, that's fine, but I think they all need to default to off to have safety first.

agilob commented 8 years ago

That's all doable and in scope of what I was thinking. Honestly, I think that feature 'dont scan my backyard' is pretty much useless, when implemented on client-side, someone else will scan that area sooner or later. Your AP is already in wigle, mozilla stumbler and apple wifi location services.

upload all data \n (34 cells 1019 wifis)

I think this info should be in notification bar in the first place.

gdt commented 8 years ago

Having the count of scanned but not uploaded in the notification bar sounds fine. I'm really not trying to be fussy about design, just security requirements.

I agree with you that keeping other people from scanning is basically hopeless. However, that's not what I am talking about in terms of privacy. When scanning, there's a set of locations, which is basically a GPS tracklog. So scanning when driving to/from where people live or work reveals way too much data. Someone unrelated driving down the road and including the AP is not a big deal; the concern is the beginning and ending parts of the tracklogs. If you're in a city, and 1000s of people are using radiocells, and within any GPS-observed location hundreds of people live, then you might not care. But that's a complicated calculus, and history tells us that such assumptions work far less often than we think.

So I see "don't scan by backyard" as a safety net against scanning when you don't want to, not as something that really keeps your own APs out of databases. I'd like to see it stay, but have the radius configurable, so one could exclude a much larger region.

agilob commented 8 years ago

but have the radius configurable, so one could exclude a much larger region.

out of scope.

When scanning, there's a set of locations, which is basically a GPS tracklog. So scanning when driving to/from where people live or work reveals way too much data. Someone unrelated driving down the road and including the AP is not a big deal;

Ok, I see your point now. Easier fix/improvement would be to upload chunks of your track in random order, or only strongest signal points per CellID/MAC in random order. This is also out of scope.

gdt commented 8 years ago

By out of scope, do you mean you object to that existing, or that it's not related to what you are trying to do in this issue? It seems like a simple unrelated change that would fit in a different issue with it's own patch.

Random order is nowhere near good enough to hide the information that needs to be hidden. Data near locations of interest needs to be totally suppressed.

agilob commented 8 years ago

that it's not related to what you are trying to do in this issue?

It's out of scope for this issue what I'm suggesting. Would be better to open new issues for those suggestions.

agilob commented 8 years ago

I'll wait for a response from @wish7code before doing anything.

wish7code commented 8 years ago

Wow, that's great news! I've only one objection regarding the sessions, but we might find alternative concepts.

  1. Remove idea of 'session'

While I agree that sessions aren't needed for pure uploaders, it eases analysis for data researchers: e.g. I often analyse a session in depth when back home for examing cell beam structure, suspious cells, wifi coverage areas etc. Another option is generating gpx tracks for manual upload to Openstreetmap (you might have discovered the long press on map option, which generates a waypoint in the gpx files). Any ideas on that?

  1. Use PhoneListener to listen for cell info changes instead re-scanning, should improve battery and some device-specific issues

Sounds great!

  1. Use XML serializer to write xml instead StringBuffers, will improve performance, xml will always be valid. I have from 97 xml sessions I have 5 are invalid xmls. Current implementation abuses JVM GC on session export with StringBuilders.

Good hint with the XML serializers ! Guess back then when I wrote that part, I simply overlooked XML serializer and never touched that code again ;-)

But I'm still undecided whether we should stick to the xml format in the long run. The current openbmap xml format was invented in 2009 and when I started to develop Radiobeacon my goal was to keep backward compatibility, although the xml structure wasn't optimal (e.g. see the gps tagging). One option for the future might be a less talky format (e.g. json instead of xml). My only strong concern: if we change the format definitions we need server side adjustments too..

Might be just easier to drop 1/3 of current code base and write from scratch.

Might also be a viable solution, e.g. drop the UI and just keep the logic, which is encapsulated in PositioningService, WirelessLoggerService and the soapclient folder..

I'm looking forward to hearing your ideas!

Cheers Toby

agilob commented 8 years ago

While I agree that sessions aren't needed for pure uploaders, it eases analysis for data researchers: e.g. I often analyse a session in depth when back home for examing cell beam structure, suspious cells, wifi coverage areas etc

GPX should be generated from LocationListener.

Might also be a viable solution, e.g. drop the UI and just keep the logic, which is encapsulated in PositioningService, WirelessLoggerService and the soapclient folder..

I like the UI, it needs some consistency adjustment IMO.

My only strong concern: if we change the format definitions we need server side adjustments too.

I didn't want to change XML schema, so it would be backward compatible. If you have json format for upload please publish it here.

wish7code commented 8 years ago

GPX should be generated from LocationListener.

That's fine, but we still lack a UI to manage the tracks yet.

In the past, if I saw an interesting cell, back home I would that specific session and get a GPX for exactly that session which also includes waypoints for cells/wifis.

If we just have a start/stop button, that would be difficult.

Any further ideas?

agilob commented 8 years ago

Would say it's out of scope for this issue...

gdt commented 8 years ago

It's not really out of scope if something works well now, and the proposed change would break it. It seems that sessions are a useful concept, so it might be best to first do all the changes that don't involve getting rid of sessions and then rediscuss. Personally I'm not attached to them, and would prefer to analyze from the global database, and I don't actually use the GPX tracks. Basically, once you have a bunch of locations/observations, I don't find it too interesting where I was in between. (Except possibly to record locations with no cell service, but that's tricky because phones tend to go into no-service mode during some handovers, even though the location has a signal from a cell site that would work.)

wish7code commented 8 years ago

Would say it's out of scope for this issue...

I don't agree on that :-)

While improvements on the scanning service (4) and xml serialization (5) bring clear improvements, removement of sessions (1) reduces existing analysis options. I know that there're people outside, me included, who rely on this feature for offline data research and I don't want to break that feature so far.

On the other hand I don't see a significant space saving effect, if we would remove sessions. We're talking about 1 session master table with maybe 50-100 records and couple of session reference columns.

Nevertheless, I still think we might find alternative UI solutions (e.g. making the sessions less prominent, but still leave the option to export such data to anybody interested)

agilob commented 7 years ago

Do you have a spec for the JSON format?

wish7code commented 7 years ago

Let's discuss this openly: after some practical tests I feel somehow disappointed on the JSON's performance improvement potential. According to the SO discussions (theoretical) saving potential comes from JSONs smaller file sizes compared to XML. So I expected significantly smaller files while using JSON.

But when testing these theoretical improvements against the real-world openbmap file format, this doesn't apply. In my test, I took some random XML files and converted the manually (1-1 mapping) to JSON. In this case JSON files were even marginally larger than their XML counterparts.

Additionally we would have to adopt the server APIs and somehow convert the existing XML dataset. So I'm really wondering, whether it's worth the effort. Any other opinions?

Cheers Toby

gdt commented 7 years ago

I have not seen an argument that supports changing to JSON. There's certainly a change cost, so that leads to wanting not to change. I suspect there are bigger issues in terms of win/programmer-hour.