n76 / DejaVu

Yet another network location backend for the UnifiedNLP/microG project
GNU General Public License v3.0
100 stars 18 forks source link

Future development #42

Open Helium314 opened 2 years ago

Helium314 commented 2 years ago

@Lee-Carre @gdt @n76 In #41 was the question about someone willing to take over the project... I'm definitely willing to help, but I don't have any experience with actually maintaining an app, or making it available in F-Droid. Maybe @gdt would be would be more suitable?

As mentioned, I've been working on some improvements:

What is changed

What is missing, but should be done

gdt commented 2 years ago

If I start dealing with this, I will want to be measured and slow about changes, and at least initially limit to changes that really seem necessary to bring this back to maintained status. So that means targetting new API, fixing bugs that cause actual user problems. I expect a lot of people may find this frustrating, but I lean very hard to software quality rather than speed, and from long experience expect trouble with everything, even if that's only sometimes true.

I do think a simple UI for showing database counts and export/import would be good; that would I think be sufficient to say there is no reason for anyone to run the old wifi location app.

I am strongly disinclined to go down the conversion path to kotlin. I see it as churn that is not solving a problem, and large changes in projects as a preference doesn't seem like a path I want to go down. I don't mean to say never, but so far I have not seen any rationale for "if we don't convert to kotlin, bad thing X will happen". Also, language conversions and other changes interact particularly badly in terms of merges, so conversions really need to be done at a time there isn't much else going on. I view significant "refactoring" the same way, unless it's limited in which files and fixes something that really needs fixing, and the merge pain is outweighed by the gain. (If you read that as I'm skeptical, you read right!)

Tests sound good. Small changes that have some combination of test coverage and other people testing, and that are within the local-only, low energy use model, seem like reasonable things to merge.

However, I would really be inclined to just have a new release with outright fixes, no API retargetting, very minimal first, to bring it back to 'has a release in the last 6 months so it counts as maintained'. Then add new wifi/LTE and target a modern API and release again. Then some of the more complicated fixes.

As for database, I would like us to consider what it takes to use spatialite. This is really a spatial db and on a regular computer would probably be in postgis, and then with indexes spatial queries are fast. I don't want to have backdoor manual implementation of spatial indexes, which is a partial solution and probably more work.

So, assuming some path forward, I would recommend that people with forks have separate feature branches for separate things, and locally merge them into an integration branch to run for their own purposes. Turning on rerere is very helpful for this technique, which I used successfully on a work project with 10 mostly-full-time people and about 6 active feature branches at any one time. This is basically the difference in approach from having a repo that intends to contribute vs one that is heading off in a different direction. I realize you may have started down the path not expecting to be able to merge back, but I wanted to describe what I see as the path to "unforking".

Lee-Carre commented 2 years ago

Thanks for the mention.

I don't have any experience with […] making [an app] available in F-Droid.

I gather there's not a vast amount to it (he says with no first-hand experience of actually doing it).

The F-Droid folks tend to be rather helpful. Especially, I imagine, since Déjà Vu is already listed, only a matter of updating it with new releases.

Some parts could be automated with the new Github Actions, which seems popular with quite a few other repos.

blacklist improvements

A though I had, re blacklist processing (which probably warrants its own ticket). Since applying the blacklist seems to happen at capture-time, what happens when the blacklist is updated? Notably to existing (that is, already-captured) entries, which now match the new blacklist.

Unless already handled somehow, I suggest checking the DB occasionally (at launch, since that'll trigger post-reboot and post-installation (of new versions, which may have a new blacklist).

Detection / use of Bluetooth emitters

Assuming that you don't mean all 802.14; what types are useful other than beacons? To my awareness, most everything else is liable to move (and thus be useless).

Unless usefulness is determined by the location remaining static after many (500?) samples, and treated as reliable for positioning.

Some stationary devices would be.

But, only with a reliable way to filter-out roaming (or, rather, filter-in stationary) emitters.

If implemented, should also apply to beacons, as a sanity-check that they really are stationary. Nothing about beacons themselves (spec or intended applications) says that they should be stationary. Many are battery-powered.

deal with WiFi scan throttling

Given the default config of Android (v9 & v10, at least), yes.

However, handling of cases where users have disabled the misfeature in Developer Options, Déjà Vu shouldn't assume that it may only scan every NN seconds. Though, I'm not sure how it should detect that it may scan more often.

Note that I'm not saying that it should or must scan multiple times per second always. Having this be variable seems wise for many reasons.

I think it might be best to have it vary based on movement speed. Faster movement meaning more frequent scanning. When stationary, once a minute (if not less) should be plenty.

An obvious exception being when it starts receiving detection reports of several emitters it's never seen before; then perhaps increase the rate. My thinking being; entering a previously-unscanned area, so be sure to capture the data.

newly introduced/split emitter types (6 GHz WiFi, 5G, LTE, 3G variants, Bluetooth)

While not disagreeing, I've heard that 6G is already in the works, to be the new new hotness in future. Though, 5G is still plenty new enough.

maybe a simple UI for [information]

That, specifically; sure. I'd suggest keeping the feel / impression of it as being for debug / advanced usage.

I'm strongly inclined to agree with n76's choice of ‘no knobs’, for a whole bunch of reasons. Automation for the win.

allow overriding some detections (e.g. wifiManager erroneously reports my phone isn't capable of 5 GHz WiFi)

I think it would be better to encourage such users to report the problem so that the detection may be fixed in code, for the benefit of all users, which will keep Déjà Vu configurationless.

unit tests

Definitely. Lots of reasons. Sensible practise, generally. Catching & fixing bugs early, help keep things organised (& code cleaner).

kotlin

Tricky one.

gdt makes excellent points, which I'd summarise / characterise as ‘change for the sake of it (or for fashion trends) is risky’. With that, I'm inclined to agree.

However, familiarity for developers is a major point. I'm thinking of bug-count.

@Helium314 how feasible would it be to do as gdt suggests of bug-fixes first, and a fresh release, before making deeper changes? Is Kotlin a preference, or are you uncomfortable (not confident) with Java?

I'm not sure I (as a non-programmer) can contribute much else to the choice-of-language decision. Probably best settled between Helium314 & gdt (& n76), until I've learned enough to have an informed opinion.

there is no reason for anyone to run the old wifi location app

I very likely still will for similar reasons to why I still use the GSM backend

I would really be inclined to just have a new release with outright fixes

I'm strongly inclined to agree with the one-step-at-a-time philosophy of gdt. Lots of reasons, most from experience.

database

My only caveat is that any changes to schema, or DB type, should absolutely not lose user-data (by which I mean, here, already-collected emitters).

Helium314 commented 2 years ago

If I start dealing with this, I will want to be measured and slow about changes, and at least initially limit to changes that really seem necessary to bring this back to maintained status. So that means targetting new API, fixing bugs that cause actual user problems. I expect a lot of people may find this frustrating, but I lean very hard to software quality rather than speed, and from long experience expect trouble with everything, even if that's only sometimes true.

In my opinion the blacklist fix and the fixes for bad location reports are quite important fixes. The bad location reports are the reason why I did the related changes. (Actually I've been using them for the last ~2 years)

I am strongly disinclined to go down the conversion path to kotlin. I see it as churn that is not solving a problem, and large changes in projects as a preference doesn't seem like a path I want to go down. I don't mean to say never, but so far I have not seen any rationale for "if we don't convert to kotlin, bad thing X will happen".

Using kotlin was, as written above, because I feel much more familiar with it. I do not want to look up how to do something in java when I know how to do it in kotlin. If it's not just for me personally, I'm ok with java as well.

However, I would really be inclined to just have a new release with outright fixes, no API retargetting, very minimal first, to bring it back to 'has a release in the last 6 months so it counts as maintained'. Then add new wifi/LTE and target a modern API and release again. Then some of the more complicated fixes.

I can create a bunch of issues, one for each problem I found, and mention what I used for solution / workaround.

As for database, I would like us to consider what it takes to use spatialite. This is really a spatial db and on a regular computer would probably be in postgis, and then with indexes spatial queries are fast.

Is it really worth using a different library? I had a look at spatialite a few months ago, and had the impression the Android version doesn't get much maintainance. Would there be any uses of spatialite other than the spatial indexing using R-tree? I recently compared SQLite with R-tree vs 'normal' indices for a different project, and found that for points, and there was almost no difference between latitude/longitude index and R-tree. If we want to use bounding boxes instead of center and radii, R-tree / spatialite will be more useful.

I don't want to have backdoor manual implementation of spatial indexes, which is a partial solution and probably more work.

Actually it's one line for creating index on latitude/longitude, no further changes needed to make use of it. And the performance advantage is significant.

I realize you may have started down the path not expecting to be able to merge back, but I wanted to describe what I see as the path to "unforking".

Right, I had started to put together changes I thought necessary, plus some performance improvements. The plan was simply releasing an apk on github and rarely / never touching the code again.

Helium314 commented 2 years ago

A though I had, re blacklist processing (which probably warrants its own ticket). Since applying the blacklist seems to happen at capture-time, what happens when the blacklist is updated? Notably to existing (that is, already-captured) entries, which now match the new blacklist.

I already mentioned this in #41: in current DejaVu, blacklist is never checked. But it very much looks like it's supposed to be checked when the RfEmitter is created (and that's how it happens in my version). If an emitter is blacklisted and has a coverage (which is the bounding box), it is removed from the database. See https://github.com/n76/DejaVu/blob/e54aae2d9f3d4619065a258b431ace1911d1e931/app/src/main/java/org/fitchfamily/android/dejavu/RfEmitter.java#L319-L321

Assuming that you don't mean all 802.14; what types are useful other than beacons? To my awareness, most everything else is liable to move (and thus be useless).

Bluetooth emitters were mostly an idea requiring more testing and investigation. Currently I don't know how to find a device's Bluetooth class (for range estimation), and how to separate Bluetooth beacons from other devices. The latter isn't going to work like it's done for WiFis...

Unless usefulness is determined by the location remaining static after many (500?) samples, and treated as reliable for positioning.

Even many sample won't work reliably enough, there are a lot of semi-stationary BT devices like speakers (at one place for longer times, but might move after several hours / days). I think this discussion (and related research) should be done in a separate issue.

deal with WiFi scan throttling

However, handling of cases where users have disabled the misfeature in Developer Options, Déjà Vu shouldn't assume that it may only scan every NN seconds. Though, I'm not sure how it should detect that it may scan more often.

a) in API 29 or 30 there is WifiManager.IsScanThrottleEnabled for checking the setting (but we still don't know which scan result is new and which is old with changed timestamps) b) In my version I have some initial attempt of throttling detection: Simply compare new scan results with previous results. If we have the same list of WiFis with the same signal strength, we can assume throttling is enabled. In my experience, signal strength fluctuates a bit between scans and thus is a reasonably reliable way of detecting throttling. But this may give wrong detection if there are only few WiFis nearby, and detected signal strength really is the same.

Again, this should go into a separate issue.

An obvious exception being when it starts receiving detection reports of several emitters it's never seen before; then perhaps increase the rate. My thinking being; entering a previously-unscanned area, so be sure to capture the data.

Unfortunately this is not so simple: new emitters will only be added if we have a GPS location. At home my phone finds a lot of WiFis that are not in the database, and they never will be in the database because there is no GPS signal. And I don't want to have increased scan rate at home...

allow overriding some detections (e.g. wifiManager erroneously reports my phone isn't capable of 5 GHz WiFi)

I think it would be better to encourage such users to report the problem so that the detection may be fixed in code, for the benefit of all users, which will keep Déjà Vu configurationless.

Google or device manufacturers aren't going to fix this, and we don't have the means to change a broken WifiManager

But in my current build this is used only for performance optimizations, so such detection is not really necessary.

@Helium314 how feasible would it be to do as gdt suggests of bug-fixes first, and a fresh release, before making deeper changes? Is Kotlin a preference, or are you uncomfortable (not confident) with Java?

I'm definitely much less confident with java than with kotlin. I started conversion because I was fed up with spending time researching how to do the java version of some kotlin stuff. But the (in my opinion) absolutely necessary changes are simple enough that I can do them in java as well. And I agree that doing changes in functionality while switching language is a bad idea.

I would really be inclined to just have a new release with outright fixes

I'm strongly inclined to agree with the one-step-at-a-time philosophy of gdt. Lots of reasons, most from experience.

Reason for what appears to be so much at once from my side: I've had most of them running for ~2 years already, and decided a few weeks ago to re-do my changes in a less chaotic way and publish them in case anyone was still interested in DejaVu. (the new emitter types and database format were recent changes, inspired by the database work I did for StreetComplete)

My only caveat is that any changes ty schema, or DB type, should absolutely not lose user-data (by which I mean, here, already-collected emitters).

I think this is pretty clear, given the lack of import/export capability. In my version DB upgrade works, although a bit slow (ca 1-2 minutes for a 50 MB database on my S4 Mini)

Lee-Carre commented 2 years ago

In my opinion the blacklist fix and the fixes for bad location reports are quite important fixes.

Being that they're fundamental to the primary purpose of the backend, yes 👍. Plus, they're already developed 🙂. Little point to not merge them back into this repo.

They can always be altered later, if folks see fit.

[…] why I did the related changes. (Actually I've been using them for the last ~2 years)

Already well-tested, then.

While @gdt's points (re priorities) are valid (especially for planning future work), @Helium314 has already done the development.

How about Helium314's already-developed changes be merged, and we go from there? At the very least, they clearly don't break anything (including compatibility). The history will still exist for future reference, should it be needed as a guide to refine any refactoring.

That way, the handful of us are all on the same page re future development.

Using kotlin was, as written above, because I feel much more familiar with it. I do not want to look up how to do something in java when I know how to do it in kotlin. If it's not just for me personally, I'm ok with java as well.

It does seem that kotlin is becoming more popular. Java is quite old, now.

My point here, is in thinking of other developers contributing; they're more likely to when the language is already familiar to them.

I can create a bunch of issues, one for each problem I found, and mention what I used for solution / workaround.

That would be excellent for many reasons:

Is it really worth using a different library? I had a look at spatialite a few months ago, and had the impression the Android version doesn't get much maintainance. Would there be any uses of spatialite other than the spatial indexing using R-tree? I recently compared SQLite with R-tree vs 'normal' indices for a different project, and found that for points, and there was almost no difference between latitude/longitude index and R-tree. If we want to use bounding boxes instead of center and radii, R-tree / spatialite will be more useful.

Sounds like SQLite is the safer long-term choice.

Using something more niche (and less maintained), sounds like a case of premature optimisation.

Actually it's one line for creating index on latitude/longitude, no further changes needed to make use of it. And the performance advantage is significant.

You mentioned that your own DB is ~50MiB; how many emitters / entries is that (an estimate will do) else how long a period have you been collecting over?

While low-latency position fixes are indeed nice-to-have, they're not an absolute requirement. I notice that other backends take a progressive approach; returning a coarse fix quickly, then refining it. If apps need high accuracy from the start, then they can modify the parameters with which they query the provider so that only fixes of sufficient accuracy are returned.

For optimisation, it would seem to come down to which code paths (function calls, or whatever) are most common, and starting there.

If returning a moderate-accuracy fix takes a second or two, that's no worse than GNSS (doing a cold-start, even with A-GPS). I'm mindful that latency is most notable when actually testing the backend, because then one is waiting for it at that moment. Yet, the real use-case for testing is when using positioning / navigation apps (with GNSS disabled).

Right, I had started to put together changes I thought necessary, plus some performance improvements. The plan was simply releasing an apk on github and rarely / never touching the code again.

Understandable; many things start as personal hobbies.

I'd personally favour a more collaborate approach, if others are also willing.


If an emitter is blacklisted and has a coverage (which is the bounding box), it is removed from the database.

👍 then my point is moot 🙂.

Currently I don't know how to find a device's Bluetooth class (for range estimation), and how to separate Bluetooth beacons from other devices.

Some possibly-relevant repos of projects which capture / process 802.14:

[…] there are a lot of semi-stationary BT devices like speakers (at one place for longer times, but might move after several hours / days).

Good point.

I think this discussion (and related research) should be done in a separate issue.

Definitely. See #40.

[…] there is WifiManager.IsScanThrottleEnabled for checking the setting

Simpler than I thought 🙂👍.

(but we still don't know which scan result is new and which is old with changed timestamps) b) In my version I have some initial attempt of throttling detection: Simply compare new scan results with previous results. If we have the same list of WiFis with the same signal strength, we can assume throttling is enabled. In my experience, signal strength fluctuates a bit between scans and thus is a reasonably reliable way of detecting throttling. But this may give wrong detection if there are only few WiFis nearby, and detected signal strength really is the same.

Interesting.

What about querying the GNSS provider? Else, setting the query parameters such that it supplies fixes only after moving some minimum distance (1 metre?)?

Again, this should go into a separate issue.

Indeed 👍.

[…] new emitters will only be added if we have a GPS location. At home my phone finds a lot of WiFis that are not in the database, and they never will be in the database because there is no GPS signal. And I don't want to have increased scan rate at home…

Well, of course; I meant only when GNSS is enabled.

As I've stated elsewhere; I like that n76's backends use only the passive provider, since that enables a degree of control over when (& where) capturing occurs.

allow overriding some detections (e.g. wifiManager erroneously reports my phone isn't capable of 5 GHz WiFi)

I think it would be better to encourage such users to report the problem so that the detection may be fixed in code, for the benefit of all users, which will keep Déjà Vu configurationless.

Google or device manufacturers aren't going to fix this, and we don't have the means to change a broken WifiManager

I don't disagree with those statements, literally. However, would it not be feasible to have Déjà Vu itself handle the workaround:

While I entirely relate to not holding one's breath that mega-corps will ever fix reported bugs, I still feel that they should be reported anyway. No excuse, then. In the meantime, workarounds apply.

But in my current build this is used only for performance optimizations, so such detection is not really necessary.

I'm definitely much less confident with java than with kotlin. I started conversion because I was fed up with spending time researching how to do the java version of some kotlin stuff. But the (in my opinion) absolutely necessary changes are simple enough that I can do them in java as well. And I agree that doing changes in functionality while switching language is a bad idea.

Besides keeping bug-count low (by using a familiar language), I'm also seeing the significant point about development time (for a given change; fixing or implementing something).

Though, yes, varying multiple factors (language & functionality) is to invite bugs. If there were no other considerations, then I might suggest the first set of PRs be Java-based bug fixes (like @gdt urged), with kotlin refactoring to come next / later.

However, your changes have already been implemented, and well-tested at that.

Another take on @gdt's point about productive changes; merging already-working fixes is a whole lot more productive than trying to backport them to Java (to probably replace that with kotlin in future, anyway) just for the sake of properness (or the appearance thereof).

The history will always exist, for comparison, should glitches be found in future.

I get the impression that Kotlin is more efficient (at least to author code, if not in resource-usage to execute that code). That many developers like it must be for some sane(?) reason.

While I, as a non-programmer, can't really say much re Java versus Kotlin as languages; from the little I do know I'm not likely to prioritise learning Java. Instead, other (more modern) languages would come first. Java would be more of an if-I-really-need-to case (probably exactly like Helium314, here).

Reason for what appears to be so much at once from my side: I've had most of them running for ~2 years already, and decided a few weeks ago to re-do my changes in a less chaotic way and publish them in case anyone was still interested in DejaVu.

Firstly, thanks for-publishing at all, rather than keeping them private.

Secondly; I think part of what's at play, here, is the foibles of human psychology. Software development does seem to test / stretch the limits of human cognition. While Helium314 is intimately familiar with (& confident in) His changes, they're new to the rest of us.

So, how about a staged / gradual process of narrow-scope PRs, starting with the basics, until all of Helium314's improvements are part of this repo. (Then we can squabble about what to do next 😄.)

That way, there's opportunity to review & test, incrementally, each set of changes, for us newbies to become familiar. To eventually all catch up with where Helium314 is at, and be on the same page to discuss future improvements from the same (up-to-date) starting-point.

From the sound of it, it's largely bug fixes and other obvious refinements. The likes of new emitters are still to-be-researched.

(the new emitter types and database format were recent changes, inspired by the database work I did for StreetComplete)

Oh? As a user & (GH) follower of SC, I'm most curious.

My only caveat is that any changes ty schema, or DB type, should absolutely not lose user-data (by which I mean, here, already-collected emitters).

I think this is pretty clear, given the lack of import/export capability.

One would hope / think so, but I've seen otherwise experienced developers do some dubious / questionable / daft things, which had unintended consequences.

An example which comes to mind involves Mozilla Firefox trashing a whole bunch of user-data (for many users), due to well-meaning changes.

In my version DB upgrade works, although a bit slow (ca 1-2 minutes for a 50 MB database on my S4 Mini)

As a one-off, that sounds fine. A clear remark in the release notes (that this will happen, and to be patient) should suffice.

Though, I'm assuming that said upgrade uses safety measures (backing up the original DB, and otherwise being able to auto-rollback in case of failure).

Helium314 commented 2 years ago

Already well-tested, then.

While @gdt's points (re priorities) are valid (especially for planning future work), @Helium314 has already done the development.

How about Helium314's already-developed changes be merged, and we go from there? At the very least, they clearly don't break anything (including compatibility). The history will still exist for future reference, should it be needed as a guide to refine any refactoring.

I'll create issues about these things anyway, for documentation and maybe there are better solutions. There's always the problem that some things work on one device, and don't work on others... so in worst case I may have changed some part DejaVu to work for myself only.

Sounds like SQLite is the safer long-term choice.

Using something more niche (and less maintained), sounds like a case of premature optimisation.

I would prefer if we kept using built-in SQLite version, but if there are clear performance advantages that's a good reason for switching library (or at least investigating)

You mentioned that your own DB is ~50MiB; how many emitters / entries is that (an estimate will do) else how long a period have you been collecting over?

About 5 years, but before I started using StreetComplete 2 years ago (and thus using GPS a lot in built-up areas), DB was something like 6 MiB.

Oh? As a user & (GH) follower of SC, I'm most curious.

see https://github.com/streetcomplete/StreetComplete/issues/3609 and PR https://github.com/streetcomplete/StreetComplete/pull/3741

I only replied to a few points, we can copy most of the others to the issues (to be created).

Lee-Carre commented 2 years ago

maybe there are better solutions

If not, then an analysis (via discussion) of the trade-offs is still worthwhile.

A 15% performance increase, but which increases the memory-footprint 25%, may be attractive to some, but not others.

There's always the problem that some things work on one device, and don't work on others… so in worst case I may have changed some part DejaVu to work for myself only.

Possibly. Only one way to find out 😉.

If there were not yet any forked changes (i.e. we were just discussing ideas), then part of gdt's concerns may have been around this concept: change a little at a time, test throughly before changing more.

But, I don't imagine there's an easy way to do that, now (with how far your fork diverges from this repo; I notice for example that you've restructured the blacklisting code significantly). Hence suggesting that we just bite the bullet and get on with it (to expose any problems), instead of fretting about what's already done (or expecting re-implementation).

The lesson in future is; publish (one's fork) early & often. Submit PRs upstream liberally, from distinct branches. Best of both, then.

We can deal with proper project management next, though.

I would prefer if we kept using built-in SQLite version, but if there are clear performance advantages that's a good reason for switching library (or at least investigating)

Mirrors my own thoughts. SQLite regular preserves flexibility. It's also more familiar (for those wanting to tinker).

If I understood your earlier points correctly, then SQLite can always be adapted to mimic other DBMSs / schemas, anyway.

The juice must be worth the squeeze.

About 5 years, but before I started using StreetComplete 2 years ago (and thus using GPS a lot in built-up areas), DB was something like 6 MiB.

Good to know. Thanks.

Been doing surveying longer than I've used UnifiedNLP (which is relatively recent). Though, I'm on a small island anyway, so probably don't have to worry about DB bloat. wifi_backend tells me that I have (only) ~2.5k BSSIDs detected so far.

streetcomplete/StreetComplete#3609 and PR streetcomplete/StreetComplete#3741

👍

copy most of the others to the issues

Yup, I'll see to that. Noticed that you've opened several new issues with relevant titles. Probably do it by the end of the week(end).