Closed matkoniecz closed 4 years ago
This has been suggested multiple times in the other issue, but obviously nobody took any notice of it: my suggestion would also be to use the https://wiki.openstreetmap.org/wiki/API_v0.6#Preferences_of_the_logged-in_user service, like overpass turbo does, and you're done. The only thing you can do is to cheat on yourself, which is kind of pointless anyway. This information is visible to your own user, regardless of where you log on.
So there's really no need for any local server, or no additions to the API ("Perhaps a new attribute could be added on the side of the OSM Api developers," is not going to happen, sorry).
I've seen the comment above "but it is not good idea for reasons not listed because I have again written a book in an issue", but I don't think the alternatives are really feasible.
Thanks for the feedback!
So there's really no need for any local server
That would be great news. I will look at this competing solution again (and end up writing the next chapter of the book).
I've seen the comment above "but it is not good idea for reasons not listed because I have again written a book in an issue"
tl;dr: storing data in user preferences means that there is no reasonable way to handle data corruption and it is tricky to support data already lost by users reinstalling application.
There are the following problems:
Looking at problems again:
(uid + edit_count) * user_count
may not be so prohibitive? How many SC users are there? How much app download size can be wasted on this?
Overall, the main problem is that it basically means using an external database without admin access to it. There is no way to initially populate it and there is no way to debug/fix it once things start failing.
And this is mostly caused not by storing preference data there, but by attempting to store there data that would not be (easily) recoverable once lost.
And things like "is there a week with at least one quest solved on each day" would be basically impossible to do while easy to add to a database.
The only thing you can do is to cheat on yourself, which is kind of pointless anyway
As long as no public leaderboards exist, it should be no problem at all. And public leaderboards are, at least in my opinion, not worth the trouble given the type of behavior they encourage. And anyway, one may just parse the changeset planet file to get them if someone really wants that.
I also find the solution of a small backend (php script + a few tables) the best, it is also the most flexible, as the solved quests could be sorted by country/city, the user's "streak" (consecutive mapping days) could be shown, or "indirect" leaderboards like "you are among the top 10 contributors (for quest type WX (in YZ))".
small backend (php script + a few tables)
This seems feasible in general and I expect to be able to implement it (especially given that with this architecture it should be possible to recover from data corruption), but I am stuck at "how access to server with statistics would be authenticated".
Making this API open has a separate set of issues that may be potentially problematic and I have no good idea how to handle authentication in any reasonable way.
Requiring users to create separate password for StreetComplete is not ok. And passing OpenStreetMap access token to server seems to be also a very bad idea. Or maybe it would be ok? Or is there some other way to establish user-specific secret? Leaving this API public has its own issues.
storing data in user preferences means that there is no reasonable way to handle data corruption and it is tricky to support data already lost by users reinstalling application.
No, you could fix this from any Javascript page, which has been authenticated against osm.org, and simply read and update the user preferences. It's nothing really magical, just a key value store. The only downside I see is that it's open to all apps, so other evil apps could silently change/delete some keys. Maybe it would be an idea to extend this concept to include the OAuth app, so different apps are isolated from each other (this would require some changes to the API which could be relevant to other apps as well).
Data lost by users: yes, that's not covered. I don't know how much of a problem that is.
each device has its own preference key for each saved preference
Depends on how you use the key value store. If you have one key for SC (per user), they would all share the same value.
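To make the "one key for SC per user" idea concrete, here is a minimal sketch of reading such a value from the XML that the preferences endpoint returns. The key name `streetcomplete.star_count` is purely hypothetical, the sample document is hand-written for illustration, and the authenticated HTTP request itself is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical key name: one shared key per user instead of one key per device.
SC_PREF_KEY = "streetcomplete.star_count"

def read_star_count(preferences_xml, key=SC_PREF_KEY):
    """Return the integer stored under `key`, or None if the key is absent."""
    root = ET.fromstring(preferences_xml)
    for pref in root.iter("preference"):
        if pref.get("k") == key:
            return int(pref.get("v"))
    return None

# Hand-written sample in the shape of a preferences response (not real data).
SAMPLE = """<osm version="0.6" generator="OpenStreetMap server">
  <preferences>
    <preference k="somekey" v="somevalue"/>
    <preference k="streetcomplete.star_count" v="6271"/>
  </preferences>
</osm>"""

print(read_star_count(SAMPLE))
```

With a single shared key, every device reads and overwrites the same value, so the last writer wins. That is simple, but it also means a buggy client can overwrite a good value.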
There is no way to initially populate it and there is no way to debug/fix it once things start failing.
Once you roll out this change, you would simply take the current counter and publish it in the user preferences. I don't see the debug/fix part as an issue, overpass turbo works just fine with that kind of storage.
but by attempting to store there data that would not be (easily) recoverable once lost.
You can fetch the same via a simple HTTP call as an authenticated user; the same goes for updating and deleting.
Maybe you need to do a multi step approach anyway, starting small by keeping the current star count on osm.org, and later add something more sophisticated on your own server.
No, you could fix this from any Javascript page, which has been authenticated against osm.org, and simply read and update the user preferences. It's nothing really magical, just a key value store.
Each user can do this relatively easily. But the app author is unable to fix it directly. For example, a bug in the app sets this value to 0 for all users that used the broken version. Now what?
Evil apps are probably not a problem, it is just a number - not something that is valuable to others or even prominent enough to make vandalizing it interesting.
Once you roll out this change, you would simply take the current counter and publish it in the user preferences.
Current star count may not include count from a previous device/installation.
For example bug in app set this value to 0 for all users that used broken version. Now what?
That's pretty much the same as today, in case the app is broken in some way and messes with the local counter. How do you handle this as of now?
Current star count may not include count from a previous device/installation.
One option to retrieve this kind of information would be by processing all changeset metadata (preferably as a dump + regular updates), and analyze relevant changesets (also preferably using a dump + regular updates), both of which require a dedicated server. The server has to publish the star count via some API for the app to fetch. That's the "more sophisticated" thing I mentioned before.
Another option: maybe you could talk to the OSMCha folks, if you could use their API for your purposes? They provide filters based on username, period of time, editor, and return a number of create/modify/delete operations per changeset. I think that could be a good starting point for your stats.
Requiring users to create separate password for StreetComplete is not ok. And passing OpenStreetMap access token to server seems to be also a very bad idea. Or maybe it would be ok? Or is there some other way to establish user-specific secret? Leaving this API public has its own issues.
The changeset history is part of the public API that does not require authentication. So, why authenticate in the first place? Also, weren't your scripts based on digging through the planet file with history on a daily/weekly (or so) basis rather than talk with the OSM API?
Yes, it aggregates public data but
Overall, I think that it would be OK to make this public - but I am not sure whether that is the consensus, and GDPR may oblige us to do this anyway.
The following wiki page summarizes all changes that are planned for GDPR compliance: https://wiki.openstreetmap.org/wiki/GDPR/Affected_Services
I don’t recall where changes to planet or diff files are documented. I think there was some plan to require an additional log on.
There were some blog posts to find people implementing those changes.
User display name, id and changeset are typically hidden as an anonymous user.
Besides HDYC, OSMCha also requires a logged-in user before any changeset is shown.
So, the part of getting the data should be fine, because we could have a logged in streetcomplete user to retrieve the data that is not public (in the future). (There is actually already one.) It will however be required to delete the data associated with a user when that user decided to delete his account on openstreetmap.org. If I remember correctly, a list of deleted user ids is made public somewhere, so the backend needs to check every now and then if it has data of deleted users and if yes, delete this data.
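The periodic cleanup described above could be sketched roughly like this. It assumes the backend keeps statistics keyed by OSM user id and that the list of deleted user ids has already been obtained somehow. Both the in-memory dictionary and the function name are illustrative only.

```python
def purge_deleted_users(stats_by_user, deleted_user_ids):
    """Remove statistics of users who deleted their OSM account.

    stats_by_user: dict mapping user id -> stored statistics (illustrative).
    deleted_user_ids: iterable of user ids published as deleted.
    Returns the list of user ids whose data was removed.
    """
    deleted = set(deleted_user_ids)
    removed = [uid for uid in stats_by_user if uid in deleted]
    for uid in removed:
        del stats_by_user[uid]
    return removed

stats = {101: {"stars": 12}, 202: {"stars": 7}, 303: {"stars": 99}}
print(purge_deleted_users(stats, [202, 999]))
```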
On the part of giving out that data, on the client side, StreetComplete will of course only start showing the data once the user is successfully logged in. This is of course only a data protection through the client. On the backend side, a simple measure would be to only allow access if the user agent is StreetComplete.
This measure of course is ineffective from a data security point of view, but neither is the (planned) measure to restrict access to changeset information to only logged-in OSM users nor the measure of HDYC: Any person with a little technical background will be able to circumvent it to get the data anyway.
But I believe data security not to be the point - if someone deliberately circumvents such a measure, he is aware that access is not allowed and that he is potentially in breach of the GDPR.
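For illustration, the user-agent check mentioned above amounts to something like the sketch below. The prefix `StreetComplete` is an assumption about what the app sends. As already said, this is trivially spoofable and only signals that access by other clients is not intended.

```python
def is_streetcomplete_client(headers):
    """Return True if the request's User-Agent starts with the assumed app prefix.

    headers: mapping of HTTP header names to values. Header names are
    compared case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "user-agent":
            return value.startswith("StreetComplete")
    return False

print(is_streetcomplete_client({"User-Agent": "StreetComplete 18.0"}))
```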
The other, stronger measure, is to present each user with another OAuth login screen when entering the statistics screen, but this one identifies as a different app, i.e. "StreetComplete Stats" and requires actually no permissions at all from the user but only takes the token to authenticate the user - similarly as HDYC does it. But I do not believe this is necessary.
For complete overkill, without any additional logins, the app may create a public+private key pair, save the private key as a user preference (globally), create a changeset containing the public key as one of its tags and call the SC server with a pointer to that changeset.
API would be able to encrypt responses directed to that user with public key and make it usable only to holder of private key (this specific user).
But hopefully
But I believe data security not to be the point - if he deliberately circumvents such a measure, he is aware that access is not allowed and that he is potentially in breach of the GDPR.
is good enough for this purpose, this data will be anyway trivial to reconstruct from history and based solely on past edits.
Such overengineering would be hilariously pointless as the script computing this data will be released as open source. So anyone will be able to run this script (or write their own) and compute all these statistics anyway.
Pretty cool authentication idea though. "OpenOsmChangesetId" (analogous to OpenID), heh
Overpass api has a similar requirement to only hand out data for a logged on osm.org user - without osm.org having to know when and which query was executed.
There’s a proposal to generate a token on osm.org which can later be presented to Overpass api. Once a valid token is presented to the server, a user will have full access to metadata.
Downside of it is that it’s not yet available on production.
https://github.com/openstreetmap/openstreetmap-website/pull/2145
That's pretty much the same as today, in case the app is broken in some way and messes with the local counter. How do you handle this as of now?
Release a new version that does not have this bug. It will retrieve fresh data from the statistics server and replace its older data.
A similar thing could be done with client-only data storage in preferences. But after an update that instructs the app to throw away its cache, all StreetComplete users would have to redownload past statistics using the OSM API.
With a central server, after complete data corruption it would be possible to regenerate this data from history or a changeset dump, or to use some other smart solution. With client-only storage, the solution would require downloading all of this using the OSM API.
The plan for now is to rely on changeset metadata only and make simple server publishing total star count and
Find all changesets by user, paginate through it because only 100 are shown at a time, for all changesets that are created_by StreetComplete, download the whole changeset (ugh...) and count the number of unique elements affected.
With the exception that, rather than downloading the full changeset, it would use the "changes_count" attribute
For example https://api.openstreetmap.org/api/0.6/changesets?user=1722488
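A rough sketch of the counting step on one page of such a response: keep only changesets whose `created_by` tag starts with "StreetComplete" and add up their `changes_count`. The sample XML is hand-written in the shape of the changeset list, not real data, and reverted edits are deliberately not handled here.

```python
import xml.etree.ElementTree as ET

def count_stars(changesets_xml):
    """Sum changes_count over changesets created by StreetComplete."""
    total = 0
    for cs in ET.fromstring(changesets_xml).iter("changeset"):
        tags = {t.get("k"): t.get("v") for t in cs.iter("tag")}
        if tags.get("created_by", "").startswith("StreetComplete"):
            total += int(cs.get("changes_count", 0))
    return total

# Hand-written sample, for illustration only.
SAMPLE = """<osm version="0.6">
  <changeset id="1" created_at="2020-01-01T10:00:00Z" open="false" uid="42" changes_count="12">
    <tag k="created_by" v="StreetComplete 17.0"/>
  </changeset>
  <changeset id="2" created_at="2020-01-02T10:00:00Z" open="false" uid="42" changes_count="40">
    <tag k="created_by" v="JOSM/1.5"/>
  </changeset>
  <changeset id="3" created_at="2020-01-03T10:00:00Z" open="false" uid="42" changes_count="5">
    <tag k="created_by" v="StreetComplete 17.1"/>
  </changeset>
</osm>"""

print(count_stars(SAMPLE))
```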
Yes, I noticed the new "changes_count" attribute (see https://api.openstreetmap.org/api/0.6/changeset/70163079), but how are reverted changes counted?
https://api.openstreetmap.org/api/0.6/changeset/79195526 - modifying and reverting edits in a single changeset counts as two edits, the same as in SC #1537
If it is decided that #1537 should be implemented (it is WONTFIX at this moment), it would be possible to add some sort of metadata to changesets to count/mark undos.
Implementing this, using https://wiki.openstreetmap.org/wiki/API_v0.6#Query:_GET_.2Fapi.2F0.6.2Fchangesets :
changes_count
)Step 1: fetch data using API and create a local database with table that has following fields
With such table we should be able to answer questions about statistics.
To coordinate fetching data it is necessary to have closed
status for changesets to list one where edit count may become different and have for each user info about
DATE_OF_BIRTH = new GregorianCalendar(2017,Calendar.FEBRUARY,20).getTime()
)Initial implementation will do easiest possible thing
initial download:
For given user download sequentially changeset data, using https://api.openstreetmap.org/api/0.6/changesets?user=1722488&time=$BIRTH_DATE_OF_SC,$LOWER_RANGE_BOUNDARY
Earliest created_at date of downloaded ones is the new latest date that still may have earlier unfetched changesets. Upper date boundary updated to max of its value and latest changeset. Fetched data should be stuffed into the database (only SC changesets). Repeat, until earliest date of possible new changesets is greater than date ).
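The initial download loop can be sketched as follows, with the actual HTTP call abstracted behind a `fetch_batch(user_id, lower, upper)` function (a hypothetical name) that returns at most one page of changesets, newest first. The dictionary fields and the in-memory fake API are assumptions for illustration. De-duplication by changeset id is added so that an inclusive time boundary cannot cause an endless loop.

```python
from datetime import datetime, timezone

# StreetComplete's "birth date" as given above (2017-02-20); UTC is assumed.
SC_BIRTH = datetime(2017, 2, 20, tzinfo=timezone.utc)

def fetch_sc_changesets(fetch_batch, user_id, now):
    """Page backwards in time until no unseen changesets are returned."""
    seen = {}
    upper = now
    while upper > SC_BIRTH:
        batch = fetch_batch(user_id, SC_BIRTH, upper)
        fresh = [c for c in batch if c["id"] not in seen]
        if not fresh:
            break  # nothing new: everything up to SC_BIRTH has been fetched
        for c in fresh:
            seen[c["id"]] = c
        # The earliest creation date seen becomes the new upper boundary.
        upper = min(c["created_at"] for c in batch)
    # Only StreetComplete changesets are kept for the database.
    return [c for c in seen.values()
            if c["created_by"].startswith("StreetComplete")]

def make_fake_api(changesets, page_size=2):
    """In-memory stand-in for the changeset query, for trying out the loop."""
    def fetch_batch(user_id, lower, upper):
        hits = [c for c in changesets
                if c["uid"] == user_id and lower <= c["created_at"] <= upper]
        hits.sort(key=lambda c: c["created_at"], reverse=True)
        return hits[:page_size]
    return fetch_batch
```

Passing the fetcher in as a parameter keeps the loop testable without network access. A real fetcher would issue the `changesets?user=...&time=...` request and parse the XML.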
Data update:
@matkoniecz What's the status? I am almost done with the UI part
Mateusz is currently busy with other things. Is anyone interested in doing it, maybe someone who created a backend before - @ENT8R or @exploide ? @matkoniecz already did some important considerations in the above comments. Otherwise I will get to it as soon as I am done with the frontend part.
Hi, I think I can't do this at the moment, sorry. If that changes soon, I will let you know. But if I should look into some detail just ping me.
Same for me... I also don't have that much time currently 😞
Okay, thanks for the answers. Then I'll get to it next. I will do another post here once I start working on it.
I'm starting to work on it now
Almost done: https://github.com/westnordost/sc-statistics-service
@ENT8R , @exploide , @matkoniecz would you review it and open issues on the issue tracker there if you find something?
What is missing is the "index.php" with which to get the data and trigger the collection of the data as well as the cronjob that updates the data. However, these scripts will be rather short because all the logic is in the classes.
Hmm while testing the PHP implementation I am getting doubts as to whether it makes sense at all to have a backend for this rather than calculate it directly on the device.
Looking through my whole changeset history took less than 6 seconds. I made 3,259 changesets. So that's about 180ms for each batch of 100 changesets. So even looking through the massive changeset history of a user like @matkoniecz (32,333 changesets) takes about a minute - once.
Even though I am almost finished with the PHP implementation, I should take a step back first and contrast the two options:
Advantages of local implementation:
Though, since the implementation in PHP is already done now, this is not that much of a good reason.
Advantages of backend:
Non-advantages of backend:
TODO2: what more advantages would a deep analysis of the changesets bring?
Regardless, I am done with the implementation in PHP+MySQL for anyone wanting to review it. Thanks so far, @exploide
I think ignoring quests that were solved from more than 250 meters away is not possible in either approach. Or am I missing something?
So reinstalling can be considered a hack for boosting your star count.
Not sure what you mean.
On 18 April 2020 08:35:57 CEST, Holger Jeromin notifications@github.com wrote:
I think ignoring quests which were solved >250 meters are not possible in both approaches. Or do I miss something?
So reinstalling can be considered a hack for boosting your star count.
The app counts stars. If I am 1000 Meters away from the quest location I can solve the quest, but my star count will not rise (+0 stars).
When I change my phone this new feature will restore the star count. But the 1000 meter away quest will be +1 on the new device. And not +0 as on the old phone.
If I am 1000 Meters away from the quest location I can solve the quest, but my star count will not rise (+0 stars).
This is not correct. Your star count should rise, it is a solved quest as any other.
Okay, I added that reverted solved quests are not counted for the star-count as well as split-ways are only counted once by doing a deep analysis of the changesets. Additionally, I added a geocoder so that the changes can now be associated to countries. Here is an example output for a user:
{
"questTypes": {
"AddAccessibleForPedestrians": 24,
"AddAddressStreet": 2,
"AddBenchBackrest": 62,
"AddBikeParkingCapacity": 9,
"AddBikeParkingCover": 78,
"AddBikeParkingType": 3,
"AddBridgeStructure": 5,
"AddBuildingLevels": 199,
"AddBuildingType": 581,
"AddBusStopName": 1,
"AddBusStopShelter": 59,
"AddCarWashType": 3,
"AddCrossingType": 146,
"AddCycleway": 41,
"AddCyclewaySegregation": 2,
"AddFireHydrantType": 1,
"AddForestLeafType": 1,
"AddHandrail": 2,
"AddHousenumber": 26,
"AddMaxHeight": 13,
"AddMaxSpeed": 87,
"AddMaxWeight": 2,
"AddOneway": 1,
"AddOpeningHours": 88,
"AddParkingAccess": 40,
"AddParkingFee": 18,
"AddParkingType": 35,
"AddPathSurface": 171,
"AddPlaceName": 20,
"AddPlaygroundAccess": 26,
"AddPostboxCollectionTimes": 1,
"AddProhibitedForPedestrians": 20,
"AddRailwayCrossingBarrier": 33,
"AddRecyclingContainerMaterials": 1,
"AddRecyclingType": 1,
"AddReligionToPlaceOfWorship": 9,
"AddReligionToWaysideShrine": 8,
"AddRoadName": 172,
"AddRoadSurface": 682,
"AddRoofShape": 61,
"AddSegregated": 2,
"AddSidewalk": 21,
"AddSport": 4,
"AddTactilePavingBusStop": 10,
"AddTactilePavingCrosswalk": 161,
"AddToiletsFee": 2,
"AddTracktype": 1,
"AddTrafficSignalsButton": 4,
"AddTrafficSignalsSound": 3,
"AddVegetarian": 3,
"AddWayLit": 822,
"AddWheelchairAccessBusiness": 8,
"AddWheelchairAccessDogPark": 1,
"AddWheelchairAccessToilets": 2,
"DetailPavedRoadSurface": 1,
"IsBuildingUnderground": 1,
"MarkCompletedBuildingConstruction": 6,
"MarkCompletedConstruction": 3,
"MarkCompletedHighwayConstruction": 31
},
"countries": {
"AT": 1,
"CN-XZ": 3,
"CY": 66,
"CZ": 11,
"HR": 1,
"IL": 10,
"IT": 39,
"MA": 3,
"PL": 3678,
"SK": 8
},
"daysActive": 174,
"lastUpdate": "2020-03-22T12:55:14+00:00"
}
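As a small illustration of how a client might consume such a response, the sketch below derives a total star count and the most-solved quest type. It assumes that the total is simply the sum over `questTypes`, and it uses an abbreviated copy of the example above as input.

```python
import json

def summarize(stats_json):
    """Derive a few headline numbers from a statistics response."""
    stats = json.loads(stats_json)
    quests = stats["questTypes"]
    return {
        "totalStars": sum(quests.values()),
        "topQuestType": max(quests, key=quests.get),
        "countryCount": len(stats["countries"]),
    }

# Abbreviated copy of the example output above.
SAMPLE = """{
  "questTypes": {"AddWayLit": 822, "AddRoadSurface": 682, "AddBuildingType": 581},
  "countries": {"PL": 3678, "CY": 66},
  "daysActive": 174,
  "lastUpdate": "2020-03-22T12:55:14+00:00"
}"""

print(summarize(SAMPLE))
```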
I plan to deploy it this evening or tomorrow, last chance to review the code before I destroy my webspace by deploying an insecure PHP implementation ;-) @Akasch @matkoniecz @ENT8R
@exploide found some caveats that are PHP specific, for which I am thankful because it is PHP knowledge I lack. Also, @exploide , maybe you would like to review what I added this weekend?
Also, @matkoniecz , do you have a list of all user ids of streetcomplete users? If I populate the database with data before it is queried, users will get their statistics right away without waiting.
I took a look at the changes and I think it is fine.
In get_statistics.php there is still the GDPR TODO concerning user-agent "protection".
A minor comment concerning mysqli_report in this file: For public facing web endpoints, one usually wants to disable SQL error reporting completely.
In get_statistics.php there is still the GDPR TODO concerning user-agent "protection".
Jup I know, will do that at the very end.
A minor comment concerning mysqli_report in this file: For public facing web endpoints, one usually wants to disable SQL error reporting completely.
Thanks for the hint!
Thanks! Now I am finally feeling at home on the phone I got 7 months ago. :-) (6271)
Congratulations, I only have 2003
1991 here, about to catch you
…but really, you have many more, since mine (and many others) would not exist without this app :)
Question to answer: how would access to the statistics server be authenticated?
Is it better to have a central server or a serverless version (almost certainly a server, though it is necessary to answer the authentication question)?
Is there something that can be improved at specification stage? Is there something that should be handled and is missing?
It was previously reported as #188 - I opened new issue as it is the first step toward actually coding this feature.
Currently the star count is the only statistic available and it is stored locally. As a result, changing phones or reinstalling the application resets the star count.
It would be nice to share star count between devices.
It is also desirable to get and synchronize other statistics, not only the total star count. It would be nice to have info on how many quests of each type were solved, or a star count limited to some specific area. This would allow properly adding badges/achievements based on these new statistics that would be shared across devices.
This data can be retrieved from OSM changeset history.
Serverless version
It can be done without a central server, but...
The problem is that it would be
Central server
An alternative is to introduce a central server. Individual phones would call it to get computed info.
But how would access to it be authenticated? Requiring users to create separate password for StreetComplete is not ok. And passing OpenStreetMap access token to server seems to be also a very bad idea. Or maybe it would be ok? Or is there some other way to establish user-specific secret? Leaving this API public has its own issues.
Loading data into the server - the simplest version would call the OSM API to get the info, but it would be nice to reduce the number of these calls. Full processing of planet history + minutely updates may be unfeasible. But https://github.com/Zverik/editor-stats/issues/4 indicates that some forms of planet processing are relatively easy, in this case getting all edits done in a specific editor up to the latest weekly update.
I think that it may be a good way to speed up data generation and reduce calls to the OSM API.
Specification of the server
It would be written in PHP (due to inability to run other software on available hosting). It would use curl to communicate with OSM API (curl is confirmed to be available). There is also https://www.php.net/manual/en/function.file-get-contents.php (fopen is enabled).
The following was copied from an email by @westnordost and published with permission
How calling OSM API would work (the same applies to a potential serverless version):
Note: changesets in the returned list are "ordered by creation date" https://wiki.openstreetmap.org/wiki/API_v0.6
Simple case of revert in changeset (not spread over separate changesets): https://api.openstreetmap.org/api/0.6/changeset/72331586/download
API calls for the new server
I think that responses should also include "as of" date.
I am thinking that storing more raw data may be preferable.
1) SC changesets: changeset id (primary key), user id, quest type, star count earned in it, bbox
2) users: user id (primary key), last updated date (or last processed changesets)
I think that with such a format it will be easier to debug what went wrong in case of data corruption, and it will be easier to add new queries. For example "is there a week with at least one quest solved on each day" (regular repeated use may be a better metric than just edit count).
Not sure whether it will be useful to add indexes/tables with data served as responses.
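To illustrate why raw per-changeset rows make new queries easy, here is a sketch using SQLite in place of the eventual database. The two tables follow the fields listed above, while the `created_at` column is an added assumption (the listed fields contain no date, and the "full week" question needs one). "Week" is interpreted here as any 7 consecutive calendar days, which is also an assumption.

```python
import sqlite3
from datetime import date, timedelta

SCHEMA = """
CREATE TABLE sc_changesets (
    changeset_id INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    quest_type   TEXT NOT NULL,
    stars        INTEGER NOT NULL,
    bbox         TEXT,
    created_at   TEXT NOT NULL  -- assumption: ISO 8601 timestamp, not in the list above
);
CREATE TABLE users (
    user_id      INTEGER PRIMARY KEY,
    last_updated TEXT
);
"""

def has_full_week(conn, user_id):
    """True if the user solved at least one quest on 7 consecutive days."""
    rows = conn.execute(
        "SELECT DISTINCT substr(created_at, 1, 10) AS day "
        "FROM sc_changesets WHERE user_id = ? ORDER BY day",
        (user_id,),
    )
    run, prev = 0, None
    for (day_text,) in rows:
        day = date.fromisoformat(day_text)
        # Extend the run if this day directly follows the previous one.
        if prev is not None and day - prev == timedelta(days=1):
            run += 1
        else:
            run = 1
        if run >= 7:
            return True
        prev = day
    return False
```

With only an aggregated star count stored per user, a question like this could not be answered at all. With raw rows it is a single query plus a short loop.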
There also should be a database on a phone - with reserved stars and date when this happened. This way it will be possible to merge this data with responses "user had $X stars as of $DATE"
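The merge described here could look roughly like the following sketch: the server reports a count together with an "as of" date, and the phone adds only those locally recorded stars that were earned after that date. The function name and the shape of the local records are assumptions for illustration.

```python
from datetime import datetime, timezone

def merged_star_count(server_stars, server_as_of, local_events):
    """Combine a server total with locally recorded stars.

    server_stars: total reported by the server.
    server_as_of: datetime the server total refers to.
    local_events: iterable of (earned_at, stars) tuples stored on the phone.
    Only local events newer than server_as_of are added, so stars the
    server already knows about are not counted twice.
    """
    return server_stars + sum(
        stars for earned_at, stars in local_events if earned_at > server_as_of
    )

as_of = datetime(2020, 3, 22, 12, 0, tzinfo=timezone.utc)
local = [
    (datetime(2020, 3, 21, 9, 0, tzinfo=timezone.utc), 3),   # already on server
    (datetime(2020, 3, 22, 18, 0, tzinfo=timezone.utc), 2),  # newer than as_of
]
print(merged_star_count(100, as_of, local))
```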