Open bkowshik opened 7 years ago
Found `2604` changesets that had the geojson version of it in real changesets. Assuming all features in changesets with `revert` in the changeset comment are correcting a harmful change, I get changeset IDs of the previous version of all features in these changesets. Ex:
`Reverting 49662828 by Demo15_15`
Following this workflow, I find a list of `14,062` unique changesets. Ideally, this is a list of changesets that had a problematic feature which was later reverted. The next step was to see what percentage of these were recent (say, in 2017) and have a real changesets version, so that we can use them as part of the training/validation dataset in Gabbar.
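The workflow above can be sketched roughly as follows. This is only an illustration under assumptions: the dictionary shape and the key names `features` and `old_changeset` are hypothetical stand-ins for whatever the real changesets data actually exposes, and every ID other than `49662828` is made up.

```python
# Hypothetical, simplified shape of a reverting changeset: a list of features,
# each carrying the changeset ID that produced its previous version
# (None when the feature has no previous version). Key names are assumptions.
def previous_changesets(reverting_changesets):
    """Collect the unique changeset IDs of the previous version of every feature."""
    reverted = set()
    for changeset in reverting_changesets:
        for feature in changeset["features"]:
            old_id = feature.get("old_changeset")
            if old_id is not None:
                reverted.add(old_id)
    return reverted


# Made-up sample data for illustration only.
reverting = [
    {"id": 1001, "comment": "Reverting 49662828 by Demo15_15",
     "features": [{"old_changeset": 49662828}, {"old_changeset": 49662828}]},
    {"id": 1002, "comment": "revert vandalism",
     "features": [{"old_changeset": 49000001}, {"old_changeset": None}]},
]
print(sorted(previous_changesets(reverting)))  # [49000001, 49662828]
```

Using a set keeps the result unique even when several features in one reverting changeset point back to the same earlier changeset.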
`10,691` (76%) potentially problematic changesets that have real changeset.

@manoharuss @krishnanammala, need your help here. Can you randomly :eyes: about 100 changesets from this list to see what percentage of the 100 are problematic? This will help us understand what to expect and whether this can be used as a training dataset in Gabbar.
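For the manual review, a reproducible random sample could be drawn along these lines. This is a sketch only: the function name `sample_for_review` and the fixed seed are assumptions, not something used in this thread.

```python
import random


def sample_for_review(changeset_ids, k=100, seed=0):
    """Return up to k distinct changeset IDs, chosen reproducibly for a given seed."""
    rng = random.Random(seed)           # local RNG, so global random state is untouched
    unique_ids = sorted(set(changeset_ids))
    return rng.sample(unique_ids, min(k, len(unique_ids)))


# Made-up IDs for illustration only.
candidates = range(50_000_000, 50_010_691)
to_review = sample_for_review(candidates)
print(len(to_review))
```

Fixing the seed means two reviewers can regenerate the same list of 100 changesets independently.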
cc: @planemad
@bkowshik any changeset reverted by an experienced editor (>100 edits) we can safely say was definitely a bad one. Let's use our time more wisely and review only those that were reverted by an inexperienced user (<20 edits); this is where we might find some false negatives.
Other highly valuable questions to answer here:
cc @maning @batpad
Thank you @planemad, that was super helpful!
- `12257` changesets, only `2604` (21%) have real changesets.
- `468` mappers.
- `2493` (95%) reverting changesets were by users with 100 or more changesets.
- `21` (1%) reverting changesets were from users with less than 20 changesets.

The CSV with `21` reverting changesets by new users is at the link below:
Yes, there is a correlation between the experience of the user and number of reverting changesets. Reverting changesets are way more likely from experienced users than new users.
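The experience buckets used in the counts above (100 or more changesets, fewer than 20 changesets) can be computed with something like the following. It is a minimal sketch: the input is assumed to be one entry per reverting changeset, holding the total changeset count of the user who made it, and the function name is hypothetical.

```python
def experience_shares(user_changeset_counts, experienced=100, new=20):
    """Share of changesets made by experienced users (>= experienced)
    and by new users (< new). Users in between fall into neither bucket."""
    total = len(user_changeset_counts)
    if total == 0:
        return {"experienced": 0.0, "new": 0.0}
    n_experienced = sum(1 for c in user_changeset_counts if c >= experienced)
    n_new = sum(1 for c in user_changeset_counts if c < new)
    return {"experienced": n_experienced / total, "new": n_new / total}


# Made-up counts for illustration only.
shares = experience_shares([150, 500, 3, 50])
print(shares)  # {'experienced': 0.5, 'new': 0.25}
```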
What is more interesting is that `user_mapping_days` has a stronger correlation at `0.6` to number of reverting changesets in comparison to `user_changesets` with a correlation of `0.3`. So, the mapping days of the user is a stronger indicator.
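The thread does not say which correlation measure was used. Assuming it was the Pearson coefficient, a comparison like the one above could be reproduced with a small helper such as this; the sample values are made up and the column names are taken from the comment above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up per-user values for illustration only.
user_mapping_days = [5, 40, 120, 300, 900]
user_changesets = [12, 800, 150, 90, 4000]
reverting_changesets = [0, 1, 3, 4, 20]

print(pearson(user_mapping_days, reverting_changesets))
print(pearson(user_changesets, reverting_changesets))
```

Comparing the two printed values on the real per-user table would show which attribute tracks the number of reverting changesets more closely.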
I couldn't resist finding whose changesets were getting reverted - the other side of the story.
- `14062` changesets, only `10693` (76%) have real changesets
- `1301` mappers with one or more reverted changesets
- `8679` (61%) reverted changesets were of users with 100 or more changesets
- `1161` (8%) reverted changesets were of users with less than 20 changesets
- `60%` of his/her changesets reverted, `4106` reverted changesets

The number of a user's changesets getting reverted comes down as the user has more changesets, that is, as the user gains more mapping experience.
As expected, the user mapping days is negatively correlated, `-0.3`. Thus, the higher a user's mapping days, the less likely a changeset is to be reverted.
Per https://github.com/mapbox/gabbar/issues/66#issuecomment-310029426
There are `21` reverting changesets by users with less than `20` changesets. @manoharuss @krishnanammala can you please 👀 these and post notes about what percentage of these `21` are actually problematic?
cc: @planemad
Per text with @batpad,
Changeset comment has `revert`
There are a total of `13,125` changesets on osmcha with `revert` in the changeset comment. Interestingly, `2,505` (20%) changesets are one feature modification changesets, which is what we use in the latest version of Gabbar.

`revert` in changeset comment

Assuming mappers revert a problematic or wrong feature in these one feature modification changesets, this could be an additional dataset we could make use of for the current iteration of the feature level classifier of Gabbar. I manually :eyes: a couple of these changesets and they are definitely what we want to catch with Gabbar.
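The filter described above (a `revert` comment plus exactly one modified feature) might look roughly like this. It is a sketch under assumptions: each changeset is taken to be a dictionary with a `comment` string and `create`, `modify`, `delete` counts, which may not match the actual export format, and the sample rows are made up.

```python
def is_single_feature_revert(changeset):
    """True when the comment mentions 'revert' and the changeset modifies
    exactly one feature while creating and deleting none."""
    comment = (changeset.get("comment") or "").lower()
    counts = (
        changeset.get("create", 0),
        changeset.get("modify", 0),
        changeset.get("delete", 0),
    )
    return "revert" in comment and counts == (0, 1, 0)


# Made-up rows for illustration only.
changesets = [
    {"id": 1, "comment": "Reverting 49662828 by Demo15_15", "create": 0, "modify": 1, "delete": 0},
    {"id": 2, "comment": "revert vandalism", "create": 2, "modify": 5, "delete": 1},
    {"id": 3, "comment": "added a cafe", "create": 0, "modify": 1, "delete": 0},
]
candidates = [c["id"] for c in changesets if is_single_feature_revert(c)]
print(candidates)  # [1]
```

A plain substring match also catches comments such as "reverted" or "Reverting", which is broader than matching the exact word.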
Changesets from revert user accounts
Mappers and DWG sometimes maintain a separate account for reverts. Changesets from these accounts will be interesting to look at as well. Ex:
cc: @anandthakker @geohacker