On enwiki, the user table also stores "auto creation" events for non-attached users, so counts grouped by user_registration are inflated. The logging table provides more reliable counts of new registrations on enwiki.
From D. Taraborelli:
After a little bit of research, the logging table has some extra log_action types that may explain the inflated counts in users when grouped by user_registration:
1) create2: this is for proxy-registered users, we had 49 such users on 2012-09-05
2) autocreate: this is for locally reserved user_ids generated on another wiki, we had a whopping 1947 events logged with this log_action on 2012-09-05
The sum of create, create2 and autocreate produces a total (6357) that is much closer to the figure from the user table (6437) but there are still 80 users missing.
I think it's safe to use WHERE log_action = 'create' AND log_type = 'newusers' as a condition to identify genuine on-wiki registrations.
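The condition above can be sketched in Python against a toy stand-in for the logging table. This is only an illustration under stated assumptions: it uses an in-memory SQLite database rather than the real enwiki replica, the rows are made up and do not reproduce the 2012-09-05 counts, and the helper names count_genuine_registrations and counts_by_action are hypothetical. The day filter assumes log_timestamp is stored as a YYYYMMDDHHMMSS string.

```python
import sqlite3


def count_genuine_registrations(conn, day):
    """Count newusers log rows with log_action = 'create' on one day (YYYYMMDD)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM logging "
        "WHERE log_action = 'create' AND log_type = 'newusers' "
        "AND log_timestamp LIKE ?",
        (day + "%",),
    ).fetchone()
    return row[0]


def counts_by_action(conn, day):
    """Break down all newusers log rows for one day by log_action."""
    rows = conn.execute(
        "SELECT log_action, COUNT(*) FROM logging "
        "WHERE log_type = 'newusers' AND log_timestamp LIKE ? "
        "GROUP BY log_action",
        (day + "%",),
    ).fetchall()
    return dict(rows)


# Toy data (assumption): a minimal logging table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging (log_type TEXT, log_action TEXT, log_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?)",
    [
        ("newusers", "create", "20120905010203"),
        ("newusers", "create", "20120905111213"),
        ("newusers", "create2", "20120905121314"),
        ("newusers", "autocreate", "20120905131415"),
        ("newusers", "autocreate", "20120905141516"),
        ("newusers", "create", "20120906000000"),  # different day
        ("delete", "delete", "20120905151617"),    # unrelated log type
    ],
)

print(count_genuine_registrations(conn, "20120905"))
print(counts_by_action(conn, "20120905"))
```

The per-action breakdown is the same kind of comparison made in the note: summing create, create2 and autocreate gives the figure to set against the user table, while the create-only count is the one treated as genuine on-wiki registrations.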
https://github.com/rfaulkner/E3_analysis/blob/6c17247a9885379aea4f034e6e41c8cea88a4b65/src/metrics/threshold.py