mhl closed this 8 years ago.
I'm afraid the CSV tests are being broken by the addition of the Twitter ID.
Modulo the failing tests and the error message issue, this looks good.
We have a similar problem in EveryPolitician, where we need to map Twitter handles to IDs. @mhl Are there any obvious steps we could ticket to inch this towards a more general solution that we could take advantage of in other projects, such as EP?
@chrismytton Perhaps a more generically useful version of this would be an API for a service which:
(2 might be quite fun / interesting for some people.) Is that the kind of thing you meant?
On the other hand, using the Twitter API for this is really quite easy anyway, so I'm not sure how worthwhile it would be.
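As a rough illustration of how little is involved, here is a sketch of resolving screen names to user IDs with the v1.1 users/lookup endpoint and an app-only bearer token, as that API worked around the time of this PR. The API has changed since, so treat the URL, the 100-names-per-call limit and the 404 behaviour as assumptions; the helper names are hypothetical, and only the request-building and parsing helpers work offline.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed: the Twitter API v1.1 endpoint as it existed when this PR was written.
LOOKUP_URL = "https://api.twitter.com/1.1/users/lookup.json"
MAX_PER_REQUEST = 100  # users/lookup accepted up to 100 screen names per call


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_lookup_request(screen_names, bearer_token):
    """Build a users/lookup request authenticated with an app-only bearer token."""
    query = urllib.parse.urlencode({"screen_name": ",".join(screen_names)})
    return urllib.request.Request(
        f"{LOOKUP_URL}?{query}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


def parse_lookup_response(body):
    """Map lower-cased screen names to user IDs from a users/lookup JSON body.

    Screen names that Twitter could not find are simply absent from the result.
    """
    return {user["screen_name"].lower(): user["id_str"] for user in json.loads(body)}


def lookup_user_ids(screen_names, bearer_token):
    """Resolve screen names to user IDs in batches (this makes network calls)."""
    mapping = {}
    for batch in chunks(list(screen_names), MAX_PER_REQUEST):
        request = build_lookup_request(batch, bearer_token)
        try:
            with urllib.request.urlopen(request) as response:
                mapping.update(parse_lookup_response(response.read()))
        except urllib.error.HTTPError as error:
            # v1.1 answered 404 when none of the names in a batch existed.
            if error.code != 404:
                raise
    return mapping
```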
One other point about this PR is that it simply replaces the old Twitter screen name contact detail. Since ContactDetail objects can have a start and end date, we could instead keep old ContactDetails and set their start_date / end_date appropriately.
(This would mean some more widespread code changes, though, so I'm inclined to think that should be a new ticket, to see if people like the idea.)
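A minimal sketch of that idea, assuming a simplified, hypothetical ContactDetail with contact_type, value and start_date / end_date fields (the real model's fields and date representation may differ): rather than overwriting the screen name, close the current record and add a new one.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ContactDetail:
    """Hypothetical stand-in for a ContactDetail with a validity period."""
    contact_type: str
    value: str
    start_date: Optional[date] = None
    end_date: Optional[date] = None  # None means "still current"


def update_twitter_screen_name(details: List[ContactDetail], new_name: str,
                               today: date) -> None:
    """Close the current Twitter contact detail and add a new one if the name changed."""
    current = [d for d in details
               if d.contact_type == "twitter" and d.end_date is None]
    if any(d.value == new_name for d in current):
        return  # unchanged: keep the existing record as it is
    for detail in current:
        detail.end_date = today  # the old screen name is kept as history
    details.append(ContactDetail("twitter", new_name, start_date=today))
```

Keeping the closed records means the old screen name stays queryable, which is useful when matching against older data that still uses it.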
It strikes me that a more EP way of dealing with this would be to hand a service a CSV of IDs and Twitter screen names, have it update that CSV with Twitter user IDs, and then call a webhook so you can fetch it back and ingest the list however you want.
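The CSV-filling step of that flow might look roughly like this. The column names (twitter_screen_name, twitter_user_id) and the shape of the lookup dictionary are assumptions, and the webhook call is left out.

```python
import csv
import io
from typing import Dict


def fill_twitter_ids(csv_text: str, lookup: Dict[str, str]) -> str:
    """Return csv_text with a twitter_user_id column filled in from `lookup`.

    `lookup` maps lower-cased screen names to user IDs. Rows whose screen
    name is unknown keep whatever twitter_user_id they already had.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if "twitter_user_id" not in fieldnames:
        fieldnames.append("twitter_user_id")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Normalise "@Name" and "name" to the same lookup key.
        name = (row.get("twitter_screen_name") or "").lstrip("@").lower()
        row["twitter_user_id"] = lookup.get(name, row.get("twitter_user_id") or "")
        writer.writerow(row)
    return out.getvalue()
```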
This all looks good with the fixups.
OK, clearly I'm not used to the EveryPolitician mindset. Here's my attempt at working out how this would work in practice:
You have a twitter-id-data repository, with a structure like:
index.csv
data/
If you want to add a set of screen names / user IDs to be tracked, you'd make a pull request to twitter-id-data which:
1. adds a row to index.csv which includes a string identifying your project (uk-candidates, say);
2. adds data/uk-candidates.csv, which contains at minimum two columns: twitter_screen_name and twitter_user_id. You can put all the screen names and user IDs you already know in either of those columns.
We'd periodically run a script which collects all current values from the twitter_user_id and twitter_screen_name columns in every CSV file under data/ and looks up the corresponding screen name or user ID via the Twitter API. For each result, it updates (adding them if necessary) first_found_valid and last_found_valid columns with timestamps, filling in any missing values, or creates a new row if one of the values in the mapping has changed. (A user ID or screen name that was never found would be left with that single value in its row.) This will build up a history of the screen name <-> user ID mappings, with first_found_valid and last_found_valid recording when each was seen. It could also create a file containing just the current mappings, called data/current/uk-candidates.csv. If any of the screen name <-> user ID mappings have changed on this run of the script, it would fire any webhooks registered for the project name.
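The merge step of that script could be sketched as follows. This is a hypothetical implementation of the rules described above, assuming rows are dicts read from the project CSV, that the Twitter lookup has already produced (screen name, user ID) pairs, and that timestamps are ISO 8601 strings; the boolean it returns is what would decide whether to fire the webhooks.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def update_history(rows: List[Row], observed: Iterable[Tuple[str, str]],
                   now: str) -> bool:
    """Merge (screen_name, user_id) pairs seen on this run into `rows` in place.

    Returns True if a new mapping row had to be created, i.e. a screen name
    <-> user ID mapping changed, so webhooks for the project should fire.
    """
    changed = False
    for screen_name, user_id in observed:
        name_key = screen_name.lower()
        # 1. Same mapping already recorded: just extend its validity window.
        match = next((r for r in rows
                      if r["twitter_screen_name"].lower() == name_key
                      and r["twitter_user_id"] == user_id), None)
        if match is None:
            # 2. A row holding only one half of this mapping: fill in the other half.
            match = next((r for r in rows
                          if (r["twitter_screen_name"].lower() == name_key
                              and not r["twitter_user_id"])
                          or (r["twitter_user_id"] == user_id
                              and not r["twitter_screen_name"])), None)
            if match is not None:
                match["twitter_screen_name"] = screen_name
                match["twitter_user_id"] = user_id
        if match is None:
            # 3. Otherwise the mapping is new, so record it in its own row.
            match = {"twitter_screen_name": screen_name, "twitter_user_id": user_id,
                     "first_found_valid": "", "last_found_valid": ""}
            rows.append(match)
            changed = True
        if not match.get("first_found_valid"):
            match["first_found_valid"] = now
        match["last_found_valid"] = now
    return changed


def current_mappings(rows: List[Row], now: str) -> List[Row]:
    """Rows confirmed on this run, i.e. the contents of data/current/<project>.csv."""
    return [r for r in rows if r.get("last_found_valid") == now]
```

Old rows are never deleted, so a screen name that moves to a different user ID simply gains a new row while the old one stops having its last_found_valid advanced.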
@struan Is it possible to run this management command against the results of the 2015 General Election? We still use the data from that within EP, and a lot of the Twitter handles are now stale, which is causing us some problems in merging with data from other sources…
@tmtmtmtm You'd need to ask Sym, as I don't have access to the DC server.
This pull request adds:
1. a twitter_user_id column to the CSV output;
2. a management command (candidates_update_twitter_usernames), which could reasonably be run once a day. It finds many Twitter screen names, currently associated with people, that don't exist on Twitter, and prints them to standard output, so we should get an email if any such cases appear when the command is run from cron. (We should probably go through these by hand.)
The above changes depend on a new configuration option in conf/general.yml: TWITTER_APP_ONLY_BEARER_TOKEN.
Fixes #271
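For reference, the reporting part of such a command might look like the sketch below. It is not the code in this PR: the function names, the message format and passing the parsed conf/general.yml settings in as a plain dict are all assumptions, and the Django management command wrapper is omitted.

```python
from typing import Dict, Iterable, List, Tuple


def get_bearer_token(config: Dict[str, str]) -> str:
    """Read the app-only bearer token from the parsed conf/general.yml settings."""
    token = config.get("TWITTER_APP_ONLY_BEARER_TOKEN")
    if not token:
        raise RuntimeError("TWITTER_APP_ONLY_BEARER_TOKEN is not set in conf/general.yml")
    return token


def missing_screen_name_messages(people: Iterable[Tuple[str, str]],
                                 found: Dict[str, str]) -> List[str]:
    """Describe each person whose Twitter screen name was not found.

    `people` yields (person_id, screen_name) pairs; `found` maps lower-cased
    screen names to user IDs, e.g. the result of a users/lookup call.
    """
    messages = []
    for person_id, screen_name in people:
        if screen_name.lstrip("@").lower() not in found:
            messages.append(
                f"Person {person_id}: Twitter screen name {screen_name!r} not found")
    return messages


def report(messages: List[str]) -> None:
    """Print to standard output; when run from cron, output is usually emailed."""
    for message in messages:
        print(message)
```

Keeping the lookup and the printing separate means the command stays silent when every screen name resolves, so cron only sends mail when there is something to check by hand.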