Open mhl opened 7 years ago
https://code.google.com/archive/p/hansard/downloads contains database dumps under reference_data. TWFY has its old import code for this in scripts/historic (in the TWFY repo). I can probably do better with more time, but hopefully that's enough for this.
... yep, people.json has Diane Abbott historichansard_person_id of 7, and commons_library_data/people.sql has her under key 7.
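A minimal sketch of that kind of check, assuming people.json follows a Popolo-style layout in which each entry under persons carries a list of identifiers as scheme/identifier pairs. The function name, the sample record and its person id are made up for illustration, not taken from the real file.

```python
def identifiers_by_scheme(data, scheme, collection="persons"):
    """Return {identifier value: record id} for one ID scheme.

    Assumes a Popolo-style dict; the layout is an assumption here.
    """
    found = {}
    for record in data.get(collection, []):
        for ident in record.get("identifiers", []):
            if ident.get("scheme") == scheme:
                found[ident["identifier"]] = record["id"]
    return found


# Illustrative data only: the record id is a placeholder.
sample = {
    "persons": [
        {
            "id": "example/person/1",
            "identifiers": [
                {"scheme": "historichansard_person_id", "identifier": "7"},
            ],
        },
    ],
}

person_ids = identifiers_by_scheme(sample, "historichansard_person_id")
print(person_ids)
```

With the real file loaded via json.load, the keys of the returned dict could then be compared against the primary keys in commons_library_data/people.sql.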
For reference, the Historic Hansard code is also available on GitHub ( https://github.com/millbanksystems/hansard ), and the running site is no longer a Rails app; it's (effectively) a flat file backup plus a Sinatra app to replicate the original search functionality (um, https://github.com/lizconlan/hh-search-app I think; I should probably transfer that to the correct ownership).
I have a full database backup somewhere...
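With a dump of the people table to hand, the join itself should be small. A rough sketch, assuming the table can be read as (primary key, slug) rows, which has not been verified against the actual dump, and using only person-level IDs from parlparse. The function name and the parlparse ids below are placeholders.

```python
def map_to_slugs(person_ids, dump_rows):
    """Join parlparse person IDs to Historic Hansard slugs.

    person_ids: {historichansard_person_id as str: parlparse person id}
    dump_rows:  iterable of (primary_key, slug) pairs; this column layout
                is an assumption about the dump.
    Returns (matched, unmatched): matched maps parlparse id -> slug,
    unmatched lists parlparse ids with no row in the dump.
    """
    slug_by_key = {str(key): slug for key, slug in dump_rows}
    matched, unmatched = {}, []
    for hh_id, parlparse_id in person_ids.items():
        slug = slug_by_key.get(str(hh_id))
        if slug is None:
            unmatched.append(parlparse_id)
        else:
            matched[parlparse_id] = slug
    return matched, unmatched


# Illustrative inputs: the ids on the right-hand side are placeholders.
person_ids = {"5962": "example/person/a", "99999999": "example/person/b"}
dump_rows = [(5962, "mrs-margaret-thatcher")]

matched, unmatched = map_to_slugs(person_ids, dump_rows)
print(matched, unmatched)
```

The slugs in matched are what Wikidata stores under P2015, so anything left in unmatched would need checking by hand.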
"n.b. some people in parlparse have the ID scheme historichansard_person_id and some have historichansard_id - I'm assuming they're the same ID space, but maybe not" - no, as with us, one is a person ID, one is a membership ID.
Thanks, @dracos and @lizconlan - that's brilliant.
I'm frustrated by this, because I think one of the first things I did when working for mySociety was working with @frabcus on importing people from the historic Hansard data but I can't remember enough of the detail to be able to answer my own question!
The Wikidata project has imported all the historic MPs from the historic Hansard records at http://hansard.millbanksystems.com/ using the slugs on people pages as IDs - this is Wikidata property P2015. parlparse, however, uses IDs for historic MPs with the scheme historichansard_id, which is numeric. If we could find the mapping between these two ID spaces, that would enable us to straightforwardly associate everyone in parlparse with the right Wikidata items, which would be brilliant.
The problem is that I can't find any use of the historichansard_id values on http://hansard.millbanksystems.com/ at all now. It's not in the source of people pages or debate pages on that site. The credits page links to the XML data that site is based on: http://www.hansard-archive.parliament.uk/ but those files don't appear to have IDs associated with members at all - the <member> ... </member> tags have no attributes, and I can't see any other element that has them. (This is all worth double-checking, I should say!)
Can anyone help with figuring this out? Is it possible that we used a different structured data source from those XML files when importing the historic MPs into parlparse, and I'm just not finding it now? (Looking through the history of this repository, I can't even see what script might have been used for the import now, though I imagine we did commit it.)
If the historichansard_id values were the database primary keys for the Rails site hosted here: http://hansard.millbanksystems.com/ (source code here: https://code.google.com/archive/p/hansard/downloads ) then perhaps we could get a dump of that mapping from the maintainers?
To help with checking this kind of thing, an example:
mrs-margaret-thatcher (page here: http://hansard.millbanksystems.com/people/mrs-margaret-thatcher/ )
historichansard_person_id of 5962
n.b. some people in parlparse have the ID scheme historichansard_person_id and some have historichansard_id - I'm assuming they're the same ID space, but maybe not.
Cc: @dracos @crowbot