Status: Open. mhl opened this issue 10 years ago.
There are three scrapedxml entries for that day's debate on the site: 8984.xml, 8985.xml and 8997.xml. I'm not sure how multiple XML entries are meant to be processed - the correction from 8997 is present, but I think 8984 must have been loaded on top, maybe? It is 8997.xml that is on the official site (and changing the source link from 8984 to 8997 shows that it does then work).
8984 => 8985 corrects a speaker's name; 8985 => 8997 has a full correction.
Another example this week - http://www.theyworkforyou.com/sp/?id=2014-08-19.12.0 - has 9493 and 9497, with 9497 containing the small correction but not being loaded into the site.
So xml2db.pl is, from a surface reading, meant to spot, in db_addpair, if the same GID is used twice when parsing a day's worth of XML, so I wondered why it wasn't erroring here. However, db_addpair sets $ignorehistorygids{$gid} = 1 and all the functions that call db_addpair just move on if that is set. So duplicate GIDs will always be ignored rather than raising an error: the error handling can never be reached, and the first GID found is used (which is why the later XML IDs aren't being imported). ignorehistorygids appears to be for redirects of old GIDs, so I can't quite see why db_addpair sets it too, but presumably there was a reason.
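To make the control flow described above concrete, here is a minimal Python model of it. This is a sketch, not the real xml2db.pl code: the names load_speech, loaded and the GID "example-gid" are hypothetical, and the assumption is simply that every caller checks the ignore flag before calling db_addpair, as the surface reading suggests.

```python
# Hypothetical model of the behaviour described in this issue.
# It is NOT the real xml2db.pl; names other than db_addpair and
# ignorehistorygids are made up for illustration.

ignore_history_gids = {}  # stands in for %ignorehistorygids
loaded = {}               # gid -> source file whose content was stored


def db_addpair(gid, source):
    """Store a GID; intended to raise if the same GID appears twice."""
    if gid in loaded:
        # The duplicate check that is meant to fire.
        raise RuntimeError(f"duplicate GID {gid} from {source}")
    loaded[gid] = source
    # Marking the GID here is what makes the check above unreachable.
    ignore_history_gids[gid] = 1


def load_speech(gid, source):
    """Model of a caller: silently move on if the GID is already flagged."""
    if ignore_history_gids.get(gid):
        return False  # skipped, no error raised
    db_addpair(gid, source)
    return True


# Three scraped files containing the same GID, processed in order.
files = ("8984.xml", "8985.xml", "8997.xml")
results = [load_speech("example-gid", f) for f in files]
```

In this model only the first file is stored and the two later ones are skipped without any error, which matches the symptom of the corrected XML never reaching the site.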
The links to the original source are broken on this page http://www.theyworkforyou.com/sp/?id=2014-03-04.7.0