Open onyxfish opened 15 years ago
This has now been documented in the Database Planning section of the wiki: http://wiki.github.com/bouvard/votersdaily/database-planning
Fixed for Python scrapers. This is definitely a much better way of identifying each document.
Fixed; closing.
It looks like the scrapers are still pulling in branch and entity names in the format: [datetime] - [parser_name] - [branch] - [entity] - [unique key]. Now that we are including the parser name, I think we should remove [branch] and [entity]. They only make the IDs longer, and I'm already a bit concerned that some of our URLs are going to be overly lengthy.
Also, for the Roll Call Votes scrapers where there is a unique Vote Number, I really think we want to use that as the [unique key] portion rather than the title.
Going to reopen this ticket, pending discussion.
Will work on this this week.
Where [unique key] is whatever is appropriate to a given scraper. For Roll Call Vote scrapers this would be the Roll #. For some scrapers this may be the title: whatever makes a given event unique.
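
For discussion, here is a rough sketch of how the shortened ID could be built once [branch] and [entity] are dropped. Everything named here is an assumption for illustration: the `build_document_id` helper, the parser names, the sample events, and the ISO 8601 timestamp format are all hypothetical, since the thread does not pin down the datetime format or how the scrapers assemble the string today.

```python
from datetime import datetime


def build_document_id(event_time, parser_name, unique_key):
    """Join [datetime], [parser_name] and [unique key] with ' - '.

    Hypothetical helper, not the scrapers' actual code.
    """
    # Assumption: ISO 8601 timestamps; the thread does not specify a format.
    timestamp = event_time.strftime("%Y-%m-%dT%H:%M:%S")
    # str() lets a numeric roll number and a text title share one code path.
    return " - ".join([timestamp, parser_name, str(unique_key)])


# Hypothetical Roll Call Vote event, keyed by its roll number.
roll_call_id = build_document_id(
    datetime(2009, 7, 14, 10, 30), "house_roll_call", 512
)

# Hypothetical event with no natural number, keyed by its title.
hearing_id = build_document_id(
    datetime(2009, 7, 14, 14, 0), "senate_hearings", "Budget Oversight Hearing"
)

print(roll_call_id)
print(hearing_id)
```

Keying on the roll number keeps that ID short and stable, while title-keyed IDs can still get long, so the URL-length concern may need a separate look for those scrapers.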