opencivicdata / scrapers-us-municipal

Scrapers for US municipal governments.
MIT License

Bill scrape imports duplicate bill subjects #244

Open reginafcompton opened 6 years ago

reginafcompton commented 6 years ago

Recently, Chicago Legistar had a bill with duplicate indexes:

[Screenshot, 2018-09-26 9:18 am]

The scraper imported these duplicates (as seen in our OCD API):

[Screenshot, 2018-09-26 9:19 am]

Our Councilmatic database could not import the bill because of an integrity error: the import tried to insert the same Subject more than once.

We could approach this a couple of ways:

(1) The scraper should fail when trying to add duplicate subjects to a Bill.

-OR-

(2) The scraper should not fail, but it should not be able to create duplicate subjects (maybe by making Bill.subject a set rather than a list in pupa, or by checking whether a subject already exists before appending it).
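
To make the two options concrete, here is a rough sketch of what either behavior could look like. It is only an illustration: the `Bill` class below is a hypothetical stand-in (not pupa's actual class), `add_subjects` and its `strict` flag are names I made up, and the identifier is invented.

```python
class Bill:
    """Hypothetical stand-in for a scraped bill; not pupa's actual class."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.subject = []  # list of subject strings


def add_subjects(bill, subjects, strict=False):
    """Append subjects to bill.subject, never adding one twice.

    strict=True  -> raise ValueError on the first duplicate (option 1).
    strict=False -> skip duplicates silently, keeping first-seen order (option 2).
    """
    seen = set(bill.subject)
    for subject in subjects:
        if subject in seen:
            if strict:
                raise ValueError(
                    f"duplicate subject {subject!r} on bill {bill.identifier}"
                )
            continue
        seen.add(subject)
        bill.subject.append(subject)


bill = Bill("EXAMPLE-1")  # invented identifier, for illustration only
add_subjects(bill, ["Zoning", "Budget", "Zoning"])
print(bill.subject)  # ['Zoning', 'Budget']
```

Checking against a `set` while still appending to the list keeps the original order of subjects, which converting `Bill.subject` to a plain set would not guarantee.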

fgregg commented 6 years ago

Let's fix it in the scraper.

reginafcompton commented 6 years ago

Once we do this, we'll need to tend to Chicago Councilmatic, since https://ocd.datamade.us/ocd-bill/8efd9d9c-3397-4b5a-a7cc-fc935e9eb10a/ was never imported to the site.