opencivicdata / scrapers-us-municipal

Scrapers for US municipal governments.
MIT License

Contemplate strategy for duplicate bills #196

Open reginafcompton opened 6 years ago

reginafcompton commented 6 years ago

Recently, a Chicago bill changed its identifier (i.e., its MatterFile) from "CL 2017-966" to "`CL 2017-966". The scraper interpreted this as a new bill and thus created one with a different OCD ID, despite all other elements remaining constant.

Do we want to consider adding a mechanism to clean unusual characters from the identifier?
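
One possible shape for such a mechanism, as a minimal sketch: normalize the raw identifier before it is used to match existing bills. The function name `clean_identifier` and the allow-list of characters are assumptions made for illustration, not part of the scraper or of pupa, and other cities' identifiers may need a wider allow-list.

```python
import re
import unicodedata

# Hypothetical allow-list: ASCII letters, digits, whitespace, periods, hyphens.
# Anything else (such as a stray backtick) is treated as noise and dropped.
_DISALLOWED = re.compile(r"[^A-Za-z0-9\s.\-]")
_WHITESPACE = re.compile(r"\s+")


def clean_identifier(raw):
    """Return a normalized copy of a bill identifier.

    Illustrative helper only; it is not an existing scraper or pupa function.
    """
    # Fold compatibility characters (e.g. non-breaking spaces) to plain forms.
    normalized = unicodedata.normalize("NFKC", raw)
    # Drop characters outside the allow-list.
    stripped = _DISALLOWED.sub("", normalized)
    # Collapse runs of whitespace and trim the ends.
    return _WHITESPACE.sub(" ", stripped).strip()


print(clean_identifier("`CL 2017-966"))
```

With this in place, both "CL 2017-966" and "`CL 2017-966" would reduce to the same string. The trade-off is that two identifiers that differ only in a stripped character would also collapse into one, so the allow-list would need checking against each city's real data.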

hancush commented 6 years ago

FWIW, this seems to be in play here as well: https://github.com/datamade/nyc-council-councilmatic/issues/87

hancush commented 6 years ago

And perhaps here: https://github.com/datamade/chi-councilmatic/issues/227

hancush commented 6 years ago

Leaving these links here for posterity, though they may, in fact, be more related to https://github.com/opencivicdata/pupa/issues/295