Look up data sources, loaders by ocd_id in addition to state

ghing commented 10 years ago

A friend of a friend wanted help scraping election data for Cook County, Illinois. We ultimately wrote it in Ruby because she was a novice programmer and that's what she knew, but the general pattern of defining paths to data, then parsing and storing it, was the same. This got me thinking: what would it take to use openelex-core as a framework for writing scrapers for arbitrary jurisdictions?

One issue with our framework is that it's oriented around U.S. states. So, when we want to fetch results, we run a command like:

    inv fetch --state ia

Internally, the fetch task looks up the datasource for the state like this:

    state_mod = load_module(state, ['datasource', 'fetch'])
    datasrc = state_mod.datasource.Datasource()

Imagine if we added something like a jurisdiction option for the invoke tasks. Then we could hypothetically do something like this:

    inv fetch --jurisdiction "ocd-division/country:us/state:il/county:cook"

The logic would be fairly similar inside the task:

    if state:
        jurisdiction = 'ocd-division/country:us/state:{}'.format(state.lower())

    jurisdiction_mod = load_module(jurisdiction, ['datasource', 'fetch'])
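
To make the option handling concrete, here is a minimal sketch of how a task could turn either option into a single ocd_id before loading anything. The function name `normalize_jurisdiction` and the error handling are assumptions for illustration; nothing like this exists in openelex-core today.

```python
# Hypothetical helper: not part of openelex-core.
STATE_OCD_TEMPLATE = 'ocd-division/country:us/state:{}'


def normalize_jurisdiction(state=None, jurisdiction=None):
    """Return an ocd_id given either a state abbreviation or a full ocd_id."""
    if state and jurisdiction:
        raise ValueError("Pass either state or jurisdiction, not both")
    if state:
        # Existing behavior: a two-letter postal abbreviation such as 'ia'.
        return STATE_OCD_TEMPLATE.format(state.lower())
    if jurisdiction:
        # New behavior: the caller supplies the ocd_id directly.
        return jurisdiction.strip()
    raise ValueError("One of state or jurisdiction is required")


print(normalize_jurisdiction(state='IA'))
print(normalize_jurisdiction(
    jurisdiction='ocd-division/country:us/state:il/county:cook'))
```

Keeping `--state` as shorthand for the state-level ocd_id means existing invocations would keep working unchanged.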

To make this work, we'd have to develop some kind of registration pattern to map between ocd_ids and Python modules, but this is definitely doable. I could imagine a pretty simple approach where we use our existing logic for discovering states and just add a setting for "contrib modules" that would look something like this:

    OPENELEX_JURISDICTION_MODULES = {
        'ocd-division/country:us/state:il/county:cook': 'scrape_cook_county',
    }

In this example, scrape_cook_county would be a totally separate package that implements the API that we have defined for states (datasource.py, load.py, etc.).
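
A rough sketch of what the lookup could look like, using only `importlib` from the standard library. The names `resolve_package` and `load_jurisdiction_module` are hypothetical, and the `openelex.us.<state>` package path for built-in states is an assumption about the layout rather than something this sketch verifies. Contrib entries are checked first so that an outside package could override a built-in one; that ordering is a design guess, not a decision.

```python
import importlib
import re

# The proposed setting, mapping ocd_ids to importable package names.
OPENELEX_JURISDICTION_MODULES = {
    'ocd-division/country:us/state:il/county:cook': 'scrape_cook_county',
}

# Matches state-level ocd_ids such as 'ocd-division/country:us/state:ia'.
STATE_OCD_RE = re.compile(r'^ocd-division/country:us/state:([a-z]{2})$')


def resolve_package(ocd_id, contrib=OPENELEX_JURISDICTION_MODULES):
    """Map an ocd_id to the dotted name of the package that handles it."""
    if ocd_id in contrib:
        return contrib[ocd_id]
    match = STATE_OCD_RE.match(ocd_id)
    if match:
        # Assumed location of the built-in state packages.
        return 'openelex.us.{}'.format(match.group(1))
    raise LookupError('No module registered for {}'.format(ocd_id))


def load_jurisdiction_module(ocd_id, submodules,
                             contrib=OPENELEX_JURISDICTION_MODULES):
    """Import the package for ocd_id along with the named submodules."""
    package = resolve_package(ocd_id, contrib)
    for name in submodules:
        # Importing 'pkg.name' also binds 'name' as an attribute of pkg.
        importlib.import_module('{}.{}'.format(package, name))
    return importlib.import_module(package)
```

With something like this in place, the fetch task could call `load_jurisdiction_module(jurisdiction, ['datasource', 'fetch'])` and then use `jurisdiction_mod.datasource.Datasource()` as it does for states now.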

dwillis commented 10 years ago

+1 to this.

zstumgoren commented 10 years ago

+1

openelections / openelections-core

Look up data sources, loaders by ocd_id in addition to state #213