palewire / django-calaccess-campaign-browser

A Django app to refine, review and republish campaign finance data drawn from the California Secretary of State’s CAL-ACCESS database
http://django-calaccess-campaign-browser.californiacivicdata.org
MIT License
17 stars 12 forks source link

Expenditures-to-CSV query refers to missing field #225

Closed myersjustinc closed 9 years ago

myersjustinc commented 9 years ago

The buildcalaccesscampaignbrowser management command fails partway through:

$ python example/manage.py buildcalaccesscampaignbrowser
Flushing CAL-ACCESS campaign browser database tables
 Filer
 Filing
 Summary
 Cycle
 Committee
 Contribution
 Expenditure
 Election
 Office
 Candidate
 Proposition
 PropositionFiler
Loading filers and committees
 Dropping temporary tables
 Creating temporary tables
 Loading cycles
 Loading candidate filers
 Creating more temporary tables
 Loading candidate committees
 Creating yet more temporary tables
 Loading PAC filers
 Loading PAC committees
 Dropping temporary tables
Loading filings
 Loading form 450, 460, 497 filings
 Marking duplicates
Loading summary totals
 Transforming source CSV
  Regrouping
  Writing to filesystem
 Loading transformed CSV
Loading contributions
 Quarterly filings
  Marking duplicates
   Dumping CSV sorted by unique identifier
   Marking duplicates in a new CSV
  Loading CSV
  Merging CSV data with other tables
 Late filings
  Marking duplicates
   Dumping CSV sorted by unique identifier
   Marking duplicates in a new CSV
  Loading CSV
  Merging CSV data with other tables
Loading expenditures
 Quarterly filings
  Marking duplicates
   Dumping CSV sorted by unique identifier
Traceback (most recent call last):
  File "example/manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/core/management/__init__.py", line 385, in execute_from_command_line
    utility.execute()
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/core/management/__init__.py", line 377, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/core/management/base.py", line 338, in execute
    output = self.handle(*args, **options)
  File "calaccess_campaign_browser/management/commands/buildcalaccesscampaignbrowser.py", line 14, in handle
    call_command("loadcalaccesscampaignexpenditures")
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/core/management/__init__.py", line 115, in call_command
    return klass.execute(*args, **defaults)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/core/management/base.py", line 338, in execute
    output = self.handle(*args, **options)
  File "calaccess_campaign_browser/management/commands/loadcalaccesscampaignexpenditures.py", line 457, in handle
    self.transform_quarterly_expenditures_csv()
  File "calaccess_campaign_browser/management/commands/loadcalaccesscampaignexpenditures.py", line 127, in transform_quarterly_expenditures_csv
    self.cursor.execute(sql)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/db/backends/utils.py", line 81, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/db/backends/utils.py", line 63, in execute
    return self.cursor.execute(sql)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 128, in execute
    return self.cursor.execute(query, args)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/Users/jcm/.virtualenvs/django-calaccess-campaign-browser-py2/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
django.db.utils.OperationalError: (1054, "Unknown column 'payee_adr1' in 'field list'")

That traceback points to this query, which refers to a field that used to exist but no longer does.

That field disappeared in california-civic-data-coalition/django-calaccess-raw-data@8e6749932b711c8ce3921a00f27cf4d1fa82d4bc, and this query seems not to have been updated to match. Not entirely sure I follow what was going on with the django-postgres-copy changes (especially since I'm using MySQL for this), so I haven't attempted to put together a fix—but figured I should flag it anyway.

(Noticed this while I was working on #149, which I've otherwise made good progress on...)

palewire commented 9 years ago

Thanks for this bug report. I believe you've put your finger directly on the problem.

During the implementation of the improved PostgreSQL loader I discovered that a smattering of our raw models contained fields that were in the official documentation but absent from the actual data dump. Our previous loaders just skipped over them.

My suspicion is that the state has chosen to withhold these fields from release. Since they tend to be fields related to addresses, my guess is that it the redaction due to a (questionable) interpretation of privacy protections. Sometime soon I'd like to do a comprehensive study of all fields withheld from the data dump but I haven't made the time yet.

Solving your bug in the short term I believe will require that we subtract from the campaign browser loader's custom SQL all of the the fields deleted from the raw app.

This is unintended consequence of my earlier work and I apologize for the pain. I'll try to get to a patch later today.

palewire commented 9 years ago

Alright. I took a blind stab at removing two of the retired fields. Haven't had time to test but it's a start.

palewire commented 9 years ago

Alright. I think I fixed this. After pulling the latest code, the expenditure loader now works for me.

myersjustinc commented 9 years ago

Yep, works for me, too. Thanks for that fix!