palewire / django-calaccess-campaign-browser

A Django app to refine, review and republish campaign finance data drawn from the California Secretary of State’s CAL-ACCESS database
http://django-calaccess-campaign-browser.californiacivicdata.org
MIT License

Error loading contributions #92

Closed aboutaaron closed 10 years ago

aboutaaron commented 10 years ago

Truncated stacktrace

...
-- Outputing CSV dump sorted by unique identifier
Traceback (most recent call last):
...
  File "calaccess_campaign_browser/management/commands/buildcalaccesscampaignbrowser.py", line 13, in handle
    call_command("loadcalaccesscampaigncontributions")
  File "/home/aaron/.envs/campaign_finance/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 115, in call_command
    return klass.execute(*args, **defaults)
  File "/home/aaron/.envs/campaign_finance/local/lib/python2.7/site-packages/django/core/management/base.py", line 338, in execute
    output = self.handle(*args, **options)
  File "calaccess_campaign_browser/management/commands/loadcalaccesscampaigncontributions.py", line 22, in handle
    self.transform_csv()
  File "calaccess_campaign_browser/management/commands/loadcalaccesscampaigncontributions.py", line 105, in transform_csv
    c.execute(sql)
  File "/home/aaron/.envs/campaign_finance/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
  File "/home/aaron/.envs/campaign_finance/local/lib/python2.7/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/aaron/.envs/campaign_finance/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 63, in execute
    return self.cursor.execute(sql)
  File "/home/aaron/.envs/campaign_finance/local/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 128, in execute
    return self.cursor.execute(query, args)
  File "/home/aaron/.envs/campaign_finance/local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/home/aaron/.envs/campaign_finance/local/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
django.db.utils.InternalError: (3, "Error writing file '/tmp/tmpeFKtVr' (Errcode: 28)")

I ran into a similar error before, but it looks like that was because I had DEBUG set to True. I changed that and ran the command again, and now I get this error instead. I'm going to do some research, but I thought I'd file a ticket in the meantime.

palewire commented 10 years ago

Thanks for the bug report.

I recently did a big rewrite of the contributions loader to try to mark the duplicates before we put them into the database.

Our SQL-based refactor at the convening didn't mark them at all, and my attempt to mark them in the same manner as the filings load (with a big JOIN and UPDATE in SQL) took 12 hours to finish on my laptop.

So what I've tried to do is dump out a CSV from the raw table into the temporary directory, transform that from one CSV to another, marking the dupes as we go, and then load it back in. It worked with my tmp dir, but maybe I did it wrong.

The reason I'm dumping them out that way is that a simple sort by filing_id_raw and amend_id seems to put the "real" record first in each grouping of amendments. Then we're able to mark them with a simple operation in a loop and don't have to load things into memory, build dicts, etc. It's not super fast, but it beats the alternatives I've come up with so far.
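A minimal sketch of that single-pass marking step, not the loader's actual code. It assumes the input rows are already sorted so that the record to keep comes first within each run of identical filing_id_raw values. The `mark_duplicates` helper, the `is_duplicate` column, and the sample rows are all hypothetical and made up for illustration.

```python
import csv
import io


def mark_duplicates(reader, writer, key_field="filing_id_raw"):
    """Stream rows from reader to writer, adding an is_duplicate flag.

    Assumes rows arrive pre-sorted so the row to keep is first in each
    run of equal key_field values. Only the previous key is held in
    memory, so the whole file never has to be loaded.
    """
    previous_key = object()  # sentinel that never equals a real key
    for row in reader:
        key = row[key_field]
        # Any row repeating the previous key is a later member of the group.
        row["is_duplicate"] = "1" if key == previous_key else "0"
        writer.writerow(row)
        previous_key = key


# Made-up sample data, already sorted for the demo.
source = io.StringIO(
    "filing_id_raw,amend_id,amount\n"
    "100,2,50\n"
    "100,1,50\n"
    "101,0,75\n"
)
output = io.StringIO()
reader = csv.DictReader(source)
writer = csv.DictWriter(
    output,
    fieldnames=reader.fieldnames + ["is_duplicate"],
    lineterminator="\n",
)
writer.writeheader()
mark_duplicates(reader, writer)
print(output.getvalue())
```

Whether the highest or lowest amend_id should sort first depends on which record counts as the "real" one. The sketch only relies on that record being first in its group.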

There has to be a better way to do this, especially if it's crashing on other platforms.

aboutaaron commented 10 years ago

Well, in that case, it looks like it's trying to make a directory in /tmp/ at the root level, and perhaps that's why I get an error. Would it be worth putting the tmp folder in the user's home directory, or should /tmp be accessible to the user? Not sure if the permissions on my Ubuntu box are borked.

palewire commented 10 years ago

I wrongly assumed that the Python tempfile library was going to handle all that automatically. We probably have to go back to the drawing board on this a bit.

aboutaaron commented 10 years ago

Looks like this is actually a disk space error. Returned error: django.db.utils.InternalError: (3, "Error writing file '/tmp/tmpeFKtVr' (Errcode: 28)")

$ perror 28
OS error code  28:  No space left on device
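
A rough sketch of a pre-flight check that would surface this earlier, assuming the dump lands in the directory that Python's tempfile module reports (the database server may use a different temporary directory). The `free_bytes` and `check_temp_space` helpers are hypothetical, and `os.statvfs` is Unix-only.

```python
import errno
import os
import tempfile


def free_bytes(path):
    """Return the bytes available to unprivileged users on path's filesystem."""
    stats = os.statvfs(path)  # Unix-only call
    return stats.f_bavail * stats.f_frsize


def check_temp_space(required_bytes, tmp_dir=None):
    """Raise OSError(ENOSPC) if tmp_dir has less than required_bytes free.

    tmp_dir defaults to tempfile.gettempdir(), which honors the TMPDIR
    environment variable.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    available = free_bytes(tmp_dir)
    if available < required_bytes:
        message = "need %s bytes in %s, only %s free" % (
            required_bytes,
            tmp_dir,
            available,
        )
        raise OSError(errno.ENOSPC, message)
    return available


if __name__ == "__main__":
    # Example: require roughly 1 MB of headroom before starting a dump.
    try:
        print(check_temp_space(1024 * 1024))
    except OSError as exc:
        print("Not enough temp space:", exc)
```

Since tempfile.gettempdir() reads TMPDIR, pointing that variable at a larger volume is one way to move the scratch files without changing code.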

aboutaaron commented 10 years ago

It may be worth documenting common errors like this to cut down on tickets. Maybe a "Common issues/errors/solutions" section in the docs.

aboutaaron commented 10 years ago

Can confirm this was just a storage and table issue. Everything is running peachy as of last commit. Closing for now.

palewire commented 10 years ago

Thanks. Though there has to be a better way to do this in The Next Great Refactor in the Sky.