Closed by aboutaaron 10 years ago
Thanks for the bug report.
I recently did a big rewrite of the contributions loader to try to mark the duplicates before we put them into the database.
Our SQL-based refactor at the convening didn't mark them at all, and my attempt to mark them in the same manner as the filings load (with a big JOIN and UPDATE in SQL) took 12 hours to finish on my laptop.
So what I've tried to do is dump out a CSV from the raw table into the temporary directory, transform that from one CSV to another, marking the dupes as we go, and then load it back in. It worked with my tmp dir, but maybe I did it wrong.
The reason I'm dumping them out that way is that a simple sort by filing_id_raw and amend_id seems to put the "real" record first in each grouping of amendments, so we're able to mark them with a simple operation in a loop and don't have to load things into memory, build dicts, etc. It's not super fast, but it beats the alternatives I've come up with so far.
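For readers following along, here is a minimal sketch of that streaming approach, not the loader's actual code. It assumes the input CSV is already sorted so the "real" record comes first within each filing_id_raw group, as described above. The function name `mark_duplicates_csv` and the `is_duplicate` column are hypothetical names chosen for illustration.

```python
import csv
import io


def mark_duplicates_csv(src, dst, key_field="filing_id_raw", flag_field="is_duplicate"):
    """Copy a pre-sorted CSV from src to dst, flagging repeats of each key.

    Only the previous row's key is kept in memory, so the file is never
    loaded whole. The first row of each group is written with "0" and
    every following row with the same key is written with "1".
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + [flag_field])
    writer.writeheader()
    previous_key = object()  # sentinel that never equals a real key
    for row in reader:
        key = row[key_field]
        row[flag_field] = "1" if key == previous_key else "0"
        writer.writerow(row)
        previous_key = key


# Tiny in-memory demonstration with made-up rows.
sample = io.StringIO(
    "filing_id_raw,amend_id,amount\n"
    "100,2,50\n"
    "100,1,40\n"
    "101,0,10\n"
)
result = io.StringIO()
mark_duplicates_csv(sample, result)
print(result.getvalue())
```

The whole approach depends on the sort order being right: if rows for one filing are not adjacent, the same key would be treated as the start of a new group.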
There has to be a better way to do this, especially if it's crashing on other platforms.
Well, in that case, it looks like it's trying to make a directory in /tmp/ at the root level, and perhaps that's why I get an error. Would it be worth putting the tmp folder in the user's home directory, or should /tmp be accessible to the user? Not sure if the permissions on my Ubuntu box are borked.
I wrongly assumed that the Python tempfile library was going to handle all that automatically. We probably have to go back to the drawing board on this a bit.
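As a rough sketch of the home-directory idea raised above: `tempfile.mkdtemp` accepts a `dir` argument, so the scratch directory can be created somewhere other than the system default. The helper name `make_work_dir` and the `contribs_` prefix are hypothetical and are not taken from the project.

```python
import os
import tempfile


def make_work_dir(base_dir=None):
    """Create a private scratch directory and return its path.

    With base_dir=None, tempfile picks the system default location
    (it consults the TMPDIR, TEMP and TMP environment variables first).
    Otherwise the directory is created under base_dir.
    """
    if base_dir is not None:
        os.makedirs(base_dir, exist_ok=True)
    return tempfile.mkdtemp(prefix="contribs_", dir=base_dir)


# Example: keep scratch files under the user's home directory.
# work_dir = make_work_dir(os.path.join(os.path.expanduser("~"), ".contribs_tmp"))
```

The caller is responsible for removing the directory afterwards, for example with `shutil.rmtree`.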
It looks like this is actually a disk space error.
Returned error: django.db.utils.InternalError: (3, "Error writing file '/tmp/tmpeFKtVr' (Errcode: 28)")
$ perror 28
OS error code 28: No space left on device
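The same lookup can be done from Python, along with a free-space check before a large load. This is a sketch only: `describe_errno` and `has_free_space` are hypothetical helper names, and the check assumes the files land in the directory you pass in. The file in the error above was written by the database server, whose temporary directory may differ from Python's.

```python
import errno
import os
import shutil
import tempfile


def describe_errno(code):
    """Return the symbolic name and OS message for an error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "{}: {}".format(name, os.strerror(code))


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding path has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


print(describe_errno(28))
print(has_free_space(tempfile.gettempdir(), 2 * 1024 ** 3))  # 2 GiB is an arbitrary threshold
```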
It may be worth documenting common errors like this to cut down on tickets. Maybe a "Common issues/errors/solutions" section in the docs.
Can confirm this was just a storage and table issue. Everything is running peachy as of the last commit. Closing for now.
Thanks. Though there has to be a better way to do this in The Next Great Refactor in the Sky.
Truncated stacktrace
I ran into a similar error before, but it looks like I had DEBUG set to True. I changed that and ran this again, but now got this error. Gonna do some research, but I thought I'd file a ticket in the meantime.