Closed gordonje closed 7 years ago
Checked the server log. Since the last update on Sunday, there have been about 15 update attempts, most of which are throwing this traceback error:
Traceback (most recent call last):
File "/apps/calaccess/repo/manage.py", line 35, in <module>
execute_from_command_line(sys.argv)
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/base.py", line 305, in run_from_argv
self.execute(*args, **cmd_options)
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/base.py", line 356, in execute
output = self.handle(*args, **options)
File "/apps/calaccess/repo/calaccess_website/management/commands/updatedownloadswebsite.py", line 35, in handle
super(Command, self).handle(*args, **options)
File "/apps/calaccess/local/lib/python2.7/site-packages/calaccess_raw/management/commands/updatecalaccessrawdata.py", line 124, in handle
download_metadata = self.get_download_metadata()
File "/apps/calaccess/local/lib/python2.7/site-packages/calaccess_raw/management/commands/__init__.py", line 47, in get_download_metadata
last_modified = request.headers['last-modified']
File "/apps/calaccess/local/lib/python2.7/site-packages/requests/structures.py", line 54, in __getitem__
return self._store[key.lower()][1]
KeyError: 'last-modified'
These appear to be cases where the response to our HEAD request does not include a Last-Modified value.
In the remaining cases, the Last-Modified and Content-Length values were identical to what we had on Sunday.
Will keep this issue open until the regular updates start coming in.
Also might want to catch and log the status code of the head response. That would be a change in the raw-data app.
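A rough sketch of what that could look like, assuming the command keeps using a requests-style response object. The function name extract_download_metadata is hypothetical and is not the raw-data app's actual API; it only illustrates logging the status code and returning early instead of raising KeyError when Last-Modified is missing:

```python
import logging

logger = logging.getLogger(__name__)


def extract_download_metadata(response):
    """Return a dict of header values, or None if the response is unusable.

    `response` is anything with `.status_code` and `.headers`, such as the
    object returned by `requests.head(url)`. (Hypothetical helper.)
    """
    # Always record the status code so failed attempts show up in the log.
    logger.info("HEAD request returned status %s", response.status_code)

    # .get() returns None instead of raising KeyError when the header is absent.
    last_modified = response.headers.get("Last-Modified")
    if response.status_code != 200 or last_modified is None:
        logger.warning(
            "Skipping update: status=%s, Last-Modified=%r",
            response.status_code,
            last_modified,
        )
        return None

    content_length = response.headers.get("Content-Length")
    return {
        "last_modified": last_modified,
        "content_length": int(content_length) if content_length is not None else None,
    }
```

With requests, response.headers is case-insensitive, so the lookup works regardless of how the server capitalizes the header name. The caller would treat a None result as "nothing to do this run" rather than crashing.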
logger sounds like a great idea either way
Still no update today. I'm now getting 504 (Gateway Time-out) errors.
Seems like the SoS IT people have resolved the issue on their end. On Friday (24/Mar/2017 23:45:02 server time) our downloads-website EC2 instance logged a new version of CAL-ACCESS. I also just got an email from David Walker in the SoS office, stating that this has been resolved.
However, our website builds are still behind. The process is failing during the cleancalaccessrawfile command. Here's the traceback:
Traceback (most recent call last):
File "/apps/calaccess/repo/manage.py", line 35, in <module>
execute_from_command_line(sys.argv)
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/base.py", line 294, in run_from_argv
self.execute(*args, **cmd_options)
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/base.py", line 345, in execute
output = self.handle(*args, **options)
File "/apps/calaccess/repo/calaccess_website/management/commands/updatedownloadswebsite.py", line 35, in handle
super(Command, self).handle(*args, **options)
File "/apps/calaccess/local/lib/python2.7/site-packages/calaccess_raw/management/commands/updatecalaccessrawdata.py", line 308, in handle
self.clean()
File "/apps/calaccess/local/lib/python2.7/site-packages/calaccess_raw/management/commands/updatecalaccessrawdata.py", line 385, in clean
keep_file=self.keep_files,
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 113, in call_command
command = load_command_class(app_name, command_name)
File "/apps/calaccess/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 40, in load_command_class
module = import_module('%s.management.commands.%s' % (app_name, name))
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/apps/calaccess/local/lib/python2.7/site-packages/calaccess_raw/management/commands/cleancalaccessrawfile.py", line 15, in <module>
from csvkit import reader, writer
File "/apps/calaccess/local/lib/python2.7/site-packages/csvkit/__init__.py", line 15, in <module>
import agate
File "/apps/calaccess/local/lib/python2.7/site-packages/agate/__init__.py", line 5, in <module>
from agate.aggregations import *
File "/apps/calaccess/local/lib/python2.7/site-packages/agate/aggregations/__init__.py", line 20, in <module>
from agate.aggregations.all import All # noqa
File "/apps/calaccess/local/lib/python2.7/site-packages/agate/aggregations/all.py", line 4, in <module>
from agate.data_types import Boolean
File "/apps/calaccess/local/lib/python2.7/site-packages/agate/data_types/__init__.py", line 14, in <module>
from agate.data_types.date import Date # noqa
File "/apps/calaccess/local/lib/python2.7/site-packages/agate/data_types/date.py", line 5, in <module>
import isodate
So we are importing csvkit, which is importing agate, which is importing isodate, which is not found. This is probably something I screwed up the last time I deployed the website, when updating the raw-data Django app to the latest version.
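For what it's worth, a quick way to confirm which link in that chain is broken is to try importing each module in the same virtualenv. This is only a sketch; find_missing_modules is a made-up helper name, and the module list is the chain from the traceback above:

```python
import importlib


def find_missing_modules(names):
    """Try to import each module and return (name, error message) for failures."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as error:
            # Record the failure and keep going so every module gets checked.
            missing.append((name, str(error)))
    return missing


if __name__ == "__main__":
    # The chain from the traceback, innermost dependency first.
    for name, message in find_missing_modules(["isodate", "agate", "csvkit"]):
        print("%s: %s" % (name, message))
```

Checking the innermost dependency first makes it obvious whether agate and csvkit fail only because isodate is absent.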
I tried running pip install isodate on the server, and got this error:
Downloading/unpacking isodate
Downloading isodate-0.5.4.tar.gz
Cleaning up...
setuptools must be installed to install from a source distribution
Storing debug log for failure in /home/ccdc/.pip/pip.log
The traceback in the log file says:
Traceback (most recent call last):
File "/apps/calaccess/local/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
status = self.run(options, args)
File "/apps/calaccess/local/lib/python2.7/site-packages/pip/commands/install.py", line 278, in run
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "/apps/calaccess/local/lib/python2.7/site-packages/pip/req.py", line 1229, in prepare_files
req_to_install.run_egg_info()
File "/apps/calaccess/local/lib/python2.7/site-packages/pip/req.py", line 292, in run_egg_info
logger.notify('Running setup.py (path:%s) egg_info for package %s' % (self.setup_py, self.name))
File "/apps/calaccess/local/lib/python2.7/site-packages/pip/req.py", line 269, in setup_py
"setuptools must be installed to install from a source "
InstallationError: setuptools must be installed to install from a source distribution
It's at this point I decide that it's time to upgrade pip:
$ pip install -U pip
Downloading/unpacking pip from https://pypi.python.org/packages/b6/ac/7015eb97dc749283ffdec1c3a88ddb8ae03b8fad0f0e611408f196358da3/pip-9.0.1-py2.py3-none-any.whl#md5=297dbd16ef53bcef0447d245815f5144
Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Installing collected packages: pip
Found existing installation: pip 1.5.4
Uninstalling pip:
Successfully uninstalled pip
Successfully installed pip
Cleaning up...
But then I get a different error when I try pip install isodate again:
Collecting isodate
/apps/calaccess/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
SNIMissingWarning
/apps/calaccess/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Downloading isodate-0.5.4.tar.gz
Could not import setuptools which is required to install from a source distribution.
Traceback (most recent call last):
File "/apps/calaccess/local/lib/python2.7/site-packages/pip/req/req_install.py", line 387, in setup_py
import setuptools # noqa
File "/apps/calaccess/local/lib/python2.7/site-packages/setuptools/__init__.py", line 12, in <module>
import setuptools.version
File "/apps/calaccess/local/lib/python2.7/site-packages/setuptools/version.py", line 1, in <module>
import pkg_resources
File "/apps/calaccess/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 72, in <module>
import packaging.requirements
File "/apps/calaccess/local/lib/python2.7/site-packages/packaging/requirements.py", line 9, in <module>
from pyparsing import stringStart, stringEnd, originalTextFor, ParseException
ImportError: No module named pyparsing
/apps/calaccess/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
We are currently running Python 2.7.6 on our server, so I followed the above suggestion and updated to the latest version: 2.7.13. These instructions seemed suitable enough, though they've led me to create a separate virtualenv. Will need to go back and incorporate some of these updates into our chef recipes.
Still unpacking all of this. Here's another interesting tidbit: It appears as though 'Last-modified' in the header can be off by about half a minute:
In [1]: import requests
In [2]: url = 'http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip'
In [3]: r = requests.head(url)
In [4]: r.headers['Last-modified']
Out[4]: 'Tue, 28 Mar 2017 11:20:55 GMT'
In [5]: r = requests.head(url)
In [6]: r.headers['Last-modified']
Out[6]: 'Tue, 28 Mar 2017 11:20:28 GMT'
This is causing our download/update process to treat these as separate releases. We might need to replace the logic that compares the exact values of 'Last-modified' with a check that they are within a minute of each other (or thereabouts).
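Something along these lines might work. It's a minimal sketch, not what the raw-data app does today: is_same_release and header_to_timestamp are hypothetical names, and the 60-second tolerance is just the "within a minute" guess from above. It sticks to email.utils functions that exist on both Python 2.7 and 3:

```python
from email.utils import mktime_tz, parsedate_tz


def header_to_timestamp(value):
    """Convert an HTTP date header such as Last-Modified to a UTC timestamp."""
    parsed = parsedate_tz(value)
    if parsed is None:
        raise ValueError("Unparseable HTTP date: %r" % (value,))
    return mktime_tz(parsed)


def is_same_release(last_modified_a, last_modified_b, tolerance_seconds=60):
    """Treat two Last-Modified values as one release if they are close enough."""
    difference = header_to_timestamp(last_modified_a) - header_to_timestamp(last_modified_b)
    return abs(difference) <= tolerance_seconds
```

The two values from the session above are 27 seconds apart, so is_same_release would count them as the same release, while values from different days would still be treated as separate.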
As of this morning (July 5, 2017), the CAL-ACCESS bulk download has not updated in five days. The last release was on June 30, 2017.
$ date
Wed Jul 5 12:07:58 PDT 2017
$ curl -I http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip
HTTP/1.1 200 OK
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Fri, 30 Jun 2017 11:20:28 GMT
ETag: "2320c8-305b5d54-9ab7f700"
Accept-Ranges: bytes
Content-Length: 811294036
Content-Type: application/zip
Date: Wed, 05 Jul 2017 19:08:01 GMT
Connection: keep-alive
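The Last-Modified and Date headers in that response are enough to measure how stale the export is without relying on the local clock. A small sketch, assuming both headers are present; days_since_last_modified is a hypothetical helper, not something in our codebase:

```python
from email.utils import mktime_tz, parsedate_tz

SECONDS_PER_DAY = 24 * 60 * 60


def days_since_last_modified(last_modified, date):
    """Return the whole days between the Last-Modified and Date header values."""
    modified_at = mktime_tz(parsedate_tz(last_modified))
    served_at = mktime_tz(parsedate_tz(date))
    # Floor division drops the partial day.
    return int((served_at - modified_at) // SECONDS_PER_DAY)


if __name__ == "__main__":
    print(days_since_last_modified(
        "Fri, 30 Jun 2017 11:20:28 GMT",
        "Wed, 05 Jul 2017 19:08:01 GMT",
    ))
```

For the headers above this comes out to 5, matching the five days mentioned earlier. A check like this could log a warning once the gap goes past a day or two.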
Despite assurances otherwise from the Secretary of State office, as of 7 AM this morning the raw data download still has not updated.
>>> import requests
>>> url = 'http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip'
>>> r = requests.head(url)
>>> r.headers['Last-modified']
'Fri, 30 Jun 2017 11:20:28 GMT'
Looks like this was fixed.
Just noticed that our website has not been updated since Sunday.
Looking into the log on the server, but it appears as though new snapshots are not being released.