openstates / openstates-scrapers

source for Open States scrapers
https://openstates.org
GNU General Public License v3.0

NJ failing since at least 2019-03-03 #2904

Closed by openstates-bot 5 years ago

openstates-bot commented 5 years ago

NJ has been failing since 2019-03-03

Based on automated runs it appears that NJ has not run successfully in 2 days (2019-03-03).

  File "/usr/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 1552, in ftp_open
    raise exc.with_traceback(sys.exc_info()[2])
  File "/usr/lib/python3.6/urllib/request.py", line 1541, in ftp_open
    fp, retrlen = fw.retrfile(file, type)
  File "/usr/lib/python3.6/urllib/request.py", line 2421, in retrfile
    raise URLError('ftp error: %r' % reason) from reason
urllib.error.URLError: <urlopen error ftp error: URLError("ftp error: error_perm('550 The system cannot find the file specified. ',)",)>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/**PGUSER**/venv-pupa//bin/pupa", line 11, in <module>
    load_entry_point('pupa', 'console_scripts', 'pupa')()
  File "/opt/**PGUSER**/venv-pupa/src/pupa/pupa/cli/__main__.py", line 68, in main
    subcommands[args.subcommand].handle(args, other)
  File "/opt/**PGUSER**/venv-pupa/src/pupa/pupa/cli/commands/update.py", line 278, in handle
    return self.do_handle(args, other, juris)
  File "/opt/**PGUSER**/venv-pupa/src/pupa/pupa/cli/commands/update.py", line 327, in do_handle
    report['scrape'] = self.do_scrape(juris, args, scrapers)
  File "/opt/**PGUSER**/venv-pupa/src/pupa/pupa/cli/commands/update.py", line 175, in do_scrape
    report[scraper_name] = scraper.do_scrape(**scrape_args)
  File "/opt/**PGUSER**/venv-pupa/src/pupa/pupa/scrape/base.py", line 112, in do_scrape
    for obj in self.scrape(**kwargs) or []:
  File "/opt/**PGUSER**/**PGUSER**/**PGUSER**/nj/bills.py", line 225, in scrape
    self._init_mdb(year_abr)
  File "/opt/**PGUSER**/**PGUSER**/**PGUSER**/nj/utils.py", line 31, in _init_mdb
    fname, resp = self.urlretrieve(url)
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/scrapelib/__init__.py", line 321, in urlretrieve
    result = self.request(method, url, data=body, **kwargs)
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/scrapelib/__init__.py", line 286, in request
    **kwargs)
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/scrapelib/cache.py", line 66, in request
    resp = super(CachingSession, self).request(method, url, **kwargs)
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/scrapelib/__init__.py", line 88, in request
    return super(ThrottledSession, self).request(method, url, **kwargs)
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/scrapelib/__init__.py", line 182, in request
    raise exception_raised
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/scrapelib/__init__.py", line 153, in request
    resp = super(RetrySession, self).request(method, url, **kwargs)
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/opt/**PGUSER**/venv-pupa/lib/python3.6/site-packages/scrapelib/__init__.py", line 122, in send
    raise FTPError(request.url)
scrapelib.FTPError: error while retrieving ftp://www.njleg.state.nj.us/ag/2018data/DB2018.zip

Visit http://bobsled.openstates.org for more info.
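The 550 "cannot find the file" error above means the FTP server accepted the connection but could not locate `ag/2018data/DB2018.zip`. A quick way to confirm whether the file moved (rather than the server being down) is to list the parent directory over FTP. This is a hedged diagnostic sketch, not part of the scraper; the host and path are taken from the error message and may have changed since.

```python
from ftplib import FTP, error_perm

def split_ftp_path(path: str) -> tuple:
    """Split an FTP path into (parent directory, filename)."""
    directory, _, filename = path.rpartition("/")
    return directory or "/", filename

def ftp_file_exists(host: str, path: str) -> bool:
    """Return True if `path` appears in its parent directory listing on `host`.

    Anonymous login is assumed, matching how urllib fetches ftp:// URLs.
    """
    directory, filename = split_ftp_path(path)
    with FTP(host) as ftp:
        ftp.login()  # anonymous
        try:
            names = ftp.nlst(directory)
        except error_perm:
            # 550 on the directory itself: the whole path is gone
            return False
    return any(name.rsplit("/", 1)[-1] == filename for name in names)

# Example (requires network access to the legacy host):
# ftp_file_exists("www.njleg.state.nj.us", "ag/2018data/DB2018.zip")
```

If the directory listing succeeds but the filename is absent, the server restructured its layout and the scraper's URL needs updating.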

showerst commented 5 years ago

Not sure what's going on here; it works fine when I run it locally.

estaub commented 5 years ago

Original problem appears to be gone; current problem is:

pupa.exceptions.DuplicateItemError: attempt to import data that would conflict with data already in the import: {'identifier': '', 'motion_text': 'AMEND', 'motion_classification': ['passage'], 'start_date': '2019-01-31T05:00:00+00:00', 'result': 'pass', 'extras': {}, 'legislative_session_id': UUID('4cbea970-85ea-454f-9b32-a88bb3a2be4b'), 'organization_id': 'ocd-organization/0461b3ba-ecad-4ed3-bc89-056dbaa698d1', 'bill_id': 'ocd-bill/8f5e3268-caf3-4832-a924-3496aeffd416'} (already imported as AMEND on S 10 in New Jersey 2018-2019 Regular Session)
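A DuplicateItemError fires when two scraped objects resolve to the same identifying fields at import time. The sketch below is a hedged approximation of that collision check (it is not pupa's actual implementation): two vote events sharing the same session, bill, motion text, and start date — as in the error above, where the vote has an empty `identifier` — map to one key.

```python
def vote_key(vote: dict) -> tuple:
    """Identifying fields that, when equal, make two votes collide on import.

    This key is an assumption based on the fields shown in the error message;
    pupa's real resolution logic may differ.
    """
    return (
        vote.get("legislative_session_id"),
        vote.get("bill_id"),
        vote.get("motion_text"),
        vote.get("start_date"),
    )

def find_duplicates(votes) -> list:
    """Return the keys that occur more than once in `votes`."""
    seen, dupes = set(), []
    for v in votes:
        k = vote_key(v)
        if k in seen:
            dupes.append(k)
        seen.add(k)
    return dupes
```

With an empty `identifier`, the motion text and date carry the whole burden of uniqueness, so two distinct AMEND votes on the same bill and day would collide.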

estaub commented 5 years ago

NJ appears to have restructured their FTP repository.

estaub commented 5 years ago

The DuplicateItemError problem continues, on an AMEND vote on 31 Jan 2019 on two different bills: S10 and S2262. I cannot reproduce it locally: when I run a scrape, I only see one vote on that date for each bill.
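One way to check whether the duplicate originates in the scrape output rather than in the importer is to scan the scraped vote-event JSON files and count occurrences per (bill, motion, date). This is a hypothetical helper: the output directory and the exact JSON field names (`bill_identifier`, `motion_text`, `start_date`) are assumptions about pupa's on-disk scrape format.

```python
import glob
import json
import os
from collections import Counter

def find_duplicate_votes(data_dir: str) -> dict:
    """Count scraped vote events per (bill, motion, date); return keys seen >1 time.

    Assumes pupa writes one JSON file per vote event named vote_event_*.json;
    field names are assumptions and may need adjusting.
    """
    counts = Counter()
    for path in glob.glob(os.path.join(data_dir, "vote_event_*.json")):
        with open(path) as f:
            v = json.load(f)
        counts[(v.get("bill_identifier"), v.get("motion_text"), v.get("start_date"))] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

If this reports no duplicates on a fresh local scrape, the collision is between the new scrape and data already in the database, which would explain why it cannot be reproduced locally.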