openstates / openstates-scrapers

source for Open States scrapers
https://openstates.org
GNU General Public License v3.0

PR legislators scrape disabled #1611

Closed: openstates-bot closed this issue 6 years ago

openstates-bot commented 7 years ago

PR has been failing since 2017-04-01

Based on automated runs, it appears that PR has not run successfully in 5 days (since 2017-04-01).

04:01:25 INFO billy: billy-update abbr=pr
    actions=scrape,import,report
    types=bills,legislators
    sessions=2017-2020
    terms=2017-2020
04:01:26 INFO billy: Scraping PR upper chamber.
04:01:26 INFO scrapelib: GET - http://www.senadopr.us/Pages/Senadores%20Distrito%20II.aspx
04:01:26 INFO scrapelib: GET - http://www.senadopr.us/Pages/Senadores%20Distrito%20VIII.aspx
04:01:27 INFO scrapelib: GET - http://www.senadopr.us/Pages/SenadoresporAcumulacion.aspx
04:01:28 INFO scrapelib: GET - http://www.senadopr.us/Pages/Senadores%20Distrito%20V.aspx
04:01:29 INFO scrapelib: GET - http://www.senadopr.us/Pages/Senadores%20Distrito%20I.aspx
04:01:30 INFO scrapelib: GET - http://www.senadopr.us/Pages/Senadores%20Distrito%20VI.aspx
04:01:31 INFO scrapelib: GET - http://www.senadopr.us/Pages/Senadores%20Distrito%20VII.aspx
04:01:32 INFO scrapelib: GET - http://www.senadopr.us/Pages/Senadores%20Distrito%20III.aspx
04:01:33 INFO scrapelib: GET - http://www.senadopr.us/Pages/Senadores%20Distrito%20IV.aspx
04:01:34 INFO billy: Scraping PR lower chamber.
04:01:34 INFO scrapelib: GET - http://www.tucamarapr.org/dnncamara/ComposiciondelaCamara.aspx
04:01:36 CRITICAL billy: Error: legislators scraper didn't save any objects

Visit http://bobsled.openstates.org/ for more info.

Nosferican commented 7 years ago

What's the policy on hard-coding the legislator data? I had to do so for a project. Wikipedia and newspapers give a pretty solid record of any changes during an administration. Usually the Senate's page is simply updated with the latest composition, which breaks any scrape.
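For illustration, hard-coded data of this kind might look like the sketch below. The field names, the `HARDCODED_LEGISLATORS` list, and the `iter_legislators` helper are all hypothetical, and the entries are placeholders rather than real legislators; none of this is the billy API.

```python
# Placeholder entries only; real names would come from official sources.
# The structure and field names are assumptions made for this sketch.
HARDCODED_LEGISLATORS = [
    {"name": "Placeholder Senator", "chamber": "upper",
     "district": "I", "term": "2017-2020"},
    {"name": "Placeholder Representative", "chamber": "lower",
     "district": "1", "term": "2017-2020"},
]


def iter_legislators(chamber, term):
    """Yield the hard-coded records for one chamber and term."""
    for record in HARDCODED_LEGISLATORS:
        if record["chamber"] == chamber and record["term"] == term:
            # Yield a copy so callers cannot mutate the shared table.
            yield dict(record)
```

Keeping the term on each record means a mid-term change can be recorded by editing one entry, and a lookup for a term with no data returns nothing instead of stale records.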

jamesturk commented 7 years ago

If it is necessary, I'd be OK w/ it in this case. We haven't had to do it elsewhere, but I think the VI scraper will need something similar.

The scraper should probably have a check for the current date and raise an exception if an election would have taken place.

import datetime
if datetime.datetime.utcnow() > NEXT_ELECTION_DATE:
    raise Exception("an election was scheduled for {}, please update PR data".format(NEXT_ELECTION_DATE))

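A slightly fuller sketch of that guard, for illustration only. `NEXT_ELECTION_DATE` is given a concrete value here on the assumption that the next general election falls on 3 November 2020, at the end of the 2017-2020 term shown in the log; `StaleDataError` and `check_data_is_current` are hypothetical names, not part of billy or the existing scraper.

```python
import datetime

# Assumption: date of the next general election; adjust to the real schedule.
NEXT_ELECTION_DATE = datetime.datetime(2020, 11, 3)


class StaleDataError(Exception):
    """Raised when hard-coded legislator data may be out of date."""


def check_data_is_current(now=None):
    """Raise StaleDataError if an election would already have taken place.

    `now` defaults to the current UTC time; passing it explicitly makes
    the check easy to test.
    """
    if now is None:
        now = datetime.datetime.utcnow()
    if now > NEXT_ELECTION_DATE:
        raise StaleDataError(
            "an election was scheduled for {:%Y-%m-%d}, "
            "please update PR data".format(NEXT_ELECTION_DATE)
        )
```

Calling `check_data_is_current()` at the top of the scrape would make the run fail loudly after the election date instead of silently importing outdated legislators.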
jamesturk commented 7 years ago

turned off PR legislator scraping for now so bill scraping can resume

jamesturk commented 7 years ago

@Nosferican do you have any idea if there is a source for this?

jamesturk commented 6 years ago

found http://senado.pr.gov/Pages/Senadores.aspx and http://www.tucamarapr.org/dnncamara/ComposiciondelaCamara/Biografia.aspx

Nosferican commented 6 years ago

I would say those two would be the go-to sources for the lower and upper houses.

csnardi commented 6 years ago

I think we now need to re-enable people scraping for PR. Is that done by changing https://github.com/openstates/task-definitions/blob/master/tasks/pr.yml#L2?

showerst commented 6 years ago

Looks like first we'll need to disable the committee scraper by commenting out https://github.com/openstates/openstates/blob/master/openstates/pr/__init__.py#L3 -- then we can remove the 'bills' from pr.yml

If you want to open a pull request for that, I'll merge it; otherwise I'll probably get to it this weekend.

csnardi commented 6 years ago

I updated the committee scraper instead in #2459, and created the corresponding change in task-definitions (https://github.com/openstates/task-definitions/pull/2).

showerst commented 6 years ago

I merged the committee patch but don't have admin rights on the tasks repo. @jamesturk can you merge https://github.com/openstates/task-definitions/pull/2/files please?

csnardi commented 6 years ago

@jamesturk I think task-definitions might need to be updated after that change, like what was needed in https://github.com/openstates/openstates/issues/2270.

csnardi commented 6 years ago

@jamesturk Any update?

jamesturk commented 6 years ago

Sorry, missed this. I'll take a look this evening.
