openstates / issues

Having trouble? Looking to contribute? Issues live here!

NJ scraper: property "pupa_id" not in vote_event schema #470

Closed SteadyGiant closed 3 years ago

SteadyGiant commented 3 years ago

This concerns the NJ scraper. When I run docker-compose run --rm scrape nj bills --fastmode --scrape, I get:

nj (scrape)
  bills: {}
12:39:13 INFO openstates: save jurisdiction New Jersey as jurisdiction_ocd-jurisdiction-country:us-state:nj-state.json
12:39:13 INFO openstates: save organization New jersey Legislature as organization_4c615a40-fb6a-11eb-bc01-0242ac120002.json
12:39:13 INFO openstates: save organization Senate as organization_4c6170b6-fb6a-11eb-bc01-0242ac120002.json
12:39:13 INFO openstates: save organization Assembly as organization_4c618524-fb6a-11eb-bc01-0242ac120002.json
12:39:13 INFO openstates: no session specified, using 219
12:39:13 INFO scrapelib: GET - 'ftp://www.njleg.state.nj.us/ag/2020data/DB2020.mdb'
12:39:36 INFO openstates: mdb filename = /tmp/tmp6_ru83ox
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Wirths, Harold J.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Wirths, Harold J."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Murphy, Carol A.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Murphy, Carol A."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Verrelli, Anthony S.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Verrelli, Anthony S."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Vainieri Huttle, Valerie', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Vainieri Huttle, Valerie"}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Holley, Jamel C.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Holley, Jamel C."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Caputo, Ralph R.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Caputo, Ralph R."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Speight, Shanique', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Speight, Shanique"}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Benson, Daniel R.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Benson, Daniel R."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Dunn, Aura K.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Dunn, Aura K."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Codey, Richard J.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Codey, Richard J."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Testa, Michael L.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Testa, Michael L."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Sweeney, Stephen M.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Sweeney, Stephen M."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Pou, Nellie', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Pou, Nellie"}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Sarlo, Paul A.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Sarlo, Paul A."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Kean, Thomas H.', 'classification': 'cosponsor', 'entity_type': 'person', 'primary': False, 'person_id': '~{"name": "Kean, Thomas H."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Oroho, Steven V.', 'classification': 'cosponsor', 'entity_type': 'person', 'primary': False, 'person_id': '~{"name": "Oroho, Steven V."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Schepisi, Holly T.', 'classification': 'cosponsor', 'entity_type': 'person', 'primary': False, 'person_id': '~{"name": "Schepisi, Holly T."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
12:39:37 INFO scrapelib: GET - 'ftp://www.njleg.state.nj.us/votes/A2020.zip'
Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/bin/os-update", line 8, in <module>
    sys.exit(main())
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/cli/update.py", line 318, in main
    report = do_update(args, other, juris)
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/cli/update.py", line 205, in do_update
    report["scrape"] = do_scrape(juris, args, scrapers)
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/cli/update.py", line 89, in do_scrape
    report[scraper_name] = scraper.do_scrape(**scrape_args)
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/base.py", line 163, in do_scrape
    for obj in self.scrape(**kwargs) or []:
  File "/opt/openstates/openstates/scrapers/nj/bills.py", line 254, in scrape
    yield from self.scrape_bills(session, year_abr)
  File "/opt/openstates/openstates/scrapers/nj/bills.py", line 441, in scrape_bills
    votes[vote_id].pupa_id = vote_id
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/base.py", line 281, in __setattr__
    'property "{}" not in {} schema'.format(key, self._type)
openstates.exceptions.ScrapeValueError: property "pupa_id" not in vote_event schema
ERROR: 1

It seems, on the backend, pupa_id was replaced by dedupe_key and backward compatibility was removed, but the NJ scraper wasn't updated accordingly. I'll try doing so.
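For readers unfamiliar with this kind of failure, here is a minimal sketch of how a schema-guarded `__setattr__` can produce the error in the traceback. It is not the openstates implementation: the class `VoteEventSketch`, its `_allowed` set, and the vote ID are hypothetical stand-ins; only the error message format and the names `pupa_id` and `dedupe_key` come from the log and discussion above.

```python
class ScrapeValueError(Exception):
    """Local stand-in for openstates.exceptions.ScrapeValueError."""


class VoteEventSketch:
    # Hypothetical subset of schema properties, for illustration only.
    _type = "vote_event"
    _allowed = {"identifier", "motion_text", "dedupe_key"}

    def __setattr__(self, key, value):
        # Reject any attribute that is not declared in the schema.
        if key not in self._allowed:
            raise ScrapeValueError(
                'property "{}" not in {} schema'.format(key, self._type)
            )
        super().__setattr__(key, value)


vote_id = "A2020-0001"  # hypothetical vote identifier
vote = VoteEventSketch()

# The newer attribute name is accepted.
vote.dedupe_key = vote_id

# The older attribute name is rejected, as in the traceback.
try:
    vote.pupa_id = vote_id
except ScrapeValueError as exc:
    message = str(exc)
```

Under these assumptions, the scraper-side change would amount to assigning `dedupe_key` where the old code assigned `pupa_id`.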

Let me know if there is anything I'm missing.

The only other supported NJ scraper at the moment is events. Running docker-compose run --rm scrape nj events --fastmode --scrape seems to succeed: I get tons of data and just a few warnings about unknown committee codes. I may investigate these warnings and file a separate issue/PR later.

jamesturk commented 3 years ago

I think you have an old version locally, line 441 of nj/bills.py uses dedupe_key in the latest source.

Can you verify you’ve pulled from main?
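One rough way to check whether a local checkout still carries the old attribute is to search the scraper sources for it. The sketch below is only an illustration: the helper name `find_references` and the example path are assumptions, not part of the openstates tooling.

```python
from pathlib import Path


def find_references(root, needle="pupa_id"):
    """Return (path, line number, line) for every .py line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical location of the NJ scrapers inside a local clone.
    for path, lineno, line in find_references("scrapers/nj"):
        print(f"{path}:{lineno}: {line}")
```

If this prints any matches, the checkout is probably behind the current source, which is consistent with having cloned or forked an outdated branch.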

James

SteadyGiant commented 3 years ago

Hmm idk how I forked master instead of main but that seems to be my issue.