SteadyGiant closed this 3 years ago.
I think you have an old version locally; line 441 of nj/bills.py uses dedupe_key in the latest source.
Can you verify you’ve pulled from main?
James

On Aug 12, 2021, 9:04 AM -0400, everettt @.***> wrote:
This regards the NJ scraper. I run `docker-compose run --rm scrape nj bills --fastmode --scrape` and get:

```
nj (scrape) bills: {}
12:39:13 INFO openstates: save jurisdiction New Jersey as jurisdiction_ocd-jurisdiction-country:us-state:nj-state.json
12:39:13 INFO openstates: save organization New jersey Legislature as organization_4c615a40-fb6a-11eb-bc01-0242ac120002.json
12:39:13 INFO openstates: save organization Senate as organization_4c6170b6-fb6a-11eb-bc01-0242ac120002.json
12:39:13 INFO openstates: save organization Assembly as organization_4c618524-fb6a-11eb-bc01-0242ac120002.json
12:39:13 INFO openstates: no session specified, using 219
12:39:13 INFO scrapelib: GET - 'ftp://www.njleg.state.nj.us/ag/2020data/DB2020.mdb'
12:39:36 INFO openstates: mdb filename = /tmp/tmp6_ru83ox
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Wirths, Harold J.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Wirths, Harold J."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Murphy, Carol A.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Murphy, Carol A."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Verrelli, Anthony S.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Verrelli, Anthony S."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Vainieri Huttle, Valerie', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Vainieri Huttle, Valerie"}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Holley, Jamel C.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Holley, Jamel C."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Caputo, Ralph R.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Caputo, Ralph R."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Speight, Shanique', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Speight, Shanique"}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Benson, Daniel R.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Benson, Daniel R."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Dunn, Aura K.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Dunn, Aura K."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Codey, Richard J.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Codey, Richard J."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Testa, Michael L.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Testa, Michael L."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Sweeney, Stephen M.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Sweeney, Stephen M."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Pou, Nellie', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Pou, Nellie"}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Sarlo, Paul A.', 'classification': 'primary', 'entity_type': 'person', 'primary': True, 'person_id': '~{"name": "Sarlo, Paul A."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Kean, Thomas H.', 'classification': 'cosponsor', 'entity_type': 'person', 'primary': False, 'person_id': '~{"name": "Kean, Thomas H."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Oroho, Steven V.', 'classification': 'cosponsor', 'entity_type': 'person', 'primary': False, 'person_id': '~{"name": "Oroho, Steven V."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/bill.py:112: RuntimeWarning: duplicate sponsor {'name': 'Schepisi, Holly T.', 'classification': 'cosponsor', 'entity_type': 'person', 'primary': False, 'person_id': '~{"name": "Schepisi, Holly T."}', 'organization_id': None}
  warnings.warn(f"duplicate sponsor {sp}", RuntimeWarning)
12:39:37 INFO scrapelib: GET - 'ftp://www.njleg.state.nj.us/votes/A2020.zip'
Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/bin/os-update", line 8, in
    sys.exit(main())
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/cli/update.py", line 318, in main
    report = do_update(args, other, juris)
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/cli/update.py", line 205, in do_update
    report["scrape"] = do_scrape(juris, args, scrapers)
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/cli/update.py", line 89, in do_scrape
    report[scraper_name] = scraper.do_scrape(scrape_args)
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/base.py", line 163, in do_scrape
    for obj in self.scrape(kwargs) or []:
  File "/opt/openstates/openstates/scrapers/nj/bills.py", line 254, in scrape
    yield from self.scrape_bills(session, year_abr)
  File "/opt/openstates/openstates/scrapers/nj/bills.py", line 441, in scrape_bills
    votes[vote_id].pupa_id = vote_id
  File "/root/.cache/pypoetry/virtualenvs/openstates-scrapers-vRcYrsYN-py3.7/lib/python3.7/site-packages/openstates/scrape/base.py", line 281, in setattr
    'property "{}" not in {} schema'.format(key, self._type)
openstates.exceptions.ScrapeValueError: property "pupa_id" not in vote_event schema
ERROR: 1
```

It seems that, on the backend, `pupa_id` was replaced by `dedupe_key` and backward compatibility was removed, but the NJ scraper wasn't updated accordingly. I'll try doing so. Let me know if there is anything I'm missing.

The only other supported NJ scraper atm is `events`. I run `docker-compose run --rm scrape nj events --fastmode --scrape` and it seems to succeed: I get tons of data and just a few warnings about unknown committee codes. I may investigate these warnings and file a separate issue/PR later.
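The last frames of the traceback show what went wrong. The scraper assigns `votes[vote_id].pupa_id`, and the attribute setter in `openstates/scrape/base.py` rejects any property that the object's schema doesn't declare. It builds the message `'property "{}" not in {} schema'.format(key, self._type)`. Below is a simplified, hypothetical sketch of that kind of schema-checked setter. It is not the actual openstates implementation: `SchemaCheckedObject`, `VoteEventSketch`, and the allowed-property set are illustrative assumptions, and `ScrapeValueError` here is a local stand-in for `openstates.exceptions.ScrapeValueError`.

```python
class ScrapeValueError(ValueError):
    """Local stand-in for openstates.exceptions.ScrapeValueError."""


class SchemaCheckedObject:
    # Hypothetical: each subclass names its schema type and the properties it allows.
    _type = "object"
    _schema_properties = frozenset()

    def __setattr__(self, key, value):
        # Refuse any attribute the schema doesn't declare, using the same
        # message format that appears in the traceback.
        if key not in self._schema_properties:
            raise ScrapeValueError(
                'property "{}" not in {} schema'.format(key, self._type)
            )
        super().__setattr__(key, value)


class VoteEventSketch(SchemaCheckedObject):
    _type = "vote_event"
    # Assumed subset of the vote_event schema, for illustration only.
    _schema_properties = frozenset({"motion_text", "result", "dedupe_key"})


vote = VoteEventSketch()
vote.dedupe_key = "vote-1"  # allowed: dedupe_key is in the (assumed) schema
try:
    vote.pupa_id = "vote-1"  # rejected, like line 441 of the old nj/bills.py
except ScrapeValueError as exc:
    print(exc)  # property "pupa_id" not in vote_event schema
```

With a strict setter like this, a renamed field fails loudly at assignment time instead of being silently dropped. That is why an outdated scraper that still sets `pupa_id` stops with an error rather than producing incomplete vote data.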
Hmm, idk how I forked master instead of main, but that seems to be my issue.
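According to James's reply, the latest source on main uses `dedupe_key` at line 441 of `nj/bills.py`. A quick way to tell whether a local checkout is stale is therefore to look for a leftover `pupa_id` assignment in that file. The sketch below does this with a plain text search. The helper names are hypothetical, and the default path `openstates/scrapers/nj/bills.py` is an assumption inferred from the container path `/opt/openstates/openstates/scrapers/nj/bills.py` in the traceback, relative to the repository root. It does not replace checking which branch you forked or pulled from.

```python
import re
from pathlib import Path

# Matches an attribute assignment such as `votes[vote_id].pupa_id = vote_id`,
# but not a comparison like `.pupa_id == ...`.
PUPA_ID_ASSIGNMENT = re.compile(r"\.pupa_id\s*=(?!=)")


def stale_pupa_id_lines(source):
    """Return the 1-based line numbers that still assign to .pupa_id."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if PUPA_ID_ASSIGNMENT.search(line)
    ]


def check_checkout(path="openstates/scrapers/nj/bills.py"):
    """Hypothetical helper: report whether the file still uses pupa_id."""
    hits = stale_pupa_id_lines(Path(path).read_text(encoding="utf-8"))
    if hits:
        print(f"{path}: still assigns pupa_id on line(s) {hits}; pull from main")
    else:
        print(f"{path}: no pupa_id assignments found")
    return hits
```

Run `check_checkout()` from the repository root. A non-empty result suggests the checkout predates the `dedupe_key` change, which matches the stale-branch explanation here.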
This regards the NJ scraper. I run `docker-compose run --rm scrape nj bills --fastmode --scrape` and get a traceback ending in `openstates.exceptions.ScrapeValueError: property "pupa_id" not in vote_event schema`. It seems that, on the backend, `pupa_id` was replaced by `dedupe_key` and backward compatibility was removed, but the NJ scraper wasn't updated accordingly. I'll try doing so. Let me know if there is anything I'm missing.
The only other supported NJ scraper atm is `events`. I run `docker-compose run --rm scrape nj events --fastmode --scrape` and it seems to succeed: I get tons of data and just a few warnings about unknown committee codes. I may investigate these warnings and file a separate issue/PR later.