unitedstates / congress

Public domain data collectors for the work of Congress, including legislation, amendments, and votes.
https://github.com/unitedstates/congress/wiki
Creative Commons Zero v1.0 Universal

Convert congress codebase to Python 3 #265

acxz closed 3 years ago

acxz commented 3 years ago

While it is possible to maintain compatibility with both Python 2 and Python 3, it proved quite difficult for me to do, and ripping the bandaid off and converting to Python 3 seemed to be the better path.

This PR converts all Python files to work with Python 3 and updates other metadata to use Python 3.

Would love for people to test this branch out.

JoshData commented 3 years ago

Great. I'd also like to revert some of the 2to3 changes that are unnecessary. I can try to do it.
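For context, one kind of 2to3 output that is often safe to revert is the defensive `list()` wrapper it places around dictionary views. The thread does not say which changes were meant, so the following is only an illustrative sketch; the function names and the `votes` data are hypothetical and are not taken from this repository.

```python
# Hypothetical data, not taken from the congress codebase.
votes = {"h1": "house", "s1": "senate", "h2": "house"}


def count_by_chamber_2to3(votes):
    """Style 2to3 typically emits: the dict view is copied into a list first."""
    counts = {}
    for vote_id, chamber in list(votes.items()):
        counts[chamber] = counts.get(chamber, 0) + 1
    return counts


def count_by_chamber_plain(votes):
    """Same loop without the copy.

    This is safe here only because `votes` is not mutated during iteration.
    """
    counts = {}
    for vote_id, chamber in votes.items():
        counts[chamber] = counts.get(chamber, 0) + 1
    return counts


# Both versions produce the same counts for this input.
print(count_by_chamber_2to3(votes) == count_by_chamber_plain(votes))
```

The copy only matters when the loop body adds or removes keys from the dictionary being iterated, so each such wrapper needs to be checked individually before it is reverted.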

acxz commented 3 years ago

Sounds good, feel free to commit on top of this fork/branch. I just gave you access to my fork.

acxz commented 3 years ago

@JoshData thanks for the commit. What should the criteria be for getting this merged into master?

So far @stevesdawg and I have tested ./run govinfo --bulkdata=BILLSTATUS, ./run bills, and ./run votes --congress=112 --session=2012, and these commands all work.

If you could make a checklist of the commands that should be tested and working, that would be ideal.

JoshData commented 3 years ago

The only criterion for me (though I don't know if anyone else is tracking this, so I may be the only one who matters) is that it doesn't break GovTrack's scraping process. These are the commands GovTrack runs:

./run committee_meetings --docs=False
./run govinfo --collections=BILLS --extract=mods,text,xml,pdf
./run govinfo --collections=CRPT --extract=mods
./run govinfo --bulkdata=BILLSTATUS
./run bills --govtrack --congress=###
./run upcoming_house_floor --download
./run votes --govtrack --force --fast

(There are other scrapers for historical data but I am OK if those break from not being tested.)
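One way to turn the list above into a repeatable checklist is a small smoke-test script. This is a minimal sketch, not part of the repository: the names `GOVTRACK_COMMANDS`, `build_commands`, and `smoke_test` are hypothetical, it assumes the script is run from the repository root where `./run` lives, and it only checks exit codes rather than the correctness of the scraped data. The `###` placeholder for the Congress number is kept as written in the thread and has to be supplied by the caller.

```python
import subprocess

# Commands copied from the list above. "###" is the placeholder used in the
# thread for the Congress number; the caller supplies the real value.
GOVTRACK_COMMANDS = [
    "./run committee_meetings --docs=False",
    "./run govinfo --collections=BILLS --extract=mods,text,xml,pdf",
    "./run govinfo --collections=CRPT --extract=mods",
    "./run govinfo --bulkdata=BILLSTATUS",
    "./run bills --govtrack --congress=###",
    "./run upcoming_house_floor --download",
    "./run votes --govtrack --force --fast",
]


def build_commands(congress):
    """Return one argument list per command, with ### replaced by `congress`."""
    return [cmd.replace("###", str(congress)).split() for cmd in GOVTRACK_COMMANDS]


def smoke_test(congress, runner=subprocess.call):
    """Run each command in order and return the ones that exit non-zero.

    `runner` takes an argument list and returns an exit code; it defaults to
    subprocess.call and can be swapped for a stub when testing this script.
    """
    failures = []
    for argv in build_commands(congress):
        exit_code = runner(argv)
        if exit_code != 0:
            failures.append((" ".join(argv), exit_code))
    return failures
```

Calling `smoke_test(congress)` with the Congress number of interest returns an empty list when every command exits with status 0. A clean exit does not prove the output matches the Python 2 version, so comparing the generated files would still be a separate step.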

Once you're set, I will try the branch locally, and if it runs OK for about a week when Congress is in session (not next week) then we'll be set to merge.

I appreciate the effort. Would also love to know how/why you're using this repo.

acxz commented 3 years ago

I would suggest making a GitHub release/tag on the latest Python 2 commit before this branch is merged, so that existing users can still refer back to the Python 2 version if they prefer.

dwillis commented 3 years ago

Agree with @JoshData: I'm in favor of this, but making sure the GovTrack scrapers work is the priority. I'll test out things that we use, too.

acxz commented 3 years ago

@dwillis feel free to post the commands you use here too, so that we can collect the features we need to prioritize and keep track of them before we merge.

I'll go ahead and add your commands to the checklist I have made.

sseshan7 commented 3 years ago

@JoshData I've been maintaining a simple web server similar to GovTrack, called govstat.us. I find the idea of using technology and open data sources to make government data more accessible very compelling, so I started working on it as a hobby/side project.

That's also why I'm contributing to this extremely useful scraper.

sseshan7 commented 3 years ago

@JoshData, we've got the commands you listed above working. There may be some code paths that haven't been upgraded (depending on which options are used), but the ones listed above seem to be working in Python 3.6+.

acxz commented 3 years ago

@JoshData just wanted to check on the status of this PR.

acxz commented 3 years ago

@JoshData it's been another week. Have you been able to try this branch out?

JoshData commented 3 years ago

Sorry I haven't had time yet.

JoshData commented 3 years ago

Ok I'm running it on GovTrack now. If there aren't any issues over the next week it should be fine to merge. I'm not expecting any issues.

acxz commented 3 years ago

Thanks! Just glad to hear that it works and we are free from Python 2, haha.