openstates / issues

Having trouble? Looking to contribute? Issues live here!

MA needs a vote scraper #59

Open jamesturk opened 6 years ago

jamesturk commented 6 years ago

State: MA

needs investigation, no MA VoteEvents in DB

In-vincible commented 6 years ago

Continuing with this one @jamesturk

In-vincible commented 6 years ago

Vote information is no longer available on the bill pages, and neither is the action information. Example bill link: https://malegislature.gov/Bills/190/H1485, in case anybody wants to double-check.

estaub commented 6 years ago

Hmm... some votes are there: https://malegislature.gov/Bills/190/S2371/BillHistory. And the journals seem to still have them.

In-vincible commented 6 years ago

Other bills with roll calls that I found: https://malegislature.gov/Bills/190/H1110 https://malegislature.gov/Bills/190/H1120 https://malegislature.gov/Bills/190/H1697 https://malegislature.gov/Bills/190/H1100

I ran the bill scraper for about 5-6 hours and it still didn't complete.

estaub commented 6 years ago

All the roll calls cited thus far are for Senate votes. When I poked around 3 days ago I only found Senate roll calls too; at the time it seemed very possibly coincidence, so I didn't mention it.

estaub commented 6 years ago

Ok, here's an example of a House roll call. On the bill page, https://malegislature.gov/Bills/190/H4479, there's no Roll Call tab. However, there is this Action:

Passed to be engrossed - 147 YEAS to 4 NAYS (See YEA and NAY in Supplement, No. 348)

Note the 348. If you watch the HTTP traffic (using the Chrome Dev Tools "Network" tab or whatever) when you click "Download Roll Call" for Roll Call 348 on https://malegislature.gov/Journal/House, you can see that it's a straightforward form request. (Though I'm concerned about handling the cookie.)
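As a rough illustration of how a scraper might pick up the counts and the roll call number from an action like the one quoted above, here is a minimal sketch. The regex and the function name parse_vote_action are hypothetical and only tested against that single example string; other action wordings on the site may well differ.

```python
import re
from typing import Optional

# Assumed pattern, based only on the one action string quoted in this thread:
# "<N> YEAS to <M> NAYS (See YEA and NAY in Supplement, No. <K>)"
ACTION_RE = re.compile(
    r"(?P<yes>\d+)\s+YEAS?\s+to\s+(?P<no>\d+)\s+NAYS?"
    r"(?:.*?No\.\s*(?P<roll_call>\d+))?",
    re.IGNORECASE,
)


def parse_vote_action(text: str) -> Optional[dict]:
    """Return yes/no counts and the roll call number, or None if no vote is found."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    roll_call = match.group("roll_call")
    return {
        "yes": int(match.group("yes")),
        "no": int(match.group("no")),
        # The supplement number is optional; keep None when it is absent.
        "roll_call": int(roll_call) if roll_call else None,
    }


action = (
    "Passed to be engrossed - 147 YEAS to 4 NAYS "
    "(See YEA and NAY in Supplement, No. 348)"
)
print(parse_vote_action(action))
```

The roll call number is what would be needed to look up the matching PDF from the House journal page.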

The multi-column roll call PDF format isn't the friendliest in the world.

In-vincible commented 6 years ago

I did check it, @estaub. It looks difficult, since they have a __RequestVerificationToken parameter in the cookies. It might be hackable, but that doesn't seem like a fair way to do it; apparently they don't want scraping, which is why they have verification systems.

estaub commented 6 years ago

@In-vincible I wouldn't infer that they are trying to deter scraping; it may be generic support in their platform, and/or to control access to certain non-public features that we aren't accessing; note the "My Legislature" sign-in behind the user icon at upper right.

I'm not sure the cookie handling is difficult, either. In fact, it may be handled for free, or nearly so. I'd check with @jamesturk. There's definitely infrastructure for handling some of this.
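To make the cookie question concrete, here is a minimal sketch of how the token could be carried along with only the Python standard library. It assumes the site follows the usual ASP.NET anti-forgery pattern (a hidden __RequestVerificationToken form field paired with a cookie), which nobody in this thread has confirmed. The form_url argument and the "RollCallNumber" field name are placeholders, not the site's real names; those would have to be read from the Network tab.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar

TOKEN_FIELD = "__RequestVerificationToken"


class TokenParser(HTMLParser):
    """Collect the value of the first hidden anti-forgery input on a page."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == TOKEN_FIELD and self.token is None:
            self.token = attrs.get("value")


def extract_token(html):
    parser = TokenParser()
    parser.feed(html)
    return parser.token


def make_opener():
    # The cookie jar stores and resends cookies automatically, including
    # the token cookie set when the journal page is first loaded.
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


def download_roll_call(opener, page_url, form_url, roll_call_number):
    """Load the page, then post the form with the token found on it."""
    html = opener.open(page_url).read().decode("utf-8", errors="replace")
    token = extract_token(html)
    if token is None:
        raise ValueError("no anti-forgery token found on page")
    data = urllib.parse.urlencode({
        TOKEN_FIELD: token,
        "RollCallNumber": str(roll_call_number),  # hypothetical field name
    }).encode("ascii")
    return opener.open(form_url, data=data).read()
```

If the site does work this way, the cookie half is handled entirely by the opener, and the only manual step is copying the hidden field into the POST body.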

jamesturk commented 4 years ago

A renewed investigation showed that there still aren't great sources for votes in MA. This might need to be one we pressure them on.

mzagaja commented 3 years ago

@jamesturk Massachusetts now has a beta swagger API https://malegislature.gov/api/swagger/index.html?url=/api/swagger/v1/swagger.json#

jamesturk commented 3 years ago

Thanks for the heads up on this, that's great news!!

bhrutledge commented 3 years ago

Hi there! I've been having fun exploring this API, and put my work so far up at https://github.com/bhrutledge/ma-legislature-api. That includes some Jupyter notebooks (i.e. Python code) to make API requests and process the responses, and a proof-of-concept for a map of sponsors and committee members related to a particular bill.

In the process, I've been keeping track of issues that I encountered. I've gathered all of those into a discussion thread, and shared that with Paul Pak (CIO for Legislative Information Services for the state house).

Here's what Paul had to say in response to some of my initial questions:

The API is designed to evolve in parallel to the site in terms of content. Currently, ~90% of the site is covered via the swagger API, and we're working on completing what remains this year. By the end of the year, the API will remain in lock-step with whatever content is offered on malegislature.gov. It's technically an open beta, but no one really knows about it unless you google it or know of it. Please feel free to implement it in a project of your own, and we're totally ok if you want to spread the word that it's available for public use.
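Since the API covers only part of the site so far, one quick way to see whether votes are included would be to search the Swagger spec's paths. This is a minimal sketch: SPEC_URL is inferred from the url= parameter in the Swagger link posted earlier in this thread, and the helper names and the keyword are just for illustration.

```python
import json
import urllib.request

# Inferred from the "url=/api/swagger/v1/swagger.json" query parameter in
# the Swagger UI link above; treat it as an assumption.
SPEC_URL = "https://malegislature.gov/api/swagger/v1/swagger.json"


def fetch_spec(url=SPEC_URL):
    """Download and decode the Swagger/OpenAPI document."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_paths(spec, keyword):
    """Return the sorted endpoint paths whose name contains keyword."""
    keyword = keyword.lower()
    return sorted(p for p in spec.get("paths", {}) if keyword in p.lower())


if __name__ == "__main__":
    # Requires network access; prints whatever vote-related paths exist, if any.
    for path in find_paths(fetch_spec(), "rollcall"):
        print(path)
```

An empty result would suggest roll calls are in the portion of the site the API does not cover yet, or that they are published under a different name.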

jamesturk commented 3 years ago

Thanks for the update! This is great news and I'm excited to find the resources to get an updated scraper using it. If you have the interest/capacity, I'd be very glad to help however I can.