openstates / issues

Having trouble? Looking to contribute? Issues live here!

PR: Voting data now available for the Senate #391

Open Nosferican opened 3 years ago

Nosferican commented 3 years ago

Is your feature request related to a problem? Please describe.

Traditionally, voting data for PR was locked behind systems that blocked crawling and scraping and offered only faxed PDF images or an outdated Flash Player application. This month the Senate began posting cleaned vote record reports as PDFs.

Describe the solution you'd like

A simple crawler/parser can now be used to collect the data.

Describe alternatives you've considered

The House will likely follow but hasn't yet. Will open an issue if that happens.

Additional context

Here is the link to the new site: https://www.senado.pr.gov/Pages/VotacionMedidas.aspx

CC: @froi

drvander commented 3 years ago

I can try this out. Not sure if I'll be able to get the full scraper completed, but with the other vote scrapers and PR bill scraper I should be able to get started.

Edit: Looks like I'll be able to submit a PR for a working version. I'm getting all the data from the pdfs -- working on formatting.

Nosferican commented 3 years ago

Is there additional work required to expose the data in the next refresh? I tried

query {
  bill(id: "ocd-bill/99a8e770-3df4-4f22-975c-a602b48aa9b9") {
    identifier
    votes {
      edges {
        node {
          votes {
            voter {
              id
            }
            option
          }
        }
      }
    }
  }
}

giving me

{
  "data": {
    "bill": {
      "identifier": "RS 1",
      "votes": {
        "edges": [
          {
            "node": {
              "votes": []
            }
          }
        ]
      }
    }
  }
}

so it seems the data is not yet "live" for consumption.
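A quick way to check a response like the one above programmatically is to count the individual votes it contains. This is only a minimal sketch: the helper name `count_cast_votes` is made up for illustration, and it assumes the response has exactly the shape shown in this comment.

```python
def count_cast_votes(response: dict) -> int:
    """Count individual legislator votes across all vote events on a bill.

    Hypothetical helper: assumes the response shape shown in this thread
    (data -> bill -> votes -> edges -> node -> votes).
    """
    bill = (response.get("data") or {}).get("bill") or {}
    edges = (bill.get("votes") or {}).get("edges") or []
    # Each edge is one vote event; its node lists the individual votes.
    return sum(len((edge.get("node") or {}).get("votes") or []) for edge in edges)


# The response pasted above: one vote event, but no individual votes.
response = {
    "data": {
        "bill": {
            "identifier": "RS 1",
            "votes": {"edges": [{"node": {"votes": []}}]},
        }
    }
}

print(count_cast_votes(response))  # 0
```

A result of 0 means the bill has no individual votes attached yet, whether the vote event exists but is empty (as here) or `edges` itself is empty.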

Nosferican commented 3 years ago

Bump. I checked today and it seems the issue still persists.

jamesturk commented 3 years ago

Can you link to the vote that took place on that bill on the official site? We can take a closer look at what is going on here soon.

Nosferican commented 3 years ago

I was just looking at a few from the first file, but it seems none are currently exposed. Here is an example for the second vote record, which is also not available.

{
  bill(jurisdiction: "Puerto Rico", session: "2021-2024", identifier: "RS 36") {
    identifier
    votes {
      edges {
        node {
          votes {
            voter {
              id
            }
            option
          }
        }
      }
    }
  }
}

{
  "data": {
    "bill": {
      "identifier": "RS 36",
      "votes": {
        "edges": []
      }
    }
  }
}

jamesturk commented 3 years ago

I will check back tomorrow to see if the latest scrape fixes this. In the future, though, please do not post comments like "Bump": they do not add to the conversation, nor do they bump things in GitHub's UI.

jamesturk commented 3 years ago

openstates.exceptions.UnresolvedIdError: cannot resolve pseudo id to Bill: ~{"from_organization__classification": "upper", "identifier": "PC 21", "legislative_session__identifier": "2021-2024"}

Getting the above error. This suggests that the identifiers used for the bills do not match those used for the votes, so they cannot be reconciled; it might be the leading zeroes. I'll need to come back to this and take a look, but if you can run it locally without getting that error, let me know.

Nosferican commented 3 years ago

Hi James, that makes sense. A mismatch in the bill identifiers does seem to be the most likely cause, probably due to the leading zeroes. Thanks!
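For reference, here is a minimal sketch of the kind of normalization that might reconcile the two, assuming the mismatch really is leading zeroes. The function name `normalize_identifier` and the zero-padded form `PC 0021` are assumptions for illustration; I have not confirmed which of the two scrapers pads the number or what the padded form looks like.

```python
import re


def normalize_identifier(identifier: str) -> str:
    """Return a bill identifier as '<PREFIX> <number>' without leading zeroes.

    Hypothetical helper: assumes identifiers are a run of letters followed
    by a number, e.g. 'PC 21'. Anything else is returned stripped but
    otherwise unchanged.
    """
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*0*(\d+)\s*", identifier)
    if match is None:
        return identifier.strip()
    prefix, number = match.groups()
    return f"{prefix.upper()} {number}"


# 'PC 0021' is an assumed zero-padded form, not one taken from the PDFs.
print(normalize_identifier("PC 0021"))  # PC 21
print(normalize_identifier("RS 36"))    # RS 36
```

Applying the same normalization in both the bill scraper and the vote scraper would make the `identifier` in the pseudo id match regardless of which side pads the number.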