usgpo / bill-status

Information about Bill Status XML Bulk Data including the XML User Guide.
https://www.govinfo.gov/bulkdata/BILLSTATUS

A couple of minor issues #59

Closed amarnathm closed 7 years ago

amarnathm commented 7 years ago

A couple of minor issues.

In BILLSTATUS-114hr240.xml, Senate Recorded Vote 57 appears twice with slightly different text.

The corresponding bill page on congress.gov also lists Recorded Vote 57 twice: https://www.congress.gov/bill/114th-congress/house-bill/240/all-actions?overview=closed&q=%7B%22roll-call-vote%22%3A%22all%22%7D

Are these the same vote inadvertently recorded twice?

There are a few other bills that have this (minor) issue.
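To find other bills with a repeated vote, the check can be scripted. The following is a minimal sketch, not an official tool: it assumes each vote appears as a `recordedVote` element with `chamber`, `sessionNumber`, and `rollNumber` children, and the `SAMPLE` document and the `repeated_votes` helper are hypothetical illustrations rather than real bulk data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed sample; element names are assumptions
# and real BILLSTATUS files carry many more fields.
SAMPLE = b"""<billStatus>
  <bill>
    <actions>
      <item>
        <text>First action text for vote 57.</text>
        <recordedVotes>
          <recordedVote>
            <chamber>Senate</chamber>
            <sessionNumber>1</sessionNumber>
            <rollNumber>57</rollNumber>
          </recordedVote>
        </recordedVotes>
      </item>
      <item>
        <text>Second, slightly different action text for vote 57.</text>
        <recordedVotes>
          <recordedVote>
            <chamber>Senate</chamber>
            <sessionNumber>1</sessionNumber>
            <rollNumber>57</rollNumber>
          </recordedVote>
        </recordedVotes>
      </item>
      <item>
        <text>Action text for an unrelated House vote.</text>
        <recordedVotes>
          <recordedVote>
            <chamber>House</chamber>
            <sessionNumber>1</sessionNumber>
            <rollNumber>109</rollNumber>
          </recordedVote>
        </recordedVotes>
      </item>
    </actions>
  </bill>
</billStatus>"""


def repeated_votes(xml_bytes):
    """Return {(chamber, session, roll): count} for votes listed more than once."""
    root = ET.fromstring(xml_bytes)
    counts = Counter()
    for vote in root.iter("recordedVote"):
        key = (
            vote.findtext("chamber", default="").strip(),
            vote.findtext("sessionNumber", default="").strip(),
            vote.findtext("rollNumber", default="").strip(),
        )
        counts[key] += 1
    # Keep only the votes that occur more than once.
    return {key: n for key, n in counts.items() if n > 1}


print(repeated_votes(SAMPLE))
```

The session number is part of the key because roll numbers restart each session. A repeated key only flags a vote for review; it does not by itself show that the data is wrong.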


In BILLSTATUS-114hres144.xml, there is an invisible character before the XML declaration. Many text editors do not display it, but running "less filename" in a Linux terminal does. The effect is similar to:

foo<?xml version="1.0" encoding="utf-8"?>

A workaround for developers is proposed in http://stackoverflow.com/questions/3030903/content-is-not-allowed-in-prolog-when-parsing-perfectly-valid-xml-on-gae -- however, it takes some time to work out that the stray character is the cause of the parse error.
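One way to apply such a workaround is to discard everything that precedes the first `<` before handing the bytes to the parser. This is a minimal sketch under stated assumptions: the thread does not identify the stray character, so the UTF-8 byte order mark in the sample is only a guess at a common cause, the helper names are hypothetical, and the approach assumes a UTF-8 (or other ASCII-compatible) file.

```python
import xml.etree.ElementTree as ET

# Assumption: a UTF-8 byte order mark is used here only as an example
# of an invisible leading character.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_prolog_junk(data: bytes) -> bytes:
    """Drop any bytes that precede the first '<' in an XML payload."""
    start = data.find(b"<")
    if start == -1:
        raise ValueError("no '<' found; input does not look like XML")
    return data[start:]


def parse_xml_bytes(data: bytes) -> ET.Element:
    """Parse XML after removing leading bytes that are not part of the markup."""
    return ET.fromstring(strip_prolog_junk(data))


# Hypothetical payload standing in for a downloaded file.
sample = UTF8_BOM + b'<?xml version="1.0" encoding="utf-8"?><billStatus><bill/></billStatus>'
root = parse_xml_bytes(sample)
print(root.tag)
```

For a file on disk, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `parse_xml_bytes`. Working on bytes rather than decoded text lets the parser honor the encoding named in the XML declaration.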

The same issue occurs in several House roll call XML files, e.g., http://clerk.house.gov/evs/2003/roll002.xml -- but that is probably outside the scope of this repo.

llaplant commented 7 years ago

Thank you for inquiring about this; both actions, as presented on Congress.gov and in the Bulk Data Repository, are valid. In addition, thank you for letting us know about the invisible character.