Open gregoryfoster opened 7 years ago
Hmm, after messing around a bit more I'm starting to feel like I'm missing an important step between ./run fdsys --collections=BILLSTATUS and ./run bills. It looks to me like the bills task is expecting a bootstrapped dataset to already exist in the data hierarchy, but I don't see any mention of how to achieve that in the README or the wiki.
Re-opening, didn't mean to close the issue.
@gregoryfoster I filed a quick PR to fix the issue you identified: https://github.com/unitedstates/congress/pull/202
However, the bills task is defunct and unused. It was designed for thomas.gov, which is now :skull: in favor of congress.gov. The fdsys task is active, and would be easier for @JoshData to speak to, as he has it up in production.
Thanks, @konklone, for the quick fix. It does take care of creating the data hierarchy through a specified Congress.
I'm a little puzzled and honestly a little distressed to hear that the bills task is regarded as defunct, as that shines a different light on GovTrack's announcement that they'll no longer support bulk data access after the 2017 summer recess. Is this project winding down?
No no no, I re-wrote the bills task last year to convert the new official bill XML (from fdsys) into the existing JSON data format. Since GovTrack relies on the JSON format and I don't have the capacity to re-write GovTrack's importer to use the fdsys XML directly, I'm still invested in keeping the bills task running.
The mkdir issue probably stemmed from my rewrite last year, btw. Sorry about breaking it on clean directories (which I never test on).
Whew, glad to hear, @JoshData!
Returning to the original edge case of an absent and now clean data hierarchy - should I open a separate issue to tackle a clean load scenario? Meaning: while PR #202 avoids the os.listdir errors, the bills task as written doesn't take any action on a clean directory as it's compiling the list of bill types and bill IDs from an empty data hierarchy. That seems like a more substantial chunk of work that would require traversing the fdsys sitemap metadata files (or is there an easier route?).
Let me know if you want me to open a separate issue. And if you can sketch an outline of what needs to be done, I'd be happy to contribute a PR.
Apologies for confusing the issue! And I can verify what @gregoryfoster says -- #202 fixes the errors, but it still doesn't cause the bills task to do anything, it just stops with some messages about fetching 0 bills. I couldn't figure out why that was, and mistook the lack of network requests to mean it'd been retired.
Hello, I came to this issue report after attempting to run a clean installation of this scraper and got the error: "No such file or directory: 'data'"
This issue and #202 seem to be related to my error, even though this issue is over 2 years old and still open. #202 says "This fixes #201 by using mkdirp as necessary when examining data paths on disk.", but without any specific directions on how or where that fix should be applied.
After reading the last 2 comments here, I have to ask if this scraper is still being maintained? If so, where can I find directions on how to fix this issue? Thanks.
Hi.
At GovTrack we use this project extensively.
Unfortunately we don't have the resources to fix problems that we're not experiencing ourselves, though. This repository was created at a time when multiple well-funded organizations (besides us) were investing in creating a shared data ecosystem for legislative data, but now some of those organizations effectively don't exist anymore.
Thanks for your quick response.
I started a project several years ago with GovTrack (GT) bulk data. When I came back to it last year the GT data was no longer online. I found parts of it on ProPublica and elsewhere but some parts I can’t find, like the set of Amendments.
I will spend some time over the next few days trying to figure this scraper out. If it can produce what I’m looking for I will post the fix. I might even try to fork it to Python3 since Python2 is due to be obsolete next year.
@jox58 Can I ask specifically which scraper you're running that it doesn't create a data directory? I ask because I cloned the repository into a new directory and ran ./run govinfo --bulkdata=BILLSTATUS and it created a data directory.
You are right. My mistake for not reading the instructions carefully. I did a ./run bills without first running ./run govinfo --bulkdata=BILLSTATUS.
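For anyone who hits the same "fetching 0 bills" behaviour, a pre-flight check along these lines might make the missing step more obvious. This is only a sketch, not code from the repository: the helper names `count_bill_status_files` and `preflight_message` are hypothetical, and the layout `data/<congress>/bills/...` with a per-bill file named `fdsys_billstatus.xml` is an assumption about what the govinfo task writes to disk.

```python
import os


def count_bill_status_files(data_dir, congress, filename="fdsys_billstatus.xml"):
    """Count bill status files below data_dir/<congress>/bills.

    The directory layout and the default filename are assumptions made for
    this sketch; adjust them to whatever the download task really produces.
    """
    bills_dir = os.path.join(data_dir, str(congress), "bills")
    count = 0
    # os.walk yields nothing when bills_dir does not exist, so a clean
    # checkout simply results in a count of 0 rather than an exception.
    for _root, _dirs, files in os.walk(bills_dir):
        if filename in files:
            count += 1
    return count


def preflight_message(data_dir, congress):
    """Return a hint when no source files are present, otherwise None."""
    if count_bill_status_files(data_dir, congress) == 0:
        return (
            "No bill status files found for Congress %s. "
            "Run ./run govinfo --bulkdata=BILLSTATUS first." % congress
        )
    return None
```

Calling `preflight_message("data", 115)` before the bills task starts would turn the silent no-op on a clean directory into an explicit hint about the command order.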
Hello, and thank you for sharing and maintaining such a valuable project. I'm just getting started by way of legis-graph and intend to become a frequent user and hopefully a helpful contributor.
I've set up a fresh installation and Python 2.7 virtual environment. As a heads up for potential future congress users, I ran into an SSL handshake issue sourced to scrapelib which prevents execution of the fdsys task (and likely others). That issue and workaround is detailed here.
Currently, I'm attempting to ./run bills --congress=115 and the task fails because there is no data hierarchy in the filesystem yet. After mkdir -p data/115, a subsequent os.listdir call will fail because there are no bill types. This is easy enough to work around with some knowledge of the expected hierarchy, but it seems like something we could also easily fix.
I see there's a mkdir_p function in utils.py we could reuse - is there a good central place in the codebase to anticipate this edge case? I'd be happy to put together a pull request with a little guidance.
Thanks again for this very useful project!
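To make the edge case concrete, here is a rough sketch of guarding the os.listdir call. It is not the project's actual code: the `mkdir_p` below is a stand-in written for this example (the real one in utils.py may differ), `list_bill_types` is a hypothetical helper, and the `data/<congress>/bills` layout is an assumption.

```python
import errno
import os


def mkdir_p(path):
    """Create path and any missing parents; do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Only swallow the error when the directory is already there.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def list_bill_types(data_dir, congress):
    """List bill-type directories for a Congress, creating the parent first.

    Hypothetical helper: the data_dir/<congress>/bills layout is assumed.
    """
    bills_dir = os.path.join(data_dir, str(congress), "bills")
    # Ensuring the directory exists means os.listdir cannot fail with
    # "No such file or directory" on a clean checkout.
    mkdir_p(bills_dir)
    return sorted(
        name
        for name in os.listdir(bills_dir)
        if os.path.isdir(os.path.join(bills_dir, name))
    )
```

On a clean directory this returns an empty list instead of raising, so it only removes the crash; the task would still have nothing to process until the source files are downloaded.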