GPHemsley closed this 11 years ago.
Gonna close this for now. Feel free to re-open with a specific refactoring to tackle.
I apparently don't have the proper permissions to reopen this, but I think now would be a good time to tackle it: moving the bill processing functions to a separate file (or perhaps to a separate project, combined with similar functions from congress-legislators) so that they can be reused instead of recreated.
With the addition of American Memory, we have at least 3 different sources of bill information (American Memory, THOMAS/Congress.gov, and the Statutes at Large), and the data from all of them ends up in effectively the same format once it's parsed. I think it's time to untie the output functions from a single parser.
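A rough sketch of what "untying" the output could look like, assuming each parser returns a plain dict in one common shape and a single shared function writes it out. The function names, the dict keys (`congress`, `bill_type`, `number`), and the `<congress>/bills/<type>/<type><number>/data.json` layout are all assumptions for illustration, not the project's actual code:

```python
import json
from pathlib import Path

# Hypothetical shared output step: each source-specific parser returns a plain
# dict in one common shape, and only this module knows how to write it out.


def bill_output_path(base_dir: Path, bill: dict) -> Path:
    """Return <base>/<congress>/bills/<type>/<type><number>/data.json (assumed layout)."""
    bill_type = bill["bill_type"]
    bill_id = f"{bill_type}{bill['number']}"
    return base_dir / str(bill["congress"]) / "bills" / bill_type / bill_id / "data.json"


def write_bill(base_dir: Path, bill: dict) -> Path:
    """Write one parsed bill as JSON, creating directories as needed."""
    path = bill_output_path(base_dir, bill)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(bill, indent=2, sort_keys=True))
    return path


# Two hypothetical parsers: they normalize their own input but never write files.
def parse_statutes_entry(entry: dict) -> dict:
    return {"congress": entry["congress"], "bill_type": entry["type"].lower(),
            "number": str(entry["num"]), "source": "statutes"}


def parse_american_memory_record(record: dict) -> dict:
    # Assumes a record id shaped like "<congress>-<type>-<number>".
    congress, bill_type, number = record["id"].split("-")
    return {"congress": int(congress), "bill_type": bill_type,
            "number": number, "source": "american-memory"}
```

The point of the split is that adding a new source means writing only a new `parse_*` function; the output code stays the same.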
Sounds like the perfect time to do it. Thanks for tackling this.
An incomplete list of functions that would be useful for American Memory processing:
I'd be happy to do it, but I think we should do a little planning and coordination first. In particular, where should it go? A new file or a new project?
Oh, silly me: The reason I don't have access to those functions is because I'm working outside of unitedstates/congress. So a new project/repository would help that.
But the real problem is probably with functions like congress.bill_versions.fetch_version(), which I couldn't use for the statutes. That could benefit from just being split up into pieces.
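One way `fetch_version()` might be split up, sketched under assumptions: the URL pattern, the simplified namespace-free XML document, and every function name here are hypothetical. The idea is that only the thin wrapper touches the network (via an injected `download` callable), so a statutes script could call `parse_version()` on its own input:

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical decomposition of a monolithic fetch_version() into pieces.


def version_url(bill_version_id: str) -> str:
    # Placeholder URL pattern; the real source URLs are different.
    return f"https://example.invalid/versions/{bill_version_id}.xml"


def parse_version(bill_version_id: str, raw_xml: str) -> dict:
    """Pure parsing step: no network access, reusable for any source."""
    root = ET.fromstring(raw_xml)
    # Simplified, namespace-free stand-in for a real metadata document.
    issued = root.findtext("originInfo/dateIssued")
    return {"bill_version_id": bill_version_id, "issued_on": issued}


def fetch_version(bill_version_id: str, download: Callable[[str], str]) -> dict:
    """Thin wrapper that composes the pieces; only this step does I/O."""
    raw = download(version_url(bill_version_id))
    return parse_version(bill_version_id, raw)
```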
We're duplicating the tiny utils functions across a bunch of places; fortunately, they're small enough that it's not a big deal. Having a generic utils repo that we have to dynamically link into the others would add complexity.
If the American Memory code outputs bill information in a standard form, would it be appropriate to bring it into unitedstates/congress itself? I know the plan is to bring it into unitedstates in some way, but if the code is really that similar, maybe putting it in this repo is a good idea?
There's no point in moving files to a new repository. That doesn't make it any easier to access the functions. Just use PYTHONPATH=path/to/congress or some other method to make the congress project modules available to your American Memory project.
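The in-script equivalent of `PYTHONPATH=path/to/congress`, as a minimal sketch. The helper name is hypothetical, and which modules you can then import depends on how the congress checkout is laid out:

```python
import os
import sys


def add_congress_to_path(checkout_dir: str) -> None:
    """Make modules in a local congress checkout importable, like PYTHONPATH does.

    Hypothetical helper; inserting at position 0 means the checkout wins over
    any installed module with the same name.
    """
    path = os.path.abspath(os.path.expanduser(checkout_dir))
    if path not in sys.path:
        sys.path.insert(0, path)


# Shell equivalent: PYTHONPATH=path/to/congress python your_script.py
# add_congress_to_path("path/to/congress")
# ...then import the congress modules you need (module names depend on the repo layout).
```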
The only refactoring that I think is necessary is to isolate the part that converts the JSON to GovTrack-style XML. Everything else should be fine.
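A sketch of that conversion pulled out into a standalone function, so any parser producing the JSON can reuse it. The input keys (`congress`, `bill_type`, `number`, `titles`) and the XML element and attribute names are simplified assumptions, not the actual GovTrack schema:

```python
import json
import xml.etree.ElementTree as ET


def bill_json_to_xml(bill: dict) -> str:
    """Convert a bill dict (as loaded from JSON) into a GovTrack-style XML string.

    The element and attribute names are a simplified assumption for illustration.
    """
    root = ET.Element("bill", {
        "session": str(bill["congress"]),
        "type": bill["bill_type"],
        "number": str(bill["number"]),
    })
    titles = ET.SubElement(root, "titles")
    for t in bill.get("titles", []):
        el = ET.SubElement(titles, "title", {"type": t["type"], "as": t["as"]})
        el.text = t["title"]
    return ET.tostring(root, encoding="unicode")


def convert_file(json_path: str, xml_path: str) -> None:
    """File-level wrapper: read one JSON bill and write its XML form."""
    with open(json_path) as f:
        bill = json.load(f)
    with open(xml_path, "w") as f:
        f.write(bill_json_to_xml(bill))
```

Because `bill_json_to_xml()` takes a plain dict, it doesn't care whether the bill came from THOMAS, the Statutes at Large, or American Memory.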
The more I work on the American Memory parser, the more I tend to agree that it belongs in unitedstates/congress. (Though perhaps that's because I'm making an effort to make it similar.) I'm not clear on where the parser fits into @tauberer's big plans for American Memory. :)
Yeah, that sounds right to me, Josh - the simpler the integration the better, and that works just fine.
Sorry, I got my terminology wrong: What I meant to suggest was that we create a separate Python package that could be installed/imported on its own to do all the basic work.
This was mooted by #95 and will be further handled by unitedstates/utils.
(A spin-off from #34.)
As the project grows, it is starting to feel growing pains from the utilities that have been added. Common utilities that do not rely on outside sources should be split into their own file(s) so that new scripts can import them without also importing methods they don't need.
For example, bill_info.py contains a lot of methods useful for outputting bill data, but also a lot of methods for getting bill data from THOMAS. utils.py might also be better split into multiple files grouped by function.