Separate common utilities from source-specific scripts

unitedstates / congress

Public domain data collectors for the work of Congress, including legislation, amendments, and votes.

https://github.com/unitedstates/congress/wiki

Creative Commons Zero v1.0 Universal


Separate common utilities from source-specific scripts #38

GPHemsley closed 11 years ago

GPHemsley commented 11 years ago

(A spin-off from #34.)

As the project grows, it is starting to feel growing pains from the utilities that have been added. Common utilities that do not rely on outside sources should be split into their own separate file(s) so that new scripts can import them without importing methods that aren't needed.

For example, bill_info.py contains a lot of methods useful for outputting bill data, but also contains a lot of methods for getting bill data from THOMAS. Also, utils.py might be better split off into multiple files grouped by function.

konklone commented 11 years ago

Gonna close this for now. Feel free to re-open with a specific refactoring to tackle.

GPHemsley commented 11 years ago

I apparently don't have the proper permissions to reopen this, but I think now would be a good time to tackle this: Moving the bill processing functions to a separate file (or perhaps to a separate project, combining with similar functions from congress-legislators) so that they can be reused instead of recreated.

With the addition of American Memory, we have at least 3 different places where we get bill information from (THOMAS/Congress.gov, Statutes at Large), and they all effectively output the same format once they're parsed. I think it's time to untie the output functions from a single parser.
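To illustrate the idea of untying output from any one parser, here is a minimal sketch. It is not code from the repository: the two parser functions, their input formats, and the dict keys are all invented for illustration. The only point is that once each parser returns the same structure, a single output function can serve all of them.

```python
import json


def parse_thomas_style(raw):
    """Hypothetical parser for an invented 'key=value;key=value' record."""
    fields = dict(pair.split("=", 1) for pair in raw.split(";"))
    return {"bill_id": fields["id"], "title": fields["title"]}


def parse_statutes_style(raw):
    """Hypothetical parser for an invented 'bill_id|title' record."""
    bill_id, title = raw.split("|", 1)
    return {"bill_id": bill_id, "title": title}


def output_bill(bill):
    """Shared output step: it only sees the common dict, never the source."""
    return json.dumps(bill, sort_keys=True, indent=2)


# Two different sources, one output function.
from_thomas = parse_thomas_style("id=hr1-113;title=Example Act")
from_statutes = parse_statutes_style("hr1-113|Example Act")
print(output_bill(from_thomas) == output_bill(from_statutes))
```

With this shape, adding another source means writing one more parser that returns the same dict; the output function does not change.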

konklone commented 11 years ago

Sounds like the perfect time to do it. Thanks for tackling this.

GPHemsley commented 11 years ago

An incomplete list of functions that would be useful for American Memory processing:

congress.utils.format_datetime()
congress.bill_info.latest_status()
congress.utils.write()
congress.utils.data_dir()

GPHemsley commented 11 years ago

I'd be happy to do it, but I think we should do a little planning and coordination first. In particular, where should it go? A new file or a new project?

GPHemsley commented 11 years ago

Oh, silly me: The reason I don't have access to those functions is that I'm working outside of unitedstates/congress. So a new project/repository would help that.

But the real problem is probably with functions like congress.bill_versions.fetch_version(), which I couldn't use for the statutes. That could benefit from just being split up into pieces.

konklone commented 11 years ago

The tiny utils functions, we're duplicating those across a bunch of places - fortunately, they're small enough that it's not a big deal. It'd add complexity to have a generic utils repo that we have to dynamically link into the others.

If the American Memory code is outputting bill information in a standard form, is it appropriate to actually bring it into unitedstates/congress...? I know the plan is to bring it into unitedstates in some way, but if the code is really that similar, maybe even putting it in this repo is a good idea?

JoshData commented 11 years ago

There's no point in moving files to a new repository. That doesn't make it any easier to access the functions. Just use PYTHONPATH=path/to/congress or some other method to make the congress project modules available to your American Memory project.
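As a rough sketch of the "some other method" option, the same effect as setting PYTHONPATH can be had from inside a script by adding the checkout directory to sys.path before importing. Everything named below is a stand-in: the temporary directory plays the role of path/to/congress, and shared_utils_demo is a made-up module, not one from the project.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_to_import_path(directory):
    """Put a directory at the front of sys.path so its modules can be imported."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Stand-in for a local checkout; a real path/to/congress would go here instead.
checkout = Path(tempfile.mkdtemp())
(checkout / "shared_utils_demo.py").write_text(
    "def data_dir():\n    return 'data'\n"
)

added = add_to_import_path(checkout)
importlib.invalidate_caches()  # make sure the freshly written module is seen
shared_utils_demo = importlib.import_module("shared_utils_demo")
print(shared_utils_demo.data_dir())
```

Setting PYTHONPATH in the shell achieves the same thing without touching the script, which is usually the less intrusive choice.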

The only refactoring that I think is necessary is to isolate the part that converts the JSON to GovTrack-style XML. Everything else should be fine.
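A minimal sketch of what isolating that conversion could look like: one function that takes the JSON text and returns XML, with no fetching or parsing mixed in. The element names here are invented and do not reflect the actual GovTrack XML schema; the function name is also hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def bill_json_to_xml(bill_json):
    """Convert a flat JSON object into XML, one child element per key.

    Assumes every key is a valid XML element name (illustrative only).
    """
    bill = json.loads(bill_json)
    root = ET.Element("bill")
    for key in sorted(bill):
        child = ET.SubElement(root, key)
        child.text = str(bill[key])
    return ET.tostring(root, encoding="unicode")


xml_text = bill_json_to_xml('{"bill_id": "hr1-113", "title": "Example Act"}')
print(xml_text)
```

Because the function depends only on its JSON input, any parser that writes the same JSON could reuse it unchanged.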

GPHemsley commented 11 years ago

The more I work on the American Memory parser, the more I tend to agree that it seems to belong in unitedstates/congress. (Though perhaps that's because I'm making an effort to make it be similar.) I'm not clear where the parser falls into @tauberer's big plans for American Memory. :)

konklone commented 11 years ago

Yeah, that sounds right to me, Josh - the simpler the integration the better, and that works just fine.

GPHemsley commented 11 years ago

Sorry, I got my terminology wrong: What I meant to suggest was that we create a separate Python package that could be installed/imported on its own to do all the basic work.

GPHemsley commented 11 years ago

This was mooted by #95 and will be further handled by unitedstates/utils.