Closed: tribble closed this issue 10 years ago
I think that's on Sunlight's Congress API (https://github.com/sunlightlabs/congress/issues), but the data we're generating here is the cause of the problem.
Sorry, you're right.
However, the issue still stands. I'm happy to try to jump in and contribute, but what would you consider the "right" solution to this problem? Preserving some line breaks?
As it stands now the summary works well for something like search, but it's not very good for human consumption.
I think preserving the line breaks would be the right call. Looks like THOMAS is using a bunch of `<p>` tags, so we could just convert those to `\n\n` between blocks, to keep the field plain-text. Anyone who doesn't want the line breaks for some reason can still strip them out easily enough.
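For reference, here is a rough sketch of that conversion in Python, using only the standard library. The function name `summary_to_plain_text` and the sample markup are made up for illustration; this is not the scraper's actual code, and it assumes the summary arrives as an HTML fragment whose blocks are delimited by `<p>` tags.

```python
import html
import re


def summary_to_plain_text(markup):
    """Convert a summary HTML fragment to plain text, keeping paragraph breaks.

    Hypothetical helper: assumes paragraphs are delimited by <p> tags.
    """
    # Collapse existing whitespace so only <p> boundaries produce breaks.
    text = " ".join(markup.split())
    # Turn every opening or closing <p> tag into a paragraph marker.
    text = re.sub(r"(?i)</?p\b[^>]*>", "\n\n", text)
    # Drop any remaining tags (<b>, <i>, ...), then decode entities.
    text = re.sub(r"<[^>]+>", "", text)
    text = html.unescape(text)
    # Tidy each paragraph and discard the empty ones left by adjacent markers.
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n")]
    return "\n\n".join(p for p in paragraphs if p)


# Made-up sample markup, not the real summary of any bill.
sample = "<p><b>Title</b> - Amends the act.</p>\n<p>Requires a report &amp; a plan.</p>"
plain = summary_to_plain_text(sample)

# Anyone who does not want the breaks can flatten them again in one step.
flat = " ".join(plain.split())
```

A regex is enough for a sketch like this; for messier markup a real HTML parser would be the safer choice.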
I don't normally work with Python, but I'll take a stab at this with test coverage.
@tribble: Thanks for giving it a shot! Let us know if you run into any questions.
Fixed by #105. Thanks @tribble!
The bill summary in the bills API strips the formatting from THOMAS.
Example response for bill hr1204-113:
Here is the corresponding page on THOMAS: http://thomas.loc.gov/cgi-bin/bdquery/z?d113:H.R.1204:@@@D&summ2=m&
Is there a way to keep some of the formatting, such as paragraph breaks? Or perhaps create a new field that contains the summary with all of its markup?