unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

BeautifulSoup is throwing usage warnings #251

Closed harrisj closed 9 years ago

harrisj commented 9 years ago

Should I go ahead and open a pull request to fix this? It's not breaking anything; it's just mildly annoying.

/usr/local/lib/python3.5/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

  markup_type=markup_type))

divergentdave commented 9 years ago

Good tip, thanks. We have previously fought with BeautifulSoup's parser autodetection (see 6e34957c9e77ccea7f2ee742817bcbbc8fffdab8). The upshot is that we should be using "lxml", which gets installed when you run pip install -r requirements.txt. We should explicitly pass "lxml" to the BeautifulSoup constructor wherever we use it; if you could whip up a PR for that, it would be much appreciated!
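
For reference, a minimal sketch of the change described above, assuming lxml and beautifulsoup4 are both installed. The markup string is made up for illustration and is not taken from any scraper in this repository.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, standing in for a page fetched by a scraper.
markup = "<html><body><a href='/report.pdf'>Audit report</a></body></html>"

# Naming the parser explicitly avoids the "No parser was explicitly
# specified" warning and keeps parsing consistent across environments.
doc = BeautifulSoup(markup, "lxml")

link = doc.find("a")
print(link["href"], link.get_text())
```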

harrisj commented 9 years ago

And I was just about to submit my own pull request, where I pulled beautifulsoup_from_url into utils and replaced a bunch of calls in the scrapers with a reference to it... Do you still want it?
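
A rough sketch of what a shared helper along these lines could look like, so the parser is named in one place instead of in every scraper. This is not the code from the pull request: the `fetch` parameter and the `_download` function are assumptions added here so the example runs without network access.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def _download(url):
    # Hypothetical default downloader; the project's own utils may differ.
    with urlopen(url) as response:
        return response.read()


def beautifulsoup_from_url(url, fetch=None):
    """Fetch url and parse the body with an explicitly named parser."""
    body = (fetch or _download)(url)
    if body is None:
        return None
    # The parser is passed once here, so callers never trigger the
    # "No parser was explicitly specified" warning.
    return BeautifulSoup(body, "lxml")


# Usage with a stand-in fetch function, so nothing is downloaded.
fake_fetch = lambda url: b"<html><body><p>hello</p></body></html>"
doc = beautifulsoup_from_url("http://example.com/", fetch=fake_fetch)
print(doc.p.get_text())
```

A scraper would then call beautifulsoup_from_url(url) rather than constructing BeautifulSoup itself.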

On Oct 12, 2015, at 6:55 AM, David Cook notifications@github.com wrote:

Closed #251 (https://github.com/unitedstates/inspectors-general/issues/251) via 5a102c0 (https://github.com/unitedstates/inspectors-general/commit/5a102c04b8916ed9e83eeb4ea0147f6154b6c556).

divergentdave commented 9 years ago

That sounds better, I'll revert my commit.

divergentdave commented 9 years ago

Fixed by #252.