misja / python-boilerpipe

Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages

Add a default useragent #7

Closed tlvince closed 11 years ago

tlvince commented 11 years ago

urllib2 by default does not send a browser-like "User-Agent" string (it identifies itself as `Python-urllib/x.y`). Some websites assume that requests without a browser-like "User-Agent" header come from malicious bots and block them.

This adds a "fake" User-Agent to work around this.
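
For illustration, here is a minimal sketch of the general technique, not the actual patch. It uses Python 3's `urllib.request` (the successor to `urllib2`) so that it runs on a current interpreter. The names `DEFAULT_USER_AGENT`, `build_request` and `fetch`, and the User-Agent value itself, are assumptions made up for this example.

```python
import urllib.request

# Hypothetical default value; the string used by the actual change is not shown here.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-client/0.1)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Download the raw bytes at `url`, sending the given User-Agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping the User-Agent as a parameter with a default lets callers override it for sites that need a different value, while existing call sites keep working unchanged.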