misja / python-boilerpipe

Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages

Empty html causes exception from Extractor #50

Closed: arisudesu closed this issue 7 years ago

arisudesu commented 7 years ago

I've noticed an inconsistency in the argument checking in the Extractor class: it raises an exception if an empty string is passed as the html argument. The erroneous code is kwargs.get('html') (https://github.com/misja/python-boilerpipe/blob/master/src/boilerpipe/extract/__init__.py#L48). It does not only check whether the argument is present; it also evaluates the value as a bool (for a string, whether it is non-empty). As a result, an exception is raised even though the argument was supplied. I think the correct way to test for the kwarg is 'html' in kwargs, and an empty html string should be valid.
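To illustrate the difference between the two checks, here is a minimal sketch. The function names has_html_by_truthiness and has_html_by_presence are hypothetical and are not part of python-boilerpipe; they only isolate the kwargs.get('html') test versus the 'html' in kwargs test described above.

```python
def has_html_by_truthiness(**kwargs):
    """Hypothetical helper mirroring the reported pattern: kwargs.get('html').

    An empty string is falsy, so html='' looks the same as a missing argument.
    """
    return bool(kwargs.get('html'))


def has_html_by_presence(**kwargs):
    """Hypothetical helper mirroring the proposed pattern: 'html' in kwargs.

    Only the presence of the key matters, so html='' counts as supplied.
    """
    return 'html' in kwargs


if __name__ == '__main__':
    for label, call_kwargs in [
        ('no html argument', {}),
        ('empty html', {'html': ''}),
        ('non-empty html', {'html': '<p>text</p>'}),
    ]:
        print(label,
              has_html_by_truthiness(**call_kwargs),
              has_html_by_presence(**call_kwargs))
```

The two helpers disagree only for the empty string, which is exactly the case where the argument is supplied but still treated as missing.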

tuxdna commented 7 years ago

@arisudesu Thanks for reporting the issue. Would you like to open a PR for this fix?

arisudesu commented 7 years ago

#51

tuxdna commented 7 years ago

Closed via https://github.com/misja/python-boilerpipe/pull/51