I've noticed an inconsistency in argument checking in the `Extractor` class: it raises an exception when an empty string is passed as the `html` argument. The offending code is `kwargs.get('html')` (https://github.com/misja/python-boilerpipe/blob/master/src/boilerpipe/extract/__init__.py#L48). This does not just check whether the argument is present; it also evaluates the argument's value as a bool (for a string, whether it is non-empty). As a result, an exception is raised even though the argument was supplied.
I think the correct way to test for the kwarg is `'html' in kwargs`, and an empty `html` string should be treated as valid input.
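To illustrate the difference, here is a minimal sketch. The two helper functions are hypothetical stand-ins written for this report, not the actual `Extractor` code; they isolate only the two ways of testing for the kwarg.

```python
def has_html_truthy(**kwargs):
    # Hypothetical helper mirroring the reported pattern:
    # .get() returns the value itself, and an empty string is falsy.
    return bool(kwargs.get('html'))


def has_html_present(**kwargs):
    # Hypothetical helper for the proposed check:
    # membership only asks whether the key was passed at all.
    return 'html' in kwargs


print(has_html_truthy(html=''))          # False: empty string looks "missing"
print(has_html_present(html=''))         # True: the argument was supplied
print(has_html_present(url='http://x'))  # False: html really is absent
```

Note that the membership test would also accept `html=None`; if that should be rejected, `kwargs.get('html') is not None` is a possible alternative that still allows an empty string.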