Closed mentalstring closed 2 years ago
I filed an upstream bug report so that they may consider re-adding -utf8 as a no-op just for backwards compatibility.
As a workaround for now, urlwatch will check the html2text -help output for -utf8 and, if found, will add it to the command-line arguments; if not, it will leave it out (this avoids having to parse version numbers and such). This should make it work with both old and new versions, albeit with the downside that html2text is executed twice (once for -help and once for the real conversion).
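For anyone who wants to see the idea in isolation, here is a rough sketch of that probe-then-run approach. It is not the actual urlwatch code: the function names (build_command, probe_help, html_to_text) are made up for illustration, and it assumes that html2text reads HTML from stdin and that the -help text may end up on either stdout or stderr.

```python
import subprocess


def build_command(help_output, executable='html2text'):
    """Build the argument list, adding -utf8 only if the help text mentions it."""
    cmd = [executable]
    if '-utf8' in help_output:
        cmd.append('-utf8')
    return cmd


def probe_help(executable='html2text'):
    """Run `html2text -help` and return whatever it prints.

    stderr is merged into stdout because it is an assumption which stream
    the help text goes to; the exit code is deliberately not checked.
    """
    result = subprocess.run(
        [executable, '-help'],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.stdout.decode('utf-8', errors='replace')


def html_to_text(html, executable='html2text'):
    """Convert HTML to text; html2text runs twice (probe, then conversion)."""
    cmd = build_command(probe_help(executable), executable)
    result = subprocess.run(
        cmd,
        input=html.encode('utf-8'),  # assumes html2text reads from stdin
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode('utf-8', errors='replace')
```

Keeping build_command separate from the subprocess calls means the flag decision can be checked without html2text being installed. The probe result could also be cached so the extra -help run only happens once per session.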
Recently our system upgraded html2text from 1.3.2a to 2.1.1 and its use in urlwatch stopped working when using the 'html2text' method.
Previous options:
Newer options:
It seems it now defaults to UTF-8, while previous versions assumed ISO-8859-1 without the -utf8 switch. Beyond the version difference, it seems the development of html2text has changed hands, and the 2.0.0+ versions no longer have the -utf8 switch. Currently this is relevant here:
https://github.com/thp/urlwatch/blob/1836a41c7d93dcc9129826a8f88f321194c4fa67/lib/urlwatch/html2txt.py#L93
I'm not seeing a clean solution. I just wanted to report this for now, as it will become a bigger issue as more people update html2text. I guess the current alternative (besides not upgrading html2text) is to use one of the other html2text methods (e.g. pyhtml2text).