jarrekk / imgkit

🌁 Wkhtmltoimage python wrapper to convert HTML to image
MIT License

'utf-8' codec can't decode byte 0xff in position 0: invalid start byte #82

Open adiptamartulandi opened 2 years ago

adiptamartulandi commented 2 years ago

I am using a MacBook Air M1 with Python 3.7, imgkit==1.2.2, wkhtmltopdf==0.2, and wkhtmltoimage 0.12.6.

Hello, I want to convert HTML code to an image, but I am getting this error: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Here is my code:

import imgkit
import base64
from IPython.display import display, HTML

body = '''
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>equations</title>
  <style type="text/css">
      code{white-space: pre-wrap;}
      span.smallcaps{font-variant: small-caps;}
      span.underline{text-decoration: underline;}
      div.column{display: inline-block; vertical-align: top; width: 50%;}
  </style>
</head>
<body>
<p>Professional Format</p>
<meta charset="utf-8" />
<p><math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mo>∫</mo><mn>0</mn><mn>1</mn></msubsup><mi>x</mi></mrow><annotation encoding="application/x-tex">\int_{0}^{1}x</annotation></semantics></math></p>
<p>Linear Format</p>
<p><math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>∖</mo><mi>i</mi><mi>n</mi><mi>t</mi><mi>_</mi><mo stretchy="false" form="prefix">{</mo><mn>0</mn><mo stretchy="false" form="postfix">}</mo><mover><mrow></mrow><mo accent="true">̂</mo></mover><mo stretchy="false" form="prefix">{</mo><mn>1</mn><mo stretchy="false" form="postfix">}</mo><mi>x</mi></mrow><annotation encoding="application/x-tex">\backslash int\_\{ 0\}\hat{}\{ 1\} x</annotation></semantics></math></p>
<p>Linear Format with lt</p>
<p><math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>∖</mo><mi>i</mi><mi>n</mi><msub><mi>t</mi><mrow><mo stretchy="true" form="prefix">{</mo><mn>0</mn><mo stretchy="true" form="postfix">}</mo></mrow></msub><mo>&lt;</mo><mrow><mo stretchy="true" form="prefix">{</mo><mn>1</mn><mo stretchy="true" form="postfix">}</mo></mrow><mi>x</mi><mo>&lt;</mo><mn>5</mn></mrow><annotation encoding="application/x-tex">\backslash int_{\left\{ 0 \right\}} &lt; \left\{ 1 \right\} x &lt; 5</annotation></semantics></math></p>
</body>
</html>
'''

options = {
    "quiet": ""
}

img = imgkit.from_string(body, False, options=options)

UrbanKeith commented 2 years ago

I had the same issue. After digging through the library, I found that when the --quiet flag is used, wkhtmltopdf apparently passes the same thing to stderr as to stdout, that is, a byte string containing the image. However, the library decodes the stderr stream as UTF-8, which causes the error. Until the bug is fixed, you can work around it as follows:

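import imgkit

try:
    jpeg = imgkit.from_string(html_string, False, options={'quiet': ''})
except UnicodeDecodeError as err:
    # err.object (same as err.args[1]) holds the raw bytes that failed to decode, i.e. the image
    jpeg = err.object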
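
To show why the workaround recovers the picture, here is a minimal standalone sketch that does not call imgkit at all. It assumes the output is a JPEG, whose data starts with the bytes FF D8 FF, which matches the "byte 0xff in position 0" in the error. The helper name `recover_bytes` and the sample bytes are made up for illustration. A UnicodeDecodeError keeps the bytes it failed to decode in `err.object`, which is the same value as `err.args[1]`.

```python
def recover_bytes(raw: bytes):
    """Try to decode raw as UTF-8; on failure, return the original bytes
    and the error message taken from the exception."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.object is the undecodable input, identical to err.args[1]
        assert err.object is err.args[1]
        return err.object, str(err)
    return raw, None


# Hypothetical sample: the first bytes of a JPEG stream (FF D8 FF E0) plus padding
jpeg_like = b"\xff\xd8\xff\xe0" + b"\x00" * 4

recovered, message = recover_bytes(jpeg_like)
print(message)
print(recovered == jpeg_like)
```

The recovered bytes can then be written to a file opened in binary mode ("wb"), just like the normal return value of from_string when the output path is False.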