vgalin / html2image

A package acting as a wrapper around the headless mode of existing web browsers to generate images from URLs and from HTML+CSS strings or files.
MIT License

There is no obvious way to use a local image / how to do so is not documented. #47

Open RobertSchueler opened 2 years ago

RobertSchueler commented 2 years ago

I have a very small project on a Windows 10 machine with the following image in the file example.png:

image

I am trying to take a screenshot of a simple HTML page using the following code:

from html2image import Html2Image

hti = Html2Image()

html_str = """
<!DOCTYPE html>
<html>
    <body>
        <img src="example.jpg">
    </body>
</html>
"""

css_str = """
body {
  margin: 0;
  background: black;
}
"""

hti.screenshot(html_str=html_str, css_str=css_str, save_as="test.png")

I would expect to see the image in the result. But it only results in the following:

[image: test]

If I copy the HTML and CSS strings into files (and link them), I see the expected result in my browser. So I guess it has nothing to do with read permissions.

image

vgalin commented 2 years ago

Hello, this issue is due to how html2image handles files.

Context

When you pass the html_str and css_str parameters to the screenshot method, their content is respectively written to an html and a css file inside a directory located in your temp folder (%temp% on Windows). The html file is then opened with Chrome/Chromium in headless mode to take a screenshot. Because the css file is placed in the same directory as the html one, the browser is able to properly find the stylesheet.

In our case, the browser can't find the image as its path leads to nowhere (there is no example.jpg file in the temp directory used by html2image).
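
To illustrate the point above, here is a minimal sketch of how a relative src gets resolved against the location of the HTML file. It only uses the standard library and does not call html2image; the file name page.html and the helper resolve_src are made up for the example.

```python
import tempfile
from pathlib import Path
from urllib.parse import urljoin


def resolve_src(html_file: Path, src: str) -> str:
    """Resolve a src value against the page's own URL, as a browser would."""
    return urljoin(html_file.absolute().as_uri(), src)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the HTML file written to the temp directory
    # (the name page.html is hypothetical).
    page = Path(tmp) / "page.html"
    page.write_text('<img src="example.jpg">', encoding="utf-8")

    # The relative path now points inside the temp directory...
    resolved = resolve_src(page, "example.jpg")
    # ...where no example.jpg was ever placed.
    image_found = (page.parent / "example.jpg").exists()

print(resolved)
print(image_found)
```

The resolved URL points into the temp directory rather than the project folder, which is why the image is missing from the screenshot.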

Solution

There are four ways to solve this issue.

1 & 2. Use absolute paths or use a URL

The first one would be to use absolute paths when you refer to resources that are located on your machine, so that they can always be found.

So instead of example.jpg you could use C:\the\current\path\to\example.jpg

The second one is to host the image somewhere and use its URL directly.
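
If you would rather not hard-code the absolute path, something like the following could build it at runtime. This is only a sketch: it turns the path into a file:// URI, which is my assumption for a portable form (the suggestion above uses a plain Windows path), and to_file_uri is a hypothetical helper, not part of html2image.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Turn a (possibly relative) path into an absolute file:// URI."""
    # absolute() anchors the path at the current working directory;
    # as_uri() percent-encodes characters such as spaces.
    return Path(path).absolute().as_uri()


image_uri = to_file_uri("example.jpg")

html_str = f"""
<!DOCTYPE html>
<html>
    <body>
        <img src="{image_uri}">
    </body>
</html>
"""

print(image_uri)
```

The resulting html_str can then be passed to hti.screenshot(html_str=html_str, ...) as in the original code.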

3. Load the images using the load_file method

The third one is to manually "load" the image(s) you are using into the directory used by html2image. Behind the scenes, this method creates a copy of the file and places it in the temp directory used by html2image.

from html2image import Html2Image

hti = Html2Image()

hti.load_file('example.jpg')
...
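
As a rough picture of what "loading" means here, the sketch below copies a file next to where the HTML page would be generated. It is not html2image's actual implementation, only a standard-library illustration of the copy step described above; copy_next_to_page and the directory names are made up.

```python
import shutil
import tempfile
from pathlib import Path


def copy_next_to_page(resource: Path, page_dir: Path) -> Path:
    """Copy a resource into the directory that holds the generated HTML."""
    target = page_dir / resource.name
    shutil.copy(resource, target)
    return target


with tempfile.TemporaryDirectory() as work:
    project_dir = Path(work) / "project"    # where the image lives
    page_dir = Path(work) / "render_tmp"    # stand-in for the temp directory
    project_dir.mkdir()
    page_dir.mkdir()

    image = project_dir / "example.jpg"
    image.write_bytes(b"placeholder bytes, not a real image")

    copied = copy_next_to_page(image, page_dir)
    # A relative src="example.jpg" in a page inside page_dir now resolves.
    copy_exists = copied.exists()
    same_name = copied.name == "example.jpg"

print(copy_exists, same_name)
```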

4. Change the temp directory used by html2image

The fourth one is more of a 'hack', but can be useful when you don't want to manually load each resource (for example when you have multiple images or when you don't necessarily know their names). It consists of setting the temporary directory used by html2image to the current directory. The HTML file(s) will then be generated in the current directory, where the image already is.

from html2image import Html2Image

hti = Html2Image(temp_path='./')
...

That should help you to solve this issue, tell me if you still encounter problems. Please do not close the issue as I will keep it as a reminder to document this behaviour (you can remove yourself from the "participants" list if you do not wish to receive updates).

vgalin commented 2 years ago

TODO

RobertSchueler commented 2 years ago

Thanks a lot, worked like a charm.

MartinPicc commented 2 years ago

As an alternative, you can also embed the PNG as a base64 string directly in the HTML.

Example:

import base64
import re

# replace each PNG src path with a base64-encoded data URI
png_paths = re.findall(r'src="(\S+?\.png)"', html)
for png_path in png_paths:
    with open(png_path, 'rb') as f:
        base64_png = 'data:image/png;base64,' + base64.b64encode(f.read()).decode('ascii')
    html = html.replace(png_path, base64_png)

It uses a regular expression to find src attributes ending with .png (and written with double quotes) in the HTML. It then opens each image, converts it to a base64 string, and injects it back into the HTML.
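
If you have more than PNG files, the same idea could be extended along these lines. This is only a sketch, not something html2image provides: inline_images is a hypothetical helper that guesses the MIME type with the standard mimetypes module, accepts single or double quotes, and leaves untouched anything it cannot find on disk.

```python
import base64
import mimetypes
import re
import tempfile
from pathlib import Path

# Matches src="..." or src='...'; group 1 is the quote, group 2 the value.
SRC_PATTERN = re.compile(r'src=(["\'])(.+?)\1')


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local image src values with base64 data URIs."""

    def replace(match: re.Match) -> str:
        quote, src = match.group(1), match.group(2)
        # Leave remote and already-inlined sources as they are.
        if src.startswith(("data:", "http://", "https://")):
            return match.group(0)
        path = base_dir / src
        mime, _ = mimetypes.guess_type(path.name)
        # Skip non-images and files that do not exist.
        if mime is None or not mime.startswith("image/") or not path.is_file():
            return match.group(0)
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"src={quote}data:{mime};base64,{encoded}{quote}"

    return SRC_PATTERN.sub(replace, html)


with tempfile.TemporaryDirectory() as tmp:
    # Tiny placeholder file: just the first bytes of a PNG header.
    (Path(tmp) / "example.png").write_bytes(b"\x89PNG\r\n")
    page = '<img src="example.png"><img src="missing.png">'
    result = inline_images(page, Path(tmp))

print(result)
```

The returned string can be passed as html_str, so nothing has to be copied into the temp directory at all.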