nickrussler / email-to-pdf-converter

Converts email files (eml, msg) to pdf
https://www.whitebyte.info/publications/eml-to-pdf-converter
Apache License 2.0
277 stars, 64 forks

weird characters when converted on docker #76

Closed by kshitizwagle 4 months ago

kshitizwagle commented 4 months ago

Hi, I'm using this service inside a Docker container. The conversion itself works, but instead of &nbsp; (the HTML non-breaking space, which is purely my guess) I'm getting a weird character. It works fine on my WSL but has issues in the Docker container. My assumption is that it has something to do with fonts?

Posting images showing the difference between WSL and Docker:

image image

Any help would be appreciated

nickrussler commented 4 months ago

I'd guess it's either fonts or some default encoding is wrong.

Here is some ChatGPT guidance:

  1. Check Docker Container Locale Settings: Ensure that the locale settings in your Docker container are set correctly. You can check this by running locale inside your Docker container. If the locale is not set correctly, you can set it by adding the following lines to your Dockerfile:

    RUN apt-get update && apt-get install -y locales
    RUN locale-gen en_US.UTF-8
    ENV LANG=en_US.UTF-8
    ENV LANGUAGE=en_US:en
    ENV LC_ALL=en_US.UTF-8
  2. Verify Fonts Installed in Docker Container: Make sure the necessary fonts are installed in your Docker container. You can install common fonts using:

    RUN apt-get update && apt-get install -y fonts-dejavu

    You might need to install additional fonts based on your requirements.

  3. Use Docker's Environment Variable for Java Encoding: Set the environment variable in your Dockerfile to ensure Java uses the correct encoding:

    ENV JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8"
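
As a rough illustration of the "default encoding is wrong" theory, here is a minimal Python sketch (the converter itself runs on Java, so this only demonstrates the general mechanism and is not the project's code). It assumes the input contains a non-breaking space stored as UTF-8 and that something in the container reads those bytes with a single-byte charset such as Latin-1. Whether that is what actually happens in this issue is unconfirmed.

```python
# A non-breaking space (U+00A0), the character that &nbsp; stands for in HTML.
nbsp = "\u00a0"

# UTF-8 stores it as two bytes.
utf8_bytes = nbsp.encode("utf-8")

# Assumption: the bytes are then decoded with a single-byte charset
# (Latin-1 here), so each byte turns into its own character.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes)     # b'\xc2\xa0'
print(repr(misread))  # 'Â\xa0'
```

If a stray extra character shows up wherever a non-breaking space should be, a mismatch like this is a plausible cause, and it would point at the locale or file.encoding settings rather than at missing fonts.
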
kshitizwagle commented 4 months ago

Thanks, I tried a bunch of things and ended up replacing the NBSP characters in the code itself. It isn't elegant, but it works for now. I'll run a few more experiments!
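
The actual replacement code was not posted, so the following is only a hypothetical sketch of that kind of workaround, written in Python for brevity. The function name `replace_nbsp` and the list of tokens it handles are assumptions for illustration, not something taken from the project.

```python
def replace_nbsp(html: str) -> str:
    """Replace non-breaking spaces with regular spaces (hypothetical helper)."""
    # Covers the literal character and two common HTML spellings.
    # This list is an assumption and is not exhaustive.
    for token in ("\u00a0", "&nbsp;", "&#160;"):
        html = html.replace(token, " ")
    return html


print(replace_nbsp("Hello&nbsp;world\u00a0again"))  # Hello world again
```

Note that a workaround like this changes layout wherever the non-breaking behaviour was intentional, which is one reason fixing the locale or encoding is usually the cleaner route if it can be made to work.
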