michaelrsweet / htmldoc

HTML Conversion Software
https://www.msweet.org/htmldoc
GNU General Public License v2.0

Encoding breaks for special characters #501

Closed petrfalkov closed 1 year ago

petrfalkov commented 1 year ago

Good day,

When converting HTML to PDF, some special characters are displayed incorrectly.

Version: 1.9.16 Previous version: 1.9.11

Command arguments that are being used: --charset iso-8859-1 --format pdf14 --firstpage c1 --size A4 --bodyfont sans --textfont sans --headingfont sans --no-title --headfootfont serif --headfootsize 6 --linkcolor blue --linkstyle plain --header ... --footer ... --no-toc --toclevels 3 --toctitle Inhoudsopgave

When using utf-8 as the charset, the problem persists, and many other characters are not displayed either (for example: ï, ë, ē, ę).

Is there any way to make it work?

michaelrsweet commented 1 year ago

Can you attach a sample HTML file that demonstrates the problem?

petrfalkov commented 1 year ago

Hi Michael, thank you for the quick response. Here is the arrow sample.

<!-- NEW PAGE -->
<h2> <a name="label-BRONNEN"> </a>BRONNEN</h2>
<p>&nbsp;</p>
<p>In dit e-book zijn de onderstaande bronnen gebruikt.&lt; &lt;&nbsp; &#8595;</p>

petrfalkov commented 1 year ago

Update: é displays normally on its own, but breaks in combination with other characters (example: Géjanne). The above issue is not related to the arrow issue. Sorry for the confusion.

The arrow issue is still relevant.

michaelrsweet commented 1 year ago

The Unicode arrow character (↓) isn't available in most fonts (thus the box), and HTMLDOC doesn't do fallback/multi-master fonts.

Still need the HTML for the other character breaking (just rename to .txt to attach here).
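
Editor's sketch, not part of the original thread: one rough way to find candidates for missing glyphs before running a conversion is to list the characters in the HTML that the chosen `--charset` cannot represent. The Python below is a minimal illustration under that assumption. The helper name `unencodable_chars` is hypothetical, and the check covers only the character set, not the glyph coverage of the fonts HTMLDOC actually embeds, so a character that passes here could still render as a box.

```python
import html


def unencodable_chars(markup: str, charset: str = "iso-8859-1") -> dict[str, str]:
    """Map each character that `charset` cannot encode to its U+XXXX code point.

    Hypothetical helper for illustration; it does not call HTMLDOC.
    """
    # Decode entities first so that &#8595; becomes the literal arrow character.
    text = html.unescape(markup)
    missing = {}
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing[ch] = f"U+{ord(ch):04X}"
    return missing


# The paragraph from the sample posted earlier in this thread.
sample = "<p>In dit e-book zijn de onderstaande bronnen gebruikt.&lt; &lt;&nbsp; &#8595;</p>"
print(unencodable_chars(sample))  # {'↓': 'U+2193'}
```

Run against the sample, only the arrow is flagged. Characters such as é, ï and ë encode fine in ISO-8859-1, while ē and ę do not, so they would be flagged under that charset as well.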

petrfalkov commented 1 year ago

The issue with the other character was not an HTMLDOC problem. Thank you for your help.