michaelrsweet / htmldoc

HTML Conversion Software
https://www.msweet.org/htmldoc
GNU General Public License v2.0

Accented characters cause display issues in v1.9; this was OK in v1.8 #497

Sparky10 closed 1 year ago

Sparky10 commented 2 years ago

When converting content that contains accented characters, we run into a number of display issues. This started happening in v1.9 and was not an issue in v1.8.

Original HTML

<!DOCTYPE html>

col1 col2 şarap and şamaş 0.00 3

V1.8 conversion image

V1.9 conversion image

michaelrsweet commented 2 years ago

@Sparky10 What options are you passing on the command-line?

michaelrsweet commented 2 years ago

FWIW, when I use the default (ISO-8859-1) character set I see this problem, but if I specify UTF-8 the right thing comes out:

htmldoc --webpage --charset utf-8 -f FILENAME.pdf FILENAME.html
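
As a rough illustration of why the character set choice could matter here, the sketch below uses only Python's standard codecs. It is not HTMLDOC's code and makes no claim about how HTMLDOC decodes input internally. It assumes the HTML file was saved as UTF-8 (not confirmed in this thread) and reuses the sample text from the report.

```python
# Illustration only: standard Python codecs, not HTMLDOC internals.
# Assumption: the source HTML file is saved as UTF-8.
text = "şarap and şamaş"

# U+015F (ş) has no code point in ISO-8859-1, so a strict encode fails.
try:
    text.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# The same text is representable in UTF-8; each ş becomes two bytes.
utf8_bytes = text.encode("utf-8")

# Reading those UTF-8 bytes as ISO-8859-1 never fails, but every ş
# turns into two unrelated characters, so the text grows by three.
misread = utf8_bytes.decode("iso-8859-1")

print(fits_latin1)           # False
print(utf8_bytes[:2].hex())  # c59f
print(len(text), len(misread))
```

If the input really is UTF-8, this would be consistent with the default ISO-8859-1 setting producing wrong glyphs while `--charset utf-8` renders correctly.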

Sparky10 commented 1 year ago

Thanks Michael, we will check this out and I will update here with results.