qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0

Character encoding issue when exporting Calibri labels to PDF #35795

Open: mmdolbow opened this issue 4 years ago

mmdolbow commented 4 years ago

Describe the bug

When adding a label to a QGIS 3 layout, if you use the Calibri font and don't render it as HTML, words containing "tt" or "ti" (such as "attend" and "mention") are improperly encoded when exported to PDF. This can be an accessibility issue.

How to Reproduce

  1. Open QGIS, create a layout
  2. Add a label to the layout. Use Calibri as the font. Do NOT turn on "render as HTML" option.
  3. Make sure to add words with "tt" or "ti" to the label
  4. Export to PDF
  5. Run Accessibility Full Check in Adobe Acrobat Pro. Under "Page Content", you will see "Character encoding - Failed" and the words with "tt" and "ti" will be shown. Screen readers will misread these words.
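Step 5 requires Acrobat Pro, which not every reporter has. As a rough, hypothetical proxy (not the same test Acrobat runs), you can extract the PDF's text layer with the third-party `pypdf` library and check whether the affected words come back intact. The helper names below are illustrative, and results can vary between text extractors.

```python
import re
import sys


def unreadable_words(extracted_text: str, expected_words: list[str]) -> list[str]:
    """Return the expected words that do not appear intact in the extracted text."""
    # \w+ splits on anything that is not a letter/digit, so a garbled
    # ligature (e.g. U+FFFD) breaks "attend" into separate fragments.
    found = set(re.findall(r"\w+", extracted_text.lower()))
    return [w for w in expected_words if w.lower() not in found]


def extract_pdf_text(path: str) -> str:
    """Extract text from every page (requires `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily so the helper above works without it

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


if __name__ == "__main__" and len(sys.argv) > 1:
    text = extract_pdf_text(sys.argv[1])
    missing = unreadable_words(text, ["attend", "mention"])
    print("words not recovered:", missing or "none")
```

Usage: `python check_pdf.py exported_layout.pdf`. A word listed as "not recovered" suggests a screen reader will likely have trouble with it too.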

QGIS and OS versions

QGIS version: 3.10.1-A Coruña
QGIS code revision: ef24c526da
Qt: compiled against 5.11.2, running against 5.11.2
GDAL/OGR: compiled against 3.0.2, running against 3.0.2
GEOS: compiled against 3.8.0-CAPI-1.13.1, running against 3.8.0-CAPI-1.13.1
SQLite: compiled against 3.29.0, running against 3.29.0
PostgreSQL client version: 11.5
SpatiaLite version: 4.3.0
QWT version: 6.1.3
QScintilla2 version: 2.10.8
PROJ: compiled against 6.2.1, running against Rel. 6.2.1, November 1st, 2019
OS version: Windows 10 (10.0)
Active Python plugins: changeDataSource; QuickOSM; db_manager; MetaSearch; processing

Additional context

This has been reproduced with two different user profiles. Since workarounds exist, I don't think this bug should be considered high priority. I'm mostly logging it so others are aware of the workarounds (and who knows, maybe it's an easy fix).

Demonstration files

Files showing the problem and the workarounds are in https://github.com/mmdolbow/pdf_map_accessibility/tree/master/qgis/etc

jgrocha commented 4 years ago

Thank you @mmdolbow for the detailed report. I think this is an Acrobat Pro bug, not a QGIS one.

As far as I know, these consecutive letters are replaced by a proper ligature for improved visualization. That's a really cool feature that started in TeX, more than 30 years ago. Check this related issue #32733, where another user asks for better OpenType support in QGIS.
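Some background on why "tt" and "ti" in particular may be affected: Unicode defines compatibility code points for only a handful of Latin ligatures (U+FB00 to U+FB06: ff, fi, fl, ffi, ffl, long s + t, st), and NFKC normalization decomposes each of them back into plain letters. None of them is "tt" or "ti", so Calibri's tt/ti ligatures are presumably font-internal glyphs, and a PDF reader can recover the original letters only if the PDF maps those glyphs back to text (in PDF this is usually done with a ToUnicode CMap). That this mapping is where the export goes wrong is an assumption, not something the thread has established. The sketch below just lists the Unicode ligature range using the standard library.

```python
import unicodedata


def latin_ligature_table() -> dict[int, str]:
    """Map each Latin ligature code point in U+FB00..U+FB06 to its NFKC decomposition."""
    return {
        cp: unicodedata.normalize("NFKC", chr(cp))
        for cp in range(0xFB00, 0xFB07)
    }


if __name__ == "__main__":
    for cp, letters in latin_ligature_table().items():
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))} -> {letters}")
```

Neither "tt" nor "ti" appears among the decompositions, which is consistent with these ligatures having no standard Unicode equivalent to fall back on.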

The most sensible option would be to add a checkbox to use ligatures or not in the exported text. That's a feature request.

It also makes sense to report a bug to Adobe if Acrobat Pro can't read the generated text. The text is properly formatted and encoded, but maybe Acrobat Pro does not support ligatures. Can you check whether Acrobat Pro supports ligatures?

mmdolbow commented 4 years ago

Thanks so much for the reply. After some research, it looks like Adobe does support ligatures in general. But there are several issues and bugs reported, like this one. I will try to file a specific bug with them. Since there are workarounds for this, I would support this issue being closed.

mmdolbow commented 4 years ago

Of course, I can't replicate this issue with any other product (Word, OneNote, etc).

mmdolbow commented 4 years ago

I'm sorry, can we consider reopening this? I am working with my org to report a bug to Adobe, but while doing that, I tried to replicate the issue with several other programs. I tried Excel, Word, Paint, ArcGIS Pro, OneNote, and even HTML printed to PDF via Chrome and Firefox. NONE of them produced a PDF with the same encoding issue. That makes me second-guess the conclusion that this is an Adobe bug, especially since they say they support ligatures. Unless someone can suggest a program that works similarly to QGIS that I can export from to replicate the issue, I have a feeling Adobe will just say this is a problem with QGIS.

mmdolbow commented 4 years ago

Just a followup to confirm that I opened up a support call with Adobe and they indicated to me that their engineering team tested this and that they thought it was an issue on the QGIS side.

Pedro-Murteira commented 2 years ago

Still valid on QGIS 3.22.3. The screen reader fails to read words containing the mentioned letter pairs. I don't have the Pro version of Acrobat, so I can't test step 5 of this issue. @mmdolbow, have you come across this issue more recently? Thank you.

mmdolbow commented 2 years ago

Hi @Pedro-Murteira I have not tested this recently, no. Haven't upgraded QGIS in a while, and used the workarounds in the meantime.

alexbruy commented 11 months ago

As QGIS uses Qt for PDF export it might be a Qt bug.

mmdolbow commented 11 months ago

> As QGIS uses Qt for PDF export it might be a Qt bug.

@alexbruy that's interesting. Any idea how we can test and/or explore that? Note to @Pedro-Murteira: I'm on QGIS 3.28.7 now, and recent PDFs still have the issue. Generally these characters only appear in my legend, which gets marked as an image or bundled with the map as an image, so I ignore it.

alexbruy commented 11 months ago

> Any idea how we can test and/or explore that?

There is a QPdfWriter class (https://doc.qt.io/qt-5/qpdfwriter.html) in Qt. It should be possible to write a simple program that generates a PDF with ligatures and check whether the generated file works as expected.