google / fonts

Font files available from Google Fonts, and a public issue tracker for all things Google Fonts
https://fonts.google.com

Gabarito and Chrome digit problem when exported as pdf from Google workspace #7983

Open creutz opened 1 month ago

creutz commented 1 month ago

Describe the bug

I created documents in Google Sheets, Slides, and Docs containing text and numbers. When a document is exported as PDF and opened in Chrome, the digits cannot be copied properly: they are copied as seemingly random characters (probably not truly random, but it looks that way to me; maybe it's an encoding issue). Strangely, when the PDF is opened in Edge, the digits are copied properly. I've tested this in Chrome on Win10 and macOS Sonoma. Digits set in Arial or in other Google web fonts (like Open Sans) work fine, so I believe the problem is related to Gabarito.

To Reproduce

Create any document in Google Drive, write any letters and digits, set font to Gabarito, export as pdf, view in Chrome, try to copy digits.

Expected behavior

The digits get copied correctly.

I installed the font locally, and when I type the numbers in a program like Photoshop they are copied correctly as well, so I am not sure whether the problem is really in the font, in Google Drive/Workspace, or even in Chrome.

kenmcd commented 1 month ago

Please attach the example PDF. It would be useful to take a look at what is in there.

Given that it works properly in other applications it appears it may be an issue in Chrome. Looking at the PDF should help determine if this is the case.

creutz commented 1 month ago

I've reduced a Google Slides example to the respective parts. example.pdf

In Chrome, the digits are not copied correctly, e.g. "DE00001" is copied as "DEþþþþÿ". In Edge, they are copied correctly. After changing the font to Open Sans or Arial, copying works in both browsers.

Edit: the different font weights in this example are not the cause. I changed all numbers to Open Sans earlier and then changed them back for this example, but didn't choose the right weight. In the original file the weight is the same throughout.

kenmcd commented 1 month ago

I think it is a combination of problems with the web font subsetting and with Chrome. The PDF appears to be fine. The encoding looks OK. The ToUnicode encoding looks OK. The Unicode code points behind the figures look fine. Copy-and-paste works correctly from many other applications.

But from Chrome you get these alternate characters. Since it does have some consistency - the same replacements for each figure - it does appear to be an encoding issue.
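The consistency can be checked directly against the sample strings reported above. This is only a minimal sketch: the function name `replacement_table` is made up for illustration, and the only data used is the "DE00001" / "DEþþþþÿ" pair from the earlier comment, so it says nothing about digits other than 0 and 1.

```python
def replacement_table(expected, copied):
    """Map each character that changed to the set of characters it was copied as."""
    if len(expected) != len(copied):
        raise ValueError("strings must have the same length to compare position by position")
    table = {}
    for want, got in zip(expected, copied):
        if want != got:
            table.setdefault(want, set()).add(got)
    return table


# Strings taken from the report above: what was typed vs. what Chrome copied.
expected = "DE00001"
copied = "DEþþþþÿ"

table = replacement_table(expected, copied)
for want, gots in sorted(table.items()):
    for got in sorted(gots):
        offset = ord(got) - ord(want)
        print(f"{want!r} U+{ord(want):04X} -> {got!r} U+{ord(got):04X} (offset {offset:+d})")
```

For this one sample, each digit has exactly one replacement and both digits are shifted by the same amount (U+0030 to U+00FE and U+0031 to U+00FF), which fits a systematic mapping problem rather than random corruption.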

Google Docs is using the web fonts, so I took a look at those.
latin subset from here: https://fonts.gstatic.com/s/gabarito/v7/QGYtz_0dZAGKJJ4t3HtoW4XGm_BJyfk.woff2
latin-ext subset from here: https://fonts.gstatic.com/s/gabarito/v7/QGYtz_0dZAGKJJ4t3HtmW4XGm_BJyfnOKg.woff2
What I found may be part of the problem. Many of the glyphs are missing Unicode code points, especially in the latin font. Many common characters such as a, C, c, D, d, E, e, and more have no code points in that font. The latin-ext font has fewer such glyphs, but some are still missing code points. So badly subsetted web fonts may be part of the problem.
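For anyone who wants to repeat this check, here is a rough sketch of one way to list glyphs that have no code point. It assumes the third-party fontTools package (plus brotli for .woff2 files) and that the subset files have been downloaded locally; the helper names and the in-memory demo data are hypothetical. It is not necessarily how the check above was done.

```python
import sys


def glyphs_without_codepoints(glyph_order, cmap):
    """Return the glyph names that no code point in `cmap` points to.

    glyph_order: list of glyph names in the font.
    cmap: dict mapping Unicode code point (int) to glyph name.
    """
    mapped = set(cmap.values())
    return [name for name in glyph_order if name not in mapped]


def load_font_tables(path):
    """Read the glyph order and the preferred Unicode cmap from a font file."""
    # Imported lazily so the pure function above works without fontTools.
    from fontTools.ttLib import TTFont

    font = TTFont(path)
    # getBestCmap() returns None when the font has no Unicode cmap subtable.
    return font.getGlyphOrder(), (font.getBestCmap() or {})


# Hypothetical in-memory data, just to show the shape of the result.
demo_order = [".notdef", "D", "E", "zero", "one"]
demo_cmap = {0x30: "zero", 0x31: "one"}
print(glyphs_without_codepoints(demo_order, demo_cmap))

if __name__ == "__main__":
    # Usage: python check_cmap.py latin.woff2 latin-ext.woff2
    for font_path in sys.argv[1:]:
        order, cmap = load_font_tables(font_path)
        print(font_path, glyphs_without_codepoints(order, cmap))
```

Some unmapped glyphs are normal (for example `.notdef`, or alternates reached only through OpenType features), so the output needs to be read by hand; basic letters without a code point are the suspicious cases.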

But how that is affecting the PDF encoding for Chrome only is confusing. The embedding encoding is Identity-H - so maybe Chrome is having a problem connecting the dots with this, when other applications seem to be able to do it.
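To illustrate why Identity-H puts so much weight on the ToUnicode map: with that encoding the text in the PDF is stored as 2-byte codes that select glyphs, not as Unicode, so a viewer has to translate the codes back through ToUnicode when copying. The sketch below is a toy model of that lookup only. The CMap fragment and the glyph codes `0013` and `0014` are invented for illustration and are not taken from example.pdf, and real ToUnicode CMaps can also contain `bfrange` sections that this does not handle.

```python
import re

# Hypothetical ToUnicode fragment: glyph code 0x0013 -> "0", 0x0014 -> "1".
TOUNICODE = """
2 beginbfchar
<0013> <0030>
<0014> <0031>
endbfchar
"""


def parse_bfchar(cmap_text):
    """Build {glyph code: text} from the bfchar sections of a ToUnicode CMap."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Destination values in a ToUnicode CMap are UTF-16BE.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def extract_text(hex_string, to_unicode):
    """Split a hex string into 2-byte codes and translate each via ToUnicode."""
    data = bytes.fromhex(hex_string)
    codes = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    # Codes missing from the map become U+FFFD here; real viewers may guess instead.
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


mapping = parse_bfchar(TOUNICODE)
print(extract_text("0013001300130014", mapping))  # prints 0001
```

If a viewer skips or misreads this lookup for some codes and falls back to another source of character information, the copied text can differ from what other applications produce even though the PDF itself is valid.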

This is now kinda beyond my knowledge and understanding. One of the more knowledgeable GF folks will have to take it from here. Perhaps fixing the web fonts will fix the issue.