mundschenk-at / php-typography

A PHP library for improving your web typography.
GNU General Public License v2.0

Unintended hyphen-dash-replacement #147

animaux closed this issue 4 months ago

animaux commented 2 years ago

The German string Kunst-, Kultur- und Architekturgeschichte is rendered as Kunst–, Kultur- und Architekturgeschichte: the first hyphen is replaced with an en dash, but it should remain a hyphen. Another example is the German spelling of email, E-Mail, which unintentionally becomes E–Mail.

Version is 6.0 (in Craft Typogrify plugin)

mundschenk-at commented 2 years ago

Thanks, this one is a bit tricky because we can't interpret the conjunction. I'll see if I can fix this issue without breaking anything else (E–Mail is definitely a bug, but the list might be a "damned if you do, damned if you don't" situation). There is only so much you can do via regexes.

mundschenk-at commented 2 years ago

@animaux, do you really mean version 6.0.0 of PHP-Typography? If so, please try updating to a recent version and check whether this still occurs.

animaux commented 2 years ago

Thank you! I understand ;)

As for the version: I thought this was the version the Typogrify plugin for Craft CMS uses. I don’t know much about Composer, but there is this in the composer.json:

"require": {
    …
    "mundschenk-at/php-typography": "^6.0"
  },

Or does it mean »Version 6.0 and upwards?«

mundschenk-at commented 2 years ago

Yes, that means "any 6.x version". You should be able to see the actual version in the composer.lock file (it should be 6.6.0).
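For readers unfamiliar with caret constraints, here is a minimal Python sketch of the "any 6.x version" reading of `^6.0`. It is not Composer's implementation: the function names `parse_version` and `satisfies_caret` are made up for illustration, and the sketch assumes plain numeric versions with a major version of 1 or higher (Composer treats `^0.x` constraints more narrowly, which is not modelled here).

```python
def parse_version(version):
    """Turn '6.6' or '6.6.0' into a (major, minor, patch) tuple of ints."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad missing minor/patch with zeros
    return tuple(parts[:3])


def satisfies_caret(version, constraint):
    """Check a version against a caret constraint such as '^6.0'.

    Assumption: major version >= 1, so the allowed range is
    lower bound <= version < next major version.
    """
    if not constraint.startswith("^"):
        raise ValueError("only caret constraints are handled in this sketch")
    lower = parse_version(constraint[1:])
    upper = (lower[0] + 1, 0, 0)
    return lower <= parse_version(version) < upper


# The version reported by composer.lock in this thread falls inside ^6.0.
print(satisfies_caret("6.6.0", "^6.0"))
print(satisfies_caret("7.0.0", "^6.0"))
```

The installed version is whatever composer.lock records; the constraint in composer.json only bounds the range that may be installed.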

animaux commented 2 years ago

Thanks for the pointer! composer.lock lists version 6.6.0.

mundschenk-at commented 2 years ago

Regarding E-Mail: was the result above copied and pasted, or manually retyped? You should see E‑Mail, which contains the non-breaking hyphen. If you really do get a – there, I need a standalone example (so that I know the exact settings and we can rule out any post-processing by Craft or the plugin).

The same goes for Kunst-, Kultur- und Architekturgeschichte: the first - is replaced with ‑ to prevent a line break before the ,.
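To illustrate the idea only, here is a rough Python sketch of that kind of replacement. It is not PHP-Typography's actual code or regex; the function name `protect_hanging_hyphens` and the pattern are assumptions, and the sketch covers only a hyphen directly followed by a comma or semicolon, not the E-Mail case.

```python
import re

NON_BREAKING_HYPHEN = "\u2011"

# Assumed pattern: a hyphen-minus that follows a word character and is
# immediately followed by a comma or semicolon (a "hanging" hyphen in a list).
HANGING_HYPHEN = re.compile(r"(?<=\w)-(?=[,;])")


def protect_hanging_hyphens(text):
    """Swap hanging hyphens for U+2011 so no line break can occur before the comma."""
    return HANGING_HYPHEN.sub(NON_BREAKING_HYPHEN, text)


result = protect_hanging_hyphens("Kunst-, Kultur- und Architekturgeschichte")
# Only the hyphen before the comma changes; "Kultur- und" keeps its hyphen-minus.
print([f"U+{ord(ch):04X}" for ch in result if ch in "-\u2011"])
```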

animaux commented 2 years ago

I think it must have been copy & paste. This one is for sure: Kunst‑, Kultur- und Architekturgeschichte. Checking it directly via decodeunicode returns U+2011 NON-BREAKING HYPHEN after Kunst and U+002D HYPHEN-MINUS after Kultur. So the replaced character is actually not an en dash, but its evil twin!
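The same check can be done locally instead of through a website. Below is a small Python sketch using the standard `unicodedata` module; the helper name `describe_dashes` is made up for illustration, and it simply lists every character in the Unicode "dash punctuation" category (Pd) with its code point and name.

```python
import unicodedata


def describe_dashes(text):
    """List code point and Unicode name for each dash-like character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in text
        if unicodedata.category(ch) == "Pd"  # Pd = punctuation, dash
    ]


# The rendered string from this comment, written with explicit escapes.
rendered = "Kunst\u2011, Kultur- und Architekturgeschichte"
for line in describe_dashes(rendered):
    print(line)
```

Running this on copied output makes it easy to tell a hyphen-minus, a non-breaking hyphen, and an en dash apart, even when they look nearly identical in the browser.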

I can’t reproduce the en dash in E-Mail anymore, so I think it must have been a copy & paste error. Sorry about that.

There’s a good chance something else is messing with the text, since the Craft Typogrify plugin uses several libraries, including SmartyPants, I think. I will put together a test case for this and the other issue tomorrow. Thanks for looking into this!

animaux commented 2 years ago

Here’s a simple test-case: https://gist.github.com/animaux/ae3e2a96f886f8dedb3ca736194e84a0

Test source HTML:

<p>Die Stiftung Preußische Schlösser und Gärten Berlin-Brandenburg (SPSG) betreut heute die schönsten und bedeutendsten Zeugnisse der Kunst-, Kultur- und Architekturgeschichte in Brandenburg-Preußen.</p><p>E-Mail.</p><p>Seit Januar 2018 sind diese Zeugnisse herzoglicher Repräsentation in einer landeseigenen Kulturinstitution – den Staatlichen Schlössern, Gärten und Kunstsammlungen Mecklenburg-Vorpommern (kurz: SSGK M-V) – zusammengefasst.</p><p>UNESCO-Welterbestätten UNESCO Welterbestätten.</p>

Results are as follows:

One issue remains (that may be intentional?):

animaux commented 2 years ago

Feel free to close this, as the UNESCO-Welterbestätten-Problem can be solved by using a NON-BREAKING HYPHEN in the source text.

Thanks for your help and effort!

mundschenk-at commented 4 months ago

I have created a new ticket for the caps issue (#174).