joomla / joomla-cms

Home of the Joomla! Content Management System
https://www.joomla.org
GNU General Public License v2.0

Apostrophe in titles causes alias to be different on Joomla 3 & 4 #35856

Status: Open. weeblr opened this issue 3 years ago.

weeblr commented 3 years ago

Steps to reproduce the issue

Have an article with an apostrophe in its title, such as Let's go.

Expected result

Generated alias is let-s-go

Actual result

This is what happens in Joomla 3. In Joomla 4, the apostrophe is stripped and the alias is lets-go.

Additional comments

I think this is a significant backward compatibility issue that will break a lot of URLs if people upgrade a Joomla 3 site to Joomla 4.

I have seen a number of reported issues with non-Latin characters, such as #27875 and #35014. Those appeared to be related to transliteration working differently and to the language pack being involved.

This is a bit different in that it happens with stock en-GB Joomla content.

I have not looked into the code: this was reported by others in a French group, and I only tested whether it also occurs on stock en-GB Joomla. I think this should be reverted to the Joomla 3 behavior, mostly for content B/C reasons and because of the ranking drop to expect if a URL changes unnoticed.

Note that I also tested other usual suspects, such as $ and #, and Joomla 3 & 4 appear to behave the same for these characters.

Has this been reported before?

brianteeman commented 3 years ago

It was a deliberate change

weeblr commented 3 years ago

Care to expand, or link to something where this was discussed, so we can get an idea of the reasons behind it?

(I looked through 18 pages of "alias" related Github issues before posting the above, could not find anything)

infograf768 commented 3 years ago

In any case, the URL(s) that were created in J3 will not be modified.

weeblr commented 3 years ago

Hi @infograf768 :)

You're right, existing aliases won't be modified, except in edge cases where articles or menu items are deleted and then re-created for instance.

So I guess this is likely not a general B/C break.

brianteeman commented 3 years ago

https://github.com/joomla/joomla-cms/pull/32924

weeblr commented 3 years ago

@brianteeman OK, I see. I can see it works more or less OK for English, but not so much for French, where the apostrophe often comes near the start of a word (for instance, d'évaluer and dévaluer really are not the same word).

Too late to change now anyway.

brianteeman commented 3 years ago

Not tested, but couldn't the custom transliterate function in a language pack address this? https://docs.joomla.org/J3.x:Making_a_Language_Pack_for_Joomla#Example_2_-_Custom_transliteration_implemented

weeblr commented 3 years ago

It should: the call to transliterate() happens before the regular expression that drops everything except Latin alphanumeric characters.
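To make the ordering concrete, here is a minimal Python sketch of the alias pipeline as described in this thread: transliterate first, then a regular expression that turns every run of whitespace or non-alphanumeric characters into a single dash. It is not Joomla's PHP code. The transliteration step is a stand-in that only lowercases, `strip_apostrophe` is a hypothetical flag modelling the Joomla 4 change, and placing that removal after transliteration is an assumption.

```python
import re
from typing import Callable


def transliterate_default(text: str) -> str:
    # Stand-in for a language pack's transliterate(); it only lowercases here.
    return text.lower()


def string_url_safe(
    title: str,
    strip_apostrophe: bool,
    transliterate: Callable[[str], str] = transliterate_default,
) -> str:
    # Existing hyphens are treated as word separators.
    s = title.replace("-", " ")
    # 1. Language-specific transliteration runs first.
    s = transliterate(s)
    # 2. Hypothetical model of the Joomla 4 change: drop apostrophes entirely.
    if strip_apostrophe:
        s = s.replace("'", "")
    # 3. Trim and lowercase.
    s = s.strip().lower()
    # 4. Collapse whitespace and anything outside A-Z, a-z, 0-9, '-' into one dash.
    s = re.sub(r"(\s|[^A-Za-z0-9\-])+", "-", s)
    # 5. Trim dashes at both ends.
    return s.strip("-")


print(string_url_safe("Let's go", strip_apostrophe=False))  # Joomla 3-like
print(string_url_safe("Let's go", strip_apostrophe=True))   # Joomla 4-like
```

With the flag off, the apostrophe is just another non-alphanumeric character and becomes a dash (let-s-go); with it on, the apostrophe disappears before the regex runs (lets-go). Characters such as $ and # go through the regex either way, which matches the observation that they behave the same in both versions.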

That's likely where the change should have happened, in the en-* localise.php files.

I'll suggest that to the French localization team.

PS: I noticed that your change was done in src/j4/libraries/src/Filter/OutputFilter.php but not in the vendored folder: src/j4/libraries/vendor/joomla/filter/src/OutputFilter.php, so there's likely an inconsistency risk down the line, right?

weeblr commented 3 years ago

OK, this kind of works, but likely won't with multilingual sites.

The thing is, the transliterator called is the one corresponding to the current admin language.

So if the backend is displayed in French and you create a French article (language set to FR in the article options), then the transliterate() method in fr-FR.localise.php is used, and I can put a change there to preserve the apostrophe.

However, if the backend is set to English and I create/edit the same article, then the default localise.php file is used, and my custom transliteration for French is not applied.
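The multilingual problem can be sketched the same way. In this hypothetical Python model, transliterators are looked up by language tag, and the lookup key is the admin language (as observed above) rather than the article's own language. The registry, function names and the tiny character mappings are invented for illustration; only the selection behaviour comes from the thread, and the Joomla 4 apostrophe removal after transliteration is again an assumption.

```python
import re
from typing import Callable

Transliterator = Callable[[str], str]


def translit_default(text: str) -> str:
    # Stand-in for the default localise.php: handles accents, not apostrophes.
    return text.lower().replace("é", "e")


def translit_fr(text: str) -> str:
    # Stand-in for a customised fr-FR.localise.php: apostrophe -> space, é -> e.
    return text.lower().replace("'", " ").replace("é", "e")


# Hypothetical registry keyed by language tag.
TRANSLITERATORS: dict[str, Transliterator] = {
    "en-GB": translit_default,
    "fr-FR": translit_fr,
}


def make_alias(title: str, admin_lang: str) -> str:
    # Selection follows the admin language, not the article language.
    translit = TRANSLITERATORS.get(admin_lang, translit_default)
    s = translit(title.replace("-", " "))
    s = s.replace("'", "")  # assumed Joomla 4 apostrophe removal
    s = re.sub(r"(\s|[^A-Za-z0-9\-])+", "-", s.strip().lower())
    return s.strip("-")


# The same French article gets two different aliases depending on who edits it.
print(make_alias("D'évaluer", admin_lang="fr-FR"))
print(make_alias("D'évaluer", admin_lang="en-GB"))
```

Under these assumptions, the French admin gets d-evaluer while the English admin gets devaluer for the very same French article, which is why a fix living only in fr-FR.localise.php is not enough on multilingual sites.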

So for most people, I guess adding ' ' => '\'', at the top of the $glyph_array in transliterate() will do the job: it maps the apostrophe to a space, which the alias filter then turns into a dash.
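A rough Python model of that tweak, assuming the $glyph_array format of the stock localise.php transliterate(): each key is the replacement and each value is a comma-separated list of glyphs it replaces, applied in order. The short `GLYPH_ARRAY` here is illustrative only (the real array is much longer), and running the Joomla 4 apostrophe removal after transliteration is an assumption.

```python
import re

# Hypothetical mirror of a localise.php $glyph_array:
# key = replacement, value = comma-separated glyphs to replace.
GLYPH_ARRAY: dict[str, str] = {
    " ": "'",               # the suggested extra entry: apostrophe -> space
    "a": "à,á,â,ã,ä,å",
    "e": "è,é,ê,ë",
}


def transliterate(text: str, glyph_array: dict[str, str]) -> str:
    s = text.lower()
    # Apply each entry in insertion order, like the PHP foreach loop.
    for letter, glyphs in glyph_array.items():
        for glyph in glyphs.split(","):
            s = s.replace(glyph, letter)
    return s


def alias_j4(title: str, glyph_array: dict[str, str]) -> str:
    s = transliterate(title.replace("-", " "), glyph_array)
    s = s.replace("'", "")  # assumed Joomla 4 apostrophe removal
    s = re.sub(r"(\s|[^A-Za-z0-9\-])+", "-", s.strip().lower())
    return s.strip("-")


WITHOUT_ENTRY = {k: v for k, v in GLYPH_ARRAY.items() if k != " "}
print(alias_j4("Let's go", GLYPH_ARRAY), alias_j4("Let's go", WITHOUT_ENTRY))
print(alias_j4("D'évaluer", GLYPH_ARRAY), alias_j4("D'évaluer", WITHOUT_ENTRY))
```

Because the apostrophe has already become a space by the time the removal step runs, the extra entry brings back let-s-go and d-evaluer; without it, the French title collapses to devaluer, the d'évaluer/dévaluer collision mentioned earlier.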

For multilingual sites, they'll be better off adding a transliterate method in localise.php, so that it applies to all items.

Again, I'd think this should be done in the per-language transliterate() methods, but it's too late now for 4.x.

brianteeman commented 2 months ago

Quoting weeblr: "Again, I'd think this should be done in per language transliterate but it's too late now for 4.x."

Is there anything left to do here, or should it be closed?