selfthinker / dokuwiki_plugin_wrap

Wrap Plugin for DokuWiki: Universal plugin which combines functionalities of many other plugins. Wrap wiki text inside containers (divs or spans) and give them a class (choose from a variety of preset classes), a width and/or a language with its associated text direction.
http://www.dokuwiki.org/plugin:wrap
GNU General Public License v2.0
41 stars 33 forks

Improvements to the language and writing direction detection #250

Closed saschaleib closed 1 year ago

saschaleib commented 1 year ago

This code changes the way :lang attributes are handled, allowing more flexibility, including a possible Script specification, as specified in BCP 47.

The direction specification (e.g. dir="rtl") now uses the language code as a default, but allows the script specification to override this when needed.
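A minimal sketch of the rule described above (language gives the default direction, an explicit script subtag overrides it). This is not the plugin's actual code, which is PHP; the function name text_direction and both lookup tables are hypothetical and contain only a few illustrative entries.

```python
# Hypothetical sketch: the tables below are small illustrative samples,
# not the plugin's built-in lists.
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}
SCRIPT_DIRECTION = {"Arab": "rtl", "Hebr": "rtl", "Latn": "ltr", "Cyrl": "ltr"}


def text_direction(lang_tag: str) -> str:
    """Return "rtl" or "ltr" for a BCP 47-style tag such as "ar" or "ar-Latn"."""
    subtags = lang_tag.split("-")
    language = subtags[0].lower()

    # Default: derive the direction from the primary language subtag.
    direction = "rtl" if language in RTL_LANGUAGES else "ltr"

    # Override: a four-letter alphabetic subtag is treated as a script code.
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # a single-character subtag starts an extension; stop looking
        if len(subtag) == 4 and subtag.isalpha():
            # Unknown scripts fall back to the language default.
            direction = SCRIPT_DIRECTION.get(subtag.title(), direction)
            break
    return direction
```

With these sample tables, text_direction("ar") gives "rtl", while a transliterated text tagged "ar-Latn" gives "ltr", and "tr-Arab" gives "rtl" even though Turkish defaults to "ltr".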

A side effect of this change is that the additional config file is no longer needed.

saschaleib commented 1 year ago

Hello, and thanks for this extremely useful plugin!

I have spent a bit of time trying to see how this works, as I was trying to improve compatibility of my template with this plugin. In this process, I noticed a couple of issues where I think I could contribute a bit ... So expect a couple more pull requests from me in the near future :-)

First off: I noticed that the code used to determine the dir attribute makes a couple of assumptions about languages that will not always hold true. Most importantly, the text direction is not a property of a language, but of the script used. The two correspond most of the time, but there are many cases where this breaks down. Please allow me to explain this in a bit more detail:

Firstly, there are languages which use more than one writing system (the technical term is "digraphia"). For example, Serbian can be written in either the Latin or the Cyrillic alphabet. Turkish switched its writing system from Arabic to Latin in the 20th century, but there are still many old texts written in Arabic script. Kurdish can be written in Arabic, Latin or Cyrillic script, etc.

But there is more: if you read a transliteration of a non-Latin text, this is still the same language, but written in a different writing system. To illustrate this, the following are actually two examples that I found in my own wiki:

This also shows the correct way to specify the language in this situation: the script will be added as a four-letter-code after the ISO 639 language code (and any potential other code, like the country, etc.).

At the moment, the language detection does not pass such codes through to the output, so that had to be changed as well. This means that the change makes it possible to specify even very obscure language tags (Wikipedia has this beautiful example of "he-IL-u-ca-hebrew-tz-jeruslm", which even specifies the time zone used, etc.).

This change also makes it possible to specify the region. Remember that en-GB is not the same as en-US, and the same is true for many other languages, like de-DE vs. de-AT or de-CH, …
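To illustrate how such a tag breaks down into language, script and region, here is a simplified sketch. It is not a full BCP 47 parser and not the plugin's code: it ignores extended language subtags, grandfathered tags and private-use tags, and the names LangTag and parse_lang_tag are made up for this example.

```python
from typing import List, NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    rest: List[str]  # variants and extensions, left unparsed


def parse_lang_tag(tag: str) -> LangTag:
    """Split a tag like "sr-Latn-RS" into its main subtags (simplified)."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = None
    region = None
    index = 1

    # Optional script: four letters, conventionally title-cased (e.g. "Latn").
    if index < len(subtags) and len(subtags[index]) == 4 and subtags[index].isalpha():
        script = subtags[index].title()
        index += 1

    # Optional region: two letters (e.g. "AT") or three digits (e.g. "419").
    if index < len(subtags):
        candidate = subtags[index]
        if (len(candidate) == 2 and candidate.isalpha()) or (
            len(candidate) == 3 and candidate.isdigit()
        ):
            region = candidate.upper()
            index += 1

    return LangTag(language, script, region, subtags[index:])
```

For "he-IL-u-ca-hebrew-tz-jeruslm" this yields language "he", no script, region "IL", and the remaining extension subtags untouched in rest, so the whole tag can still be passed through to the output unchanged.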

I hope you find this change useful, and I will already start looking at some improvements on the semantic markup and CSS ... coming soon ;-)

Best greetings /sascha

5shekel commented 1 year ago

Tested, looks good! https://docs.telavivmakers.space/tamiwiki/external/infra/e-waste

saschaleib commented 1 year ago

Thanks, @5shekel, this is indeed a good use case, as your site mixes RTL and LTR. May I interest you in trying my own Ad-Hoc Tags plugin as an alternative? It gives you more flexibility, as it also supports the dir attribute, which can override the language attribute.

Klap-in commented 1 year ago

Could it be that people use the config file to configure other combinations, so that merging would cause backward compatibility issues? If you expect not, I will merge.

saschaleib commented 1 year ago

Hm, in principle that would be possible, but I think the overhead would not be worth the benefits. I reckon that the built-in list of languages and scripts now covers most cases, and with the option to override the script this should be pretty much complete.

I should add that I have moved on and made my own plugin, which implements (and extends) the attribute handling and other aspects. If there is interest, I am happy to backport some more features here. :-)