Open lueck opened 10 months ago
I'm facing several technical issues/limitations and I'm a bit stuck. See for example https://tex.stackexchange.com/questions/686767/process-hbox-with-luaotfload and also https://github.com/harfbuzz/harfbuzz/pull/3762#issuecomment-1531726473. Some others are related to the fonts, which sometimes don't seem to take into account the kashida (with clearly misplaced diacritics).
I’ll read carefully your looong report (I wish they were all like that 🙂). There are some explanations here:
The horizontal placement of diacritics is under the direct control of babel, and I was working on an option to set it (start, center, end).
https://latex3.github.io/babel/guides/non-standard-hyphenation-with-luatex.html
Thanks! That enables me to make more informed experiments.
With the following transformation rules, the horizontal displacement of diacritics is solved using 1-letter rules:
; insert kashida into pattern with certain consonant combinations
kashida.plain.1.0 = { ()[يئهشسقفغعضصنمكلظطخحجثتب]()[يئهشسقفغعضصنمكلظطخحجثتباأإآوؤذدزرة] }
kashida.plain.1.1 = { kashida = 500 }
; one diacritic mark: insert kashida behind it
kashida.plain.2.0 = { [يئهشسقفغعضصنمكلظطخحجثتب]()[ًٍَُِّ]()[يئهشسقفغعضصنمكلظطخحجثتباأإآوؤذدزرة] }
kashida.plain.2.1 = { kashida = 500 }
; two diacritic marks: insert kashida behind them
kashida.plain.3.0 = { [يئهشسقفغعضصنمكلظطخحجثتب][ًٍَُِّ]()[ًٍَُِّ]()[يئهشسقفغعضصنمكلظطخحجثتباأإآوؤذدزرة] }
kashida.plain.3.1 = { kashida = 500 }
kashida.plain.4.0 = { ()ل()[ًٍَُِّ]*[اأإآ] }
kashida.plain.4.1 = { kashida = 0 }
But in the output, the kashida is displaced vertically:
... so the y-axis offset that results from lifting diacritics should be reset before inserting the kashida (and maybe restored afterwards). I can't guarantee that this is a TeX-like formulation of a fix...
With the changes from my kashida-after-diacritics branch, I now get a result for my case 1, which I am happy with:
If you would rather keep the kashida.plain transform as it is, I would suggest making this an alternative transform called kashida.after.diacritics.
Should I open a PR?
Hm, with other fonts I still get bad results where the kashida is shifted above the baseline for some character combinations.
I'm somewhat busy right now. Allow me a week or so.
@jbezos No problem! Sorry for mixing in #243 and writing such a cumulative issue. Also, my \case{4}... should be another issue, see #258.
I managed to get very fine results in the meantime.
In order to leave kashida.plain as it is, I made another branch where I added justification rules named kashida.afterdiacritics.plain. I also squashed my suggested changes to babel.dtx into one commit in order to make them more comprehensible.
By default, the logic of kashida insertion is unchanged. Only with \directlua{Babel.arabic.kashida_after_diacritics = true} is the creation of the node for a kashida changed, so that it can be placed correctly.
This is the result I get with Babel.arabic.kashida_after_diacritics = false:
And this is the result I get with Babel.arabic.kashida_after_diacritics = true. If you look very carefully, you'll notice that the kashida is not always at the same y offset, which is a feature of this font.
New cases 5 and 6:
\case{5}{للشُهْبِ}{لـلــشُـهْبِ}{Kashida 3 displaced}
\case{6}{تَأَصَّلَ}{تَـأَصَّـلَ}{Kashidas with bad x and y offset.}
With the changes from my kashida-after-diacritics branch,
Thanks. I’m reading the code and there is a point that can mislead and should be clarified. FreeSerif doesn’t use the PUA, but luaotfload does, mainly as a trick to access glyphs without a Unicode code point. Relying on what luaotfload does internally isn’t safe.
The problem is that in the justification step, the node list often contains these PUA codes, the exact meaning of which is often unknown. This is one of the technical issues/limitations I was talking about.
I’ll work on some of your ideas. The new transform can be useful in ‘plain’ fonts, not involving ligatures, but with the latter it’s still an unsolved issue, except by creating rules specific to a font. For the JALT table I devised a hack based on parsing some frequent cases twice, with the normal form and the elongated one, but it’s basically a proof of concept that can’t go very far (and it only works with Sakkal Majalla, and not quite; again, diacritics are the problem).
The vertical positioning of tashkil is not (usually) fixed: the marks are shifted by the font depending on the character. I was working on something similar to the JALT variants to catch the correct yoffset (and xoffset, actually) with kashida, but it seems some (many) fonts don’t bother to deal with kashida, and the marks are clearly misplaced (kasrah is usually too low).
Your transforms are now available (in version 3.94), under the name kashida.base:
https://latex3.github.io/babel/news/whats-new-in-babel-3.94.html#new-transform-for-kashida
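For readers who want to try the released transform, a minimal sketch of a LuaLaTeX document that selects it might look like the following. The option keys (`justification=kashida`, `transforms=...`) are written as I understand them from the babel manual, and the FreeSerif font choice and the sample word are only taken from this thread; treat all of them as assumptions and check the news page linked above for the authoritative form.

```latex
% Sketch only: compile with lualatex; assumes babel >= 3.94 and an
% installed FreeSerif font (the font discussed in this thread).
\documentclass{article}
\usepackage[bidi=basic]{babel}

% Assumption: kashida.base is selected the same way as kashida.plain,
% i.e. through the transforms key together with justification=kashida.
\babelprovide[import, main, justification=kashida,
              transforms=kashida.base]{arabic}
\babelfont{rm}{FreeSerif}

\begin{document}
% Force elongation of a single word by stretching it to a fixed width,
% as done in the test cases of this thread.
\makebox[5cm][s]{تَأَصَّلَ}
\end{document}
```

Whether the kashida lands at the right x and y offset still depends on the font, as discussed above.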
Thanks for your great effort on the kashida feature! I know that it's still in an experimental state. I'd like to point out some issues.
Here's a MWE with an analytic tool for investigating the input characters (for helping people like me who have to typeset Arabic but can't read it). It uses \makebox[LENGTH][s]{TESTCASE} to force kashida elongation on single words for demonstration.
TeX engine: LuaHBTeX, Version 1.17.0 (TeX Live 2023)
babel version: 2023/08/09 v3.92.22182 The Babel package (from github)
The output per test case is (from right to left): number, input, expectation, result (box).
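The MWE itself did not survive in this copy of the thread. As a rough illustration of the idea only, a test harness along these lines could look like the sketch below. The `\case` definition and the `\caseboxwidth` length are hypothetical reconstructions (only the four-argument call syntax and the `\makebox[LENGTH][s]{TESTCASE}` trick are taken from the thread), and the babel option keys are written as I understand them from the babel manual.

```latex
% Sketch only: compile with lualatex. \case and \caseboxwidth are
% hypothetical; they are not the definitions from the original MWE.
\documentclass{article}
\usepackage[bidi=basic]{babel}
\babelprovide[import, main, justification=kashida,
              transforms=kashida.plain]{arabic}
\babelfont{rm}{FreeSerif}

% Width to which each test word is stretched.
\newlength{\caseboxwidth}
\setlength{\caseboxwidth}{5cm}

% \case{number}{input}{expected output}{remark}
% Prints the number, the raw input, the hand-written expectation and
% finally the input stretched to \caseboxwidth, so that the
% justification code has to elongate the single word.
\newcommand{\case}[4]{%
  \par\noindent
  #1\quad #2\quad #3\quad
  \makebox[\caseboxwidth][s]{#2}%
  \par\noindent{\footnotesize #4}\par\medskip}

\begin{document}
\case{5}{للشُهْبِ}{لـلــشُـهْبِ}{Kashida 3 displaced}
\case{6}{تَأَصَّلَ}{تَـأَصَّـلَ}{Kashidas with bad x and y offset.}
\end{document}
```

Because Arabic is the main language, each row is laid out right to left, which matches the order of the columns described above.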
I tried to fix this by changing
to
where the second () is moved behind the regex for the diacritics, []*. But this makes the diacritic disappear when a kashida is inserted behind the consonant the FATHA refers to.
I also tried special rules for consonant+vowel combinations like
But again, the effect is that the FATHA disappears. So I guess we need 2-letter and 3-letter rules to get this right, something like below, but I don't know the syntax for 2- and 3-letter rules.
This can be fixed by adding the kashida into the first regex character class:
Try this, which makes the diacritics go away (too bad!) and the kashidas homogeneous:
ArabicTypesetting, I get kashidas at the end of a word for some letters.
Could you point me to documentation of the transformation rules?