Open khaledhosny opened 9 years ago
First, thanks very much for all your help with the bidi issues; I would not have got there without it! And thanks for your input here - yes, I agree it's all a mess at the moment as it grew rather ad hoc. Would be good to rationalise it.
Starting with the easy one (direction), here's my opinion on how to approach it:
- […] `core/font.lua` can simply be deleted. They're not used now anyway after we fixed bidi support.
- […] `packages/tate.lua` […])
So:
- […] `font.lua`.
- […] `SILE.documentState` (or perhaps the `pageTemplate`) in `baseclass.lua`; default to "ltr".
- […] `papersize` in `baseclass.lua` for a document-level option setter. We should also do some input verification here. :-)
- […] `SILE.newFrame` in `frame.lua`, copy the document direction property into the frame.
Sorry for not responding earlier; I got really busy with real life in an unexpected way. The above plan sounds good, and you seem to have implemented (some of?) it. Is there anything remaining in this area for me to work on?
No worries. I'm trying to move towards a 0.9.2 release (it's been way too long and there've been many major fixes), and the manual didn't build any more, so I needed to do a bit of work on it!
Some more thoughts from @deepakjois, taken from PR #78:
Some other points for discussion:
- Given SILE is intended to be a next-gen typesetting package, bidi support should be built in to one of the default document classes, probably `plain`. We should not need to import a package to get bidi support.
- There should be support for typesetting short RTL text inside LTR paragraphs as well (and vice versa), something equivalent to the TeX bidi packages' `\LRE` and `\RLE`.
I was all set to say that turning on bidi support for absolutely everything imposes a huge overhead on the common case, but then I benchmarked it. It's something like a 10% penalty, which is probably acceptable, and computers keep getting faster. So maybe we should just always have bidi support on.
If we do that, then do we really need the `\LRE` and `\RLE` commands (or as we used to have in SILE, `\font[direction=...]{}`)? Now that direction is inferred automatically, there shouldn't be a need (as far as I can see) to set direction manually within a paragraph. I think the only reason we would now need support for such a thing is if we really wanted to allow people to deliberately typeset text "back to front". I'm sure they could write their own package for that if necessary. :-)
OK, so we add bidi support to everything and then I think direction is done.
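The direction plan earlier in this thread (a document-level direction that defaults to "ltr", is checked on input, and is copied into each new frame) could look roughly like the following. This is a language-neutral sketch in Python, not SILE code: `DocumentState`, `Frame`, `new_frame`, and the set of accepted values are all hypothetical stand-ins for whatever `baseclass.lua` and `frame.lua` end up doing.

```python
# Assumption: only these two values are accepted; the real set may differ.
VALID_DIRECTIONS = {"ltr", "rtl"}


class DocumentState:
    """Hypothetical stand-in for the document-level state."""

    def __init__(self):
        self.direction = "ltr"  # default proposed in the thread

    def set_direction(self, value):
        # Input verification: reject anything outside the known set.
        if value not in VALID_DIRECTIONS:
            raise ValueError(f"unknown direction: {value!r}")
        self.direction = value


class Frame:
    """Hypothetical frame that inherits the document direction when created."""

    def __init__(self, frame_id, document_state):
        self.id = frame_id
        # Copy the value, so later document changes do not alter existing frames.
        self.direction = document_state.direction


def new_frame(frame_id, document_state):
    return Frame(frame_id, document_state)
```

Copying the value at frame creation keeps each frame self-contained; whether frames should instead track later changes to the document direction is a design question the thread does not settle.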
Thinking about language: The user needs to select the language manually to activate hyphenation rules, shaping (Urdu vs Arabic etc.) and other language-specific typographic practices. (Japanese line-breaking rules, kerning, etc.) Also in the future we need language-specific document elements ("Chapter..." etc.) - I haven't given very much thought as to how that will work.
So it's clear that (a) the user should be able to select the language, and (b) this really isn't a font property. This is already reflected in the awkwardness that `font.lua` contains a bunch of `font.whatever` settings and then a `document.language` setting; it's obviously the odd one out. But the reason it's in there is that language needs to be passed to the shaper, and also that when you change the language you may want to change the script and the font as well, so it feels user-friendly to do that as a single command. (But that can obviously be finessed in higher-level packages and commands later.)
Because it doesn't do any harm to have language in the font setting (it's just a little inconsistent), I don't think we'll put this in the 0.9.2 release. (Incidentally, the `master` branch is now preparing for release, and work towards 0.9.3 is temporarily going on in the `devel` branch; this will be merged into master after release.)
My suggestion would be:
- […] `document.language` setting to `text.language`; move the setting declaration from `font.lua` into `languages.lua`.
- […] `\text` command in `languages.lua` to manipulate this setting.
- […] `font` command but add a deprecation warning saying that it will go away in 0.9.4 and recommending the use of `\text` instead.
Should be a nice easy job for someone. :-)
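The deprecation step in the suggestion above might be sketched as follows. This is Python rather than SILE's Lua, and it assumes the intent is that the old entry point keeps working while warning and forwarding to the new one. The `settings` dictionary, `text_command`, and `font_command` are hypothetical names for illustration only.

```python
import warnings

# Hypothetical settings store; "text.language" is the setting name proposed above.
settings = {"text.language": "en"}


def text_command(language):
    # Stand-in for the proposed text command: it owns the language setting.
    settings["text.language"] = language


def font_command(family=None, language=None):
    # Stand-in for the existing font command. The language option still
    # works, but it warns and forwards to the new command.
    if family is not None:
        settings["font.family"] = family
    if language is not None:
        warnings.warn(
            "the language option of the font command is deprecated and "
            "will go away in 0.9.4; use the text command instead",
            DeprecationWarning,
            stacklevel=2,
        )
        text_command(language)
```

Forwarding rather than duplicating the logic means there is a single place that changes the language, so the old and new commands cannot drift apart during the deprecation period.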
I think script is OK as is. In the vast majority of cases the user doesn't need to specify it, and the only thing that happens with it is that it is passed straight to HarfBuzz to give the user more control over how shaping happens. I can't think of anything else we would want to do with it. Since HarfBuzz does the right thing most of the time, I don't think SILE needs to implement UAX #24.
> If we do that, then do we really need the `\LRE` and `\RLE` commands (or as we used to have in SILE, `\font[direction=...]{}`)?
RTL inside LTR text (and the reverse) should just work now, and in the odd case where you want to use a different base direction for the subtext, one can use BiDi control characters like U+202A LEFT-TO-RIGHT EMBEDDING, U+202D LEFT-TO-RIGHT OVERRIDE, etc. We can have shorthand `\LRE`, `\LRO` commands that simplify entering them.
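To make the control-character approach concrete, here is a small Python sketch (not SILE code) of what such shorthand commands would boil down to: wrapping the text in an embedding or override character and closing it with U+202C POP DIRECTIONAL FORMATTING. The helper name `wrap_direction` is made up for illustration; the code points themselves are the standard Unicode ones.

```python
import unicodedata

# Explicit directional formatting characters defined by Unicode.
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (closes an embedding or override)
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def wrap_direction(text, control):
    """Hypothetical helper: surround text with a control character and its terminator."""
    if control not in (LRE, RLE, LRO, RLO):
        raise ValueError("control must be one of LRE, RLE, LRO, RLO")
    return control + text + PDF


# An LRE-style shorthand would then amount to:
embedded = wrap_direction("abc", LRE)
print([unicodedata.name(ch, ch) for ch in (embedded[0], embedded[-1])])
```

The bidi algorithm itself then does the work; the shorthand commands only save the user from typing invisible characters.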
(BTW, we need to update the UBA implementation to support the Unicode 6.3 additions.)
I agree with the proposal above and will try to work on it, but I don’t think we need to deprecate language support in font; it can be useful when you want to use a different font language than the text language (for badly designed or incomplete fonts).
I don’t agree here. HarfBuzz’s script detection is very simple and does not help in the case of characters with the Common script property. Take for example this string:
ع ab (aa) cd ع
Without proper script detection, the parentheses will be assigned Arabic script. This might be OK for most fonts, but if a font has different, say, substitutions for the parentheses based on the script, you will get the wrong substitution here. See for example this old version of Amiri Slanted; the first is the wrong script detection and the second is the right one (I had to drop this feature because many applications were not handling this properly and I now just use upright parentheses).
On reflection I think you are right. From a user's perspective I think we would like to support the following:
The last item requires script detection, and the first two require separate commands. Please feel free to implement any or all of this. :-) I am focusing on trying to get Japanese working according to JIS X 4051 / W3C requirements...
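The parenthesis problem discussed above can be illustrated with a toy Python sketch. It compares two strategies on the example string: taking the first non-Common script for the whole run (an assumption about what "very simple" detection amounts to), and letting each Common character inherit the script of the preceding character, which is the basic idea behind UAX #24 style resolution. The script lookup `toy_script` is deliberately crude and hypothetical: it knows only ASCII letters and the Arabic block, whereas a real implementation would use the Unicode script data and handle paired brackets properly.

```python
def toy_script(ch):
    """Crude script lookup (assumption: ASCII letters and the Arabic block only)."""
    if 0x0600 <= ord(ch) <= 0x06FF:
        return "Arab"
    if ch.isascii() and ch.isalpha():
        return "Latn"
    return "Zyyy"  # Common: spaces, parentheses, and so on


def first_strong_guess(text):
    """Assign the first non-Common script to every character in the run."""
    first = next((toy_script(c) for c in text if toy_script(c) != "Zyyy"), "Zyyy")
    return [first] * len(text)


def resolve_common(text):
    """Let Common characters inherit the script of the preceding character."""
    scripts = [toy_script(c) for c in text]
    previous = None
    for i, script in enumerate(scripts):
        if script == "Zyyy":
            if previous is not None:
                scripts[i] = previous
        else:
            previous = script
    # Leading Common characters fall back to the first real script, if any.
    first = next((s for s in scripts if s != "Zyyy"), "Zyyy")
    return [first if s == "Zyyy" else s for s in scripts]


text = "\u0639 ab (aa) cd \u0639"
opening = text.index("(")
print(first_strong_guess(text)[opening])  # script given to "(" by the simple guess
print(resolve_common(text)[opening])      # script given to "(" after resolution
```

With the simple guess the parentheses end up tagged as Arabic, while the inheritance pass tags them as Latin, matching the surrounding "ab (aa) cd" text.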
Scripts
As seen in #1726, script detection is needed in TTB cases too.
The current handling of the 3 properties is sub-optimal and can be improved:
- […] `direction` key sets the base direction (i.e. whether the paragraph is mainly LTR or RTL).
- […] `language` and `direction` would then be keys of that command.
- […] `language` for similar reason, but in addition to the proposed command not instead of it.
- […] `bidi` package should switch to this new model.
I’m looking for opinions about these proposed changes and code pointers to implement them.