latex3 / babel

The multilingual framework to localize LaTeX, LuaLaTeX and XeLaTeX
https://latex3.github.io/babel/
LaTeX Project Public License v1.3c

Beta code support for Greek in XeTeX #106

Closed Davislor closed 3 years ago

Davislor commented 3 years ago

Seeing that the \textgreek command has been removed gave me the impetus to add support for beta code, the standard for digitizing ancient Greek since the 1970s. There’s already a package for it, betababel, but it was last updated in 2015 and only supports pdfTeX.

I wrote a Mapping=greek-betacode file for XeTeX and a xebetacode.sty file that uses it. To compile the .map file into a .tec file, run teckit_compile greek-betacode.map -o greek-betacode.tec, and put greek-betacode.tec somewhere that XeTeX searches.
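For readers who want to see the underlying mechanism without the package, here is a minimal sketch of loading a compiled TECkit map directly through fontspec's Mapping feature. It is not the interface of xebetacode.sty. The font name GFS Didot and the command name \betacodefont are assumptions chosen for illustration, and the sample text assumes the map accepts lower-case beta code.

```latex
% Compile with XeLaTeX; greek-betacode.tec must be on XeTeX's search path.
\documentclass{article}
\usepackage{fontspec}

% Assumption: any OpenType font with polytonic Greek coverage will do.
\setmainfont{GFS Didot}

% \betacodefont is a hypothetical name. Mapping= tells XeTeX to run the
% text through the compiled map before it selects glyphs.
\newfontfamily\betacodefont{GFS Didot}[Mapping=greek-betacode]

\begin{document}
Unmapped text, then beta code: {\betacodefont mh=nin a)/eide qea/}
\end{document}
```

Because the map is attached to a font family, only text set in that family is converted, and the rest of the document is left alone.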

The package does not actually depend on Babel in any way, but it would normally be used with either Babel or Polyglossia. I hope it will be of interest to people here, and I didn’t know a better place to post it.

The formal spec for beta code is by the Thesaurus Linguae Graecae, here and here. What I implemented is a subset consisting of the entire Greek alphabet plus a selection of other symbols. It supports the combination of dialytika with varia, oxia or perispomeni, and has one major extension: it adds ` as an alternative to \ to denote a grave accent. This allows a non-verbatim beta code environment.

During testing, I noticed that Babel’s ancient/polytonic Greek language files do not support combining accents. The hyphenation algorithm will happily break a line within a grapheme and leave several orphaned accents in the margin of the next line. This violates the requirement in the Unicode standard that canonically-equivalent encodings “should always have the same visual appearance and behavior.” (version 13.0, TR 15, section 1.1). I don’t know how difficult that would be to fix, but I was able to work around it by having the map file normalize to NFC form. Some other language files, such as French, will also break the parsing by setting a character active.

xebetacode-0.1.zip

jbezos commented 3 years ago

I think this issue is closely related to #107. Does \XeTeXinputnormalization solve it? Does it require any action on my part?
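For reference, a minimal sketch of how that primitive is typically set, assuming XeLaTeX and a font with polytonic Greek coverage (GFS Didot is only a placeholder):

```latex
% Compile with XeLaTeX.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{GFS Didot}% assumption: any font covering polytonic Greek

% 0 = leave input untouched, 1 = normalize to NFC, 2 = normalize to NFD.
\XeTeXinputnormalization=1

\begin{document}
% Lines read after the assignment are composed to NFC as they are read,
% so a base letter followed by combining accents arrives as one character.
ἄνθρωπος
\end{document}
```

The setting acts on text as it is read from the input file, so it has to be in force before the Greek text is read.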

Davislor commented 3 years ago

Like I said, the mapping file in the ZIP archive I attached already normalizes it (although I learned about the better ways to normalize by posting #107). I also had a user named Cicada message me on TeX.SX and propose a solution with regular expressions in expl3 that would also work on LuaTeX. If you’re interested, I could turn this into a package that could replace the obsolete \textgreek in Babel with the standard ASCII code in use since the 1970s.
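To give an idea of what a regular-expression approach could look like, here is a small sketch using l3regex. It is not the solution proposed on TeX.SX. The command \betademo, the variable name and the three alpha rules are hypothetical, and the font is a placeholder.

```latex
% Compile with XeLaTeX or LuaLaTeX (LaTeX 2020-10-01 or later).
\documentclass{article}
\usepackage{fontspec}
\setmainfont{GFS Didot}% assumption: any font covering polytonic Greek

\ExplSyntaxOn
\tl_new:N \l__betademo_text_tl
% \betademo is a hypothetical command covering only three rules for alpha.
\NewDocumentCommand \betademo { m }
  {
    \tl_set:Nn \l__betademo_text_tl {#1}
    % Most specific rule first: breathing plus accent, then each alone.
    \regex_replace_all:nnN { a\)\/ } { ἄ } \l__betademo_text_tl
    \regex_replace_all:nnN { a\) }   { ἀ } \l__betademo_text_tl
    \regex_replace_all:nnN { a\/ }   { ά } \l__betademo_text_tl
    \tl_use:N \l__betademo_text_tl
  }
\ExplSyntaxOff

\begin{document}
\betademo{a)/ a) a/}
\end{document}
```

The order of the rules matters: if the rule for a) ran first, the sequence a)/ would never match as a whole. A full implementation would need rules for every letter and diacritic combination.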

jbezos commented 3 years ago

I haven't managed to make it work for me, but in any case I can get an idea of what it does. And I think anything that makes life easier is welcome.

antonis-tsolomitis commented 3 years ago

I have a math book written in polytonic Greek using babel's ASCII-to-Greek transliteration. I would like to republish it (new edition) with a modern OTF font. Is this possible with either XeLaTeX or LuaLaTeX? For example, can a file similar to xebetacode be written for the standard babel ASCII-to-Greek transliteration? Does one maybe already exist? Thank you.

Davislor commented 3 years ago

It could be done (that would be a matter of cross-referencing the LGR font table with ASCII), but I’d seriously look at converting the source to Unicode, if feasible.

antonis-tsolomitis commented 3 years ago

It is a 500-page book. How can I convert it? It is too much work, which is why I was looking at your solution for beta code. I do not think a script can work either, because of the math. What would work is a TeX parser that understands what is text, what is math, and what is commands and environments. If no other solution can be found, could you give me some information on how to modify your xebetacode package? I am not a programmer, but I will manage if I have the correct information.

Davislor commented 3 years ago

Well, it’s possible to parse the source, find all the Greek text, and convert it when you compile the document. Is the Greek text always set inside a \textgreek{} command, possibly containing macros like \textgreek{\textbf{Ellhnika}}, where \textgreek and \fontencoding blocks are never nested recursively? If so, it ought to be feasible to automatically process the source with regular expressions.

The traditional LGR mapping isn’t the same as beta code, especially for polytonic Greek, but it’d be possible to do a different one, or a solution for LuaLaTeX.

Davislor commented 3 years ago

@antonis-tsolomitis Or, would you be willing to pay me to do it?

antonis-tsolomitis commented 3 years ago

Thank you for your answer. The book belongs to a colleague whom I am helping with TeX, so I will forward this to him and let you know if he decides to do it. Thanks a lot.

jbezos commented 3 years ago

Here is a test file I've written with a partial and quick set of rules for the refactoring of \babelprehyphenation (luatex), which is almost finished.


\documentclass{article}

\usepackage{babel}

\babelprovide[import=el]{betagreek}

\babelfont{rm}{CMU Serif}

% {)} => %) in lua.

\babelprehyphenation{betagreek}{ ([ahiuw]) = }{
  string = {1|ahiuw|ᾶῆῖῦῶ},
  remove
}

\babelprehyphenation{betagreek}{ ([aehiouw]) {)} / }{
  string = {1|aehiouw|ἄἔἤἴὄὔὤ},
  remove, remove
}

\babelprehyphenation{betagreek}{ ([aehiouw]) {(} }{
  string = {1|aehiouw|ἁἑἡἱὁὑὡ},
  remove
}

\babelprehyphenation{betagreek}{ ([aehiouw]) {)} }{
  string = {1|aehiouw|ἀἐἠἰὀὐὠ},
  remove
}

\babelprehyphenation{betagreek}{ ([aehiouw]) / }{
  string = {1|aehiouw|άέήίόύώ},
  remove
}

\babelprehyphenation{betagreek}{([abgdezhqiklmncoprstufxyw])}{
  string = {1|abgdezhqiklmncoprstufxyw%
             |αβγδεζηθικλμνξοπρστυφχψω}
}

\begin{document}

\selectlanguage{betagreek}

*ou)k e)/stin ou)de`n deino`n w(=d' ei)pei=n e)/pos ou)de` pa/qos ou)de`
cumfora` qeh/latos, h(=s ou)k a)`n a)/rait' a)/xqos a)nqrw/pou fu/sis.
o( ga`r maka/rios—kou)k o)neidi/zw tu/xas—*dio`s pefukw/s, w(s
le/gousi, *ta/ntalos korufh=s u(perte/llonta deimai/nwn pe/tron a)e/ri
pota=tai: kai` ti/nei tau/thn di/khn, w(s me`n le/gousin, o(/ti qeoi=s
a)/nqrwpos w)`n koinh=s trape/zhs a)ci/wm' e)/xwn i)/son, a)ko/laston
e)/sxe glw=ssan, ai)sxi/sthn no/son. ou(=tos futeu/ei *pe/lopa, tou= d'
*)atreu`s e)/fu, w(=| ste/mmata ch/nas' e)pe/klwsen qea` e)/rin,
*que/sth| po/lemon o)/nti suggo/nw| qe/sqai. ti/ ta)/rrht'
a)nametrh/sasqai/ me dei=; e)/daise d' ou)=n nin te/kn' a)poktei/nas
*)atreu/s. )atre/ws de/: ta`s ga`r e)n me/sw| sigw= tu/xas: o(
kleino/s, ei) dh` kleino/s, *)agame/mnwn e)/fu mene/lew/s te *krh/sshs
mhtro`s *)aero/phs a)/po. gamei= d' o(` me`n dh` th`n qeoi=s
stugoume/nhn mene/laos *(ele/nhn, o(` de` *klutaimh/stras le/xos
e)pi/shmon ei)s *(/ellhnas *)agame/mnwn a)/nac: w(=| parqe/noi me`n
trei=s e)/fumen e)k mia=s, xruso/qemis *)ifige/neia/ t' *)hle/ktra t'
e)gw/, a)/rshn d' *)ore/sths, mhtro`s a)nosiwta/ths, h(` po/sin
a)pei/rw| peribalou=s' u(fa/smati e)/kteinen: w(=n d' e(/kati,
parqe/nw| le/gein ou) kalo/n: e)w= tou=t' a)safe`s e)n koinw=|
skopei=n. foi/bou d' a)diki/an me`n ti/ dei= kathgorei=n; pei/qei d'
*)ore/sthn mhte/r' h(/ sf' e)gei/nato ktei=nai, pro`s ou)x a(/pantas
eu)/kleian fe/ron. o(/mws d' a)pe/ktein' ou)k a)peiqh/sas qew=|: ka)gw`
mete/sxon, oi(=a dh` gunh/, fo/nou. pula/dhs q', o(`s h(mi=n
sugkatei/rgastai ta/de.

\end{document}