The "usrguide" states:
The input given to these commands is ‘expanded’ before case changing is applied. This means that any commands within the input that convert to pure text will be case changed.
However, LICR input and literal input are handled differently regarding the Greek uppercase rules.
Indeed, that's exactly because they are expanded in a way that tries to retain as far as possible the user's choice of representation. If you try for example
\documentclass{article}
\usepackage[greek]{babel}
\makeatletter
\protected@edef\foo{\'\textalpha}\show\foo
you get \'\textalpha, which is not the same at all as ά (at least from a token-handling point of view). So the code that's set up to deal with Greek accents doesn't see the LICR version at all.
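To see the difference in typeset output rather than in the token list, a small test document along the following lines may help. It is only a sketch: the font choice (FreeSerif) and loading textalpha for the LICR macros are assumptions borrowed from the longer test file in this thread, and no particular output is claimed here.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif}     % assumption: any font with Greek coverage will do
\usepackage{textalpha}      % provides LICR macros such as \textalpha
\usepackage[greek]{babel}   % Greek uppercase rules are language dependent
\begin{document}
% literal Unicode input
literal: ά $\mapsto$ \MakeUppercase{ά}

% the same character input as accent macro + LICR letter macro
LICR: \'\textalpha{} $\mapsto$ \MakeUppercase{\'\textalpha}
\end{document}
```

If the two lines differ, the accent handling has seen only the literal form.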
Whilst we could arrange to look for all of the combinations (\'α, etc. as well as \'\textalpha), I am not sure that is the best approach. You end up with two code paths for the same ideas, and that's asking for subtle differences. (That was the reason I moved from having separate 8-bit and Unicode handling in the first place.) We could also do a 'partial' expansion of e.g. \textalpha to α within Greek blocks, but I am worried that is error-prone.
The wider point is not just an issue for case changing. If you want to write e.g. PDF bookmarks, you need the 'pure text' equivalent of the input, which is currently handled as a separate step (and needs data which is a bit spread out). Similarly, if someone wants to map over the graphemes in some text, they'd also face the same issues. So we do need a mechanism to convert the LICRs, it's a question of where it sits.
I suspect that the best long-term fix is to adjust the 'text expansion' code such that LICRs that can be mapped to Unicode codepoints are. There would remain some issue with those that require combining chars, as they can't neatly be handled in 8-bit engines. (I suspect there an engine-dependent pathway is more-or-less inevitable.)
However, this is quite a significant change at a policy level, so I'd like wider input.
In favour of the status quo for treatment of LICRs in expansion is that one can't (at present) be sure that LICR -> Unicode will round-trip. That's fine for \text_purify:n, but not for case changing, grapheme mapping, etc., as we likely will need to typeset the result and this could rely on the LICR. (That's on top of the combining chars issues.)
A selective 'more expansion' mechanism would presumably not have that issue, as it would be clearly opt-in and so it would be reasonable to assume round-tripping.
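For readers who want to inspect what the 'pure text' step currently makes of an LICR, something like the following could be tried. This is a sketch under assumptions: it relies on \text_purify:n being expandable (so that x-type expansion can store its result) and uses the scratch variable \l_tmpa_tl; what is shown depends on the engine and on the installed expl3 and greek-fontenc versions, so no result is stated here.

```latex
\documentclass{article}
\usepackage{textalpha}      % LICR macros such as \textalpha
\usepackage[greek]{babel}
\ExplSyntaxOn
% store the purified form of the LICR input in a scratch token list
\tl_set:Nx \l_tmpa_tl { \text_purify:n { \'\textalpha } }
% print the token list to the terminal and log for inspection
\tl_show:N \l_tmpa_tl
\ExplSyntaxOff
\begin{document}
\end{document}
```

Comparing the shown token list under pdfLaTeX and LuaLaTeX gives a feel for the engine-dependent pathway mentioned above.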
I managed to solve the Greek upcasing for LICR and literal characters with short accents in about 120 code lines. The following example combines the code required on top of a current TL23 in the preamble and a test document.
% Backrolling does not work for \MakeUppercase (cf. LaTeX News 35)
% \RequirePackage{latexbug}
%\RequirePackage[2022-05-01]{latexrelease}
\documentclass[a4paper]{article}
\usepackage[LGR,T1]{fontenc}
\usepackage{lmodern}
\ifdefined \UnicodeEncodingName
\usepackage{fontspec}
\setmainfont{FreeSerif}
\newcommand*{\texengine}{Xe/LuaLaTeX}
\else
\usepackage{lmodern}
\newcommand*{\texengine}{pdfLaTeX}
\fi
% Load encoding definitions
\usepackage[normalize-symbols]{textalpha} % "Greek script everywhere"
% With TL22, the special handling of Greek UPPERCASE is only triggered
% if the text language is set to "greek" with Babel:
%
\usepackage[greek,english]{babel} % babel-greek
% \usepackage[greek,english,provide=*]{babel} % Babel's Greek "ini"
\languageattribute{greek}{polutoniko} % "modern" polytonic Greek
% \languageattribute{greek}{ancient}
\usepackage[unicode,colorlinks,linkcolor=blue]{hyperref}
\usepackage{bookmark}
% Auxiliary commands
\newcommand{\langGreek}{\foreignlanguage{greek}}
% print the selected language variant
\newcommand{\GreekLanguageVariant}{%
\ifx\captionsgreek\captionspolutonikogreek
\ifx\captionsgreek\captionsancientgreek
ancient%
\else
polutoniko%
\fi
\else
monotoniko%
\fi
}
% workaround for MakeUppercase:
\makeatletter
% for textalpha.sty (already present in 2.4dev)
\ifdefined\DeclareCaseChangeEquivalent % new in 2023
\DeclareCaseChangeEquivalent{\<}{\accdasia}
\DeclareCaseChangeEquivalent{\>}{\accpsili}
\fi
% for Babel (already present for LGR in 1.13.2)
\IfFormatAtLeastTF{2022/06/01}%
{\DeclareTextCommandDefault{\accACUTE}{\@tabacckludge'}
\DeclareTextCommandDefault{\accGRAVE}{\@tabacckludge`}
\DeclareTextCommandDefault{\accTILDE}{\@tabacckludge~}
\addto\@uclclist{\'\accACUTE \`\accGRAVE \~\accTILDE}%
}%
{}%
\ifdefined \UnicodeEncodingName
\IfFormatAtLeastTF{2022/06/01}{%
% already in greek-fontenc 2.3.dev
\DeclareTextCompositeCommand{\LGR@hiatus}{TU}{'}{\LGR@hiatus}
\DeclareTextCompositeCommand{\LGR@hiatus}{TU}{`}{\LGR@accdropped}
% for tuenc-greek.def
\DeclareTextCommand{\accACUTE}{TU}{\@tabacckludge '}
\DeclareTextCompositeCommand{\accACUTE}{TU}{"}{\accdialytika}
\DeclareTextCompositeCommand{\accACUTE}{TU}{>}{\LGR@hiatus}
\DeclareTextCompositeCommand{\accACUTE}{TU}{\textAlpha }{\LGR@A@hiatus}
\DeclareTextCompositeCommand{\accACUTE}{TU}{\textEpsilon}{\LGR@E@hiatus}
\DeclareTextCompositeCommand{\accACUTE}{TU}{\textEta }{Η}
\DeclareTextCompositeCommand{\accACUTE}{TU}{\textIota }{Ι}
\DeclareTextCompositeCommand{\accACUTE}{TU}{\textOmicron}{Ο}
\DeclareTextCompositeCommand{\accACUTE}{TU}{\textUpsilon}{Υ}
\DeclareTextCompositeCommand{\accACUTE}{TU}{\textOmega }{Ω}
\DeclareTextCompositeCommand{\accACUTE}{TU}{Α}{\LGR@A@hiatus}
\DeclareTextCompositeCommand{\accACUTE}{TU}{Ε}{\LGR@E@hiatus}
\DeclareTextCompositeCommand{\accACUTE}{TU}{Η}{Η}
\DeclareTextCompositeCommand{\accACUTE}{TU}{Ι}{Ι}
\DeclareTextCompositeCommand{\accACUTE}{TU}{Ο}{Ο}
\DeclareTextCompositeCommand{\accACUTE}{TU}{Υ}{Υ}
\DeclareTextCompositeCommand{\accACUTE}{TU}{Ω}{Ω}
\DeclareTextCompositeCommand{\accdialytikatonos}{TU}{\textIota}{Ϊ}
\DeclareTextCompositeCommand{\accdialytikatonos}{TU}{\textUpsilon}{Ϋ}
\DeclareTextCompositeCommand{\accdialytikatonos}{TU}{Ι}{Ϊ}
\DeclareTextCompositeCommand{\accdialytikatonos}{TU}{Υ}{Ϋ}
\DeclareTextCommand{\accGRAVE}{TU}{\@tabacckludge`}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{"}{\accdialytika}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{>}{\LGR@accdropped}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{\textAlpha }{Α}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{\textEpsilon}{Ε}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{\textEta }{Η}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{\textIota }{Ι}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{\textOmicron}{Ο}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{\textUpsilon}{Υ}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{\textOmega }{Ω}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{Α}{Α}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{Ε}{Ε}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{Η}{Η}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{Ι}{Ι}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{Ο}{Ο}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{Υ}{Υ}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{Ω}{Ω}
\DeclareTextCompositeCommand{\accGRAVE}{TU}{Ω}{Ω}
\DeclareTextCommand{\accTILDE}{TU}{\@tabacckludge~}
\DeclareTextCompositeCommand{\accTILDE}{TU}{"}{\accdialytika}
\DeclareTextCompositeCommand{\accTILDE}{TU}{>}{\LGR@accdropped}
\DeclareTextCompositeCommand{\accTILDE}{TU}{<}{\LGR@accdropped}
\DeclareTextCompositeCommand{\LGR@hiatus}{TU}{Α}{\LGR@A@hiatus}
\DeclareTextCompositeCommand{\LGR@hiatus}{TU}{Ε}{\LGR@E@hiatus}
\DeclareTextCompositeCommand{\LGR@accdropped}{TU}{'}{\LGR@accdropped}
\DeclareTextCompositeCommand{\LGR@accdropped}{TU}{`}{\LGR@accdropped}
\DeclareTextCommand{\accpsilioxia}{TU}[1]{#1\char"0313\relax\char"0301\relax}
}{} % end IfFormatAtLeastTF
\fi
\makeatother
% -----------------------------------------------------------------------
\begin{document}
\title{Test case conversions of Greek letters}
\author{Günter Milde}
\maketitle
\tableofcontents
\abstract{
This document tests the combination of \verb|MakeUppercase| and Greek.
\makeatletter
It is compiled with \texengine, format version \fmtversion{} patch-level
\patch@level{} and the L3 programming layer from \ExplFileDate{}.
The \verb|\greekfontencoding| is \greekfontencoding.
\makeatother
}
\section{short accent macros}
This section compares literal Unicode Greek characters to characters input
using LICR macros.
Accents on Latin letters must be kept:
\'a \`a \~a → \MakeUppercase{\'a \`a \~a}
\subsection{Greek and Coptic}
\newcommand{\GreekAndCoptic}{% only characters supported by LGR
\raggedright
΄ ΅ Ά · Έ Ή Ί ␣ Ό ␣ Ύ Ώ \\
\'{ } \"'{ } \'\textAlpha{} \textanoteleia{}
\'\textEpsilon{} \'\textEta{} \'\textIota{}
␣ \'\textOmicron{} ␣ \'\textUpsilon{} \'\textOmega{} \\
ΐ Ω Ϊ Ϋ ά έ ή ί \\
\"'\textiota{}
\textOmega{} \"\textIota{}
\"\textUpsilon{} \'\textalpha{} \'\textepsilon{} \'\texteta{}
\'\textiota{} \\
ΰ
ϊ ϋ ό ύ ώ ␣ \\
\'"\textupsilon{}
\"\textiota{}
\"\textupsilon{} \'\textomicron{} \'\textupsilon{} \'\textomega{} ␣\\
}
No case change:
\begin{quote}
\selectlanguage{greek}
\GreekAndCoptic
\end{quote}
%
MakeUppercase:
\begin{quote}
\selectlanguage{greek}
\MakeUppercase{\GreekAndCoptic}
\end{quote}
%
MakeLowercase:
\begin{quote}
\selectlanguage{greek}
\MakeLowercase{\GreekAndCoptic}
\end{quote}
% \end{document}
\subsection{Greek extended}
\newcommand{\GreekExtended}{\raggedright
ἀ ἁ ἂ ἃ ἄ ἅ ἆ ἇ Ἀ Ἁ Ἂ Ἃ Ἄ Ἅ Ἆ Ἇ \\
\>\textalpha{}
\<\textalpha{}
\`>\textalpha{}
\<`\textalpha{}
\>'\textalpha{}
\<'\textalpha{}
\>~\textalpha{}
\<~\textalpha{}
\>\textAlpha{}
\<\textAlpha{}
\>`\textAlpha{}
\<`\textAlpha{}
\>'\textAlpha{}
\<'\textAlpha{}
\~>\textAlpha{}
\~<\textAlpha{} \\
ἐ ἑ ἒ ἓ ἔ ἕ ␣ ␣ Ἐ Ἑ Ἒ Ἓ Ἔ Ἕ \\
\>\textepsilon{}
\<\textepsilon{}
\>`\textepsilon{}
\<`\textepsilon{}
\>'\textepsilon{}
\<'\textepsilon{}
␣ ␣ \>\textEpsilon{}
\<\textEpsilon{}
\>`\textEpsilon{}
\<`\textEpsilon{}
\>'\textEpsilon{}
\<'\textEpsilon{}\\
ἠ ἡ ἢ ἣ ἤ ἥ ἦ ἧ Ἠ Ἡ Ἢ Ἣ Ἤ Ἥ Ἦ Ἧ \\
\>\texteta{}
\<\texteta{}
\>`\texteta{}
\<`\texteta{}
\>'\texteta{}
\<'\texteta{}
\~>\texteta{}
\~<\texteta{}
\>\textEta{}
\<\textEta{}
\>`\textEta{}
\<`\textEta{}
\'>\textEta{}
\<'\textEta{}
\~>\textEta{}
\~<\textEta{} \\
ἰ ἱ ἲ ἳ ἴ ἵ ἶ ἷ Ἰ Ἱ Ἲ Ἳ Ἴ Ἵ Ἶ Ἷ \\
\>\textiota{}
\<\textiota{}
\>`\textiota{}
\<`\textiota{}
\>'\textiota{}
\<'\textiota{}
\~>\textiota{}
\~<\textiota{}
\>\textIota{}
\<\textIota{}
\>`\textIota{}
\<`\textIota{}
\>'\textIota{}
\<'\textIota{}
\~>\textIota{}
\~<\textIota{} \\
ὀ ὁ ὂ ὃ ὄ ὅ ␣ ␣ Ὀ Ὁ Ὂ Ὃ Ὄ Ὅ \\
\>\textomicron{}
\<\textomicron{}
\>`\textomicron{}
\<`\textomicron{}
\>'\textomicron{}
\<'\textomicron{}
␣ ␣ \>\textOmicron{}
\<\textOmicron{}
\>`\textOmicron{}
\<`\textOmicron{}
\>'\textOmicron{}
\<'\textOmicron{} \\
ὐ ὑ ὒ ὓ ὔ ὕ ὖ ὗ ␣ Ὑ ␣ Ὓ ␣ Ὕ ␣ Ὗ \\
\>\textupsilon{}
\<\textupsilon{}
\>`\textupsilon{}
\<`\textupsilon{}
\>'\textupsilon{}
\<'\textupsilon{}
\~>\textupsilon{}
\~<\textupsilon{}
␣ \<\textUpsilon{}
␣ \<`\textUpsilon{}
␣ \<'\textUpsilon{}
␣ \~<\textUpsilon{} \\
ὠ ὡ ὢ ὣ ὤ ὥ ὦ ὧ Ὠ Ὡ Ὢ Ὣ Ὤ Ὥ Ὦ Ὧ \\
\>\textomega{}
\<\textomega{}
\>`\textomega{}
\<`\textomega{}
\>'\textomega{}
\<'\textomega{}
\~>\textomega{}
\~<\textomega{}
\>\textOmega{}
\<\textOmega{}
\>`\textOmega{}
\<`\textOmega{}
\>'\textOmega{}
\<'\textOmega{}
\~>\textOmega{}
\~<\textOmega{} \\
ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ \\
\`\textalpha{}
\'\textalpha{}
\`\textepsilon{}
\'\textepsilon{}
\`\texteta{}
\'\texteta{}
\`\textiota{}
\'\textiota{}
\`\textomicron{}
\'\textomicron{}
\`\textupsilon{}
\'\textupsilon{}
\`\textomega{}
\'\textomega{} \\
ᾀ ᾁ ᾂ ᾃ ᾄ ᾅ ᾆ ᾇ ᾈ ᾉ ᾊ ᾋ ᾌ ᾍ ᾎ ᾏ \\
\>\textalpha\ypogegrammeni{}
\<\textalpha\ypogegrammeni{}
\>`\textalpha\ypogegrammeni{}
\<`\textalpha\ypogegrammeni{}
\>'\textalpha\ypogegrammeni{}
\<'\textalpha\ypogegrammeni{}
\~>\textalpha\ypogegrammeni{}
\~<\textalpha\ypogegrammeni{}
\>\textAlpha\ypogegrammeni{}
\<\textAlpha\ypogegrammeni{}
\>`\textAlpha\ypogegrammeni{}
\<`\textAlpha\ypogegrammeni{}
\>'\textAlpha\ypogegrammeni{}
\<'\textAlpha\ypogegrammeni{}
\~>\textAlpha\ypogegrammeni{}
\~<\textAlpha\ypogegrammeni{} \\
ᾐ ᾑ ᾒ ᾓ ᾔ ᾕ ᾖ ᾗ ᾘ ᾙ ᾚ ᾛ ᾜ ᾝ ᾞ ᾟ \\
\>\texteta\ypogegrammeni{}
\<\texteta\ypogegrammeni{}
\>`\texteta\ypogegrammeni{}
\<`\texteta\ypogegrammeni{}
\>'\texteta\ypogegrammeni{}
\<'\texteta\ypogegrammeni{}
\~>\texteta\ypogegrammeni{}
\~<\texteta\ypogegrammeni{}
\>\textEta\ypogegrammeni{}
\<\textEta\ypogegrammeni{}
\>`\textEta\ypogegrammeni{}
\<`\textEta\ypogegrammeni{}
\>'\textEta\ypogegrammeni{}
\<'\textEta\ypogegrammeni{}
\>~\textEta\ypogegrammeni{}
\<~\textEta\ypogegrammeni{} \\
ᾠ ᾡ ᾢ ᾣ ᾤ ᾦ ᾧ ᾥ ᾨ ᾩ ᾪ ᾫ ᾬ ᾭ ᾮ ᾯ \\
\>\textomega\ypogegrammeni{}
\<\textomega\ypogegrammeni{}
\>`\textomega\ypogegrammeni{}
\<`\textomega\ypogegrammeni{}
\>'\textomega\ypogegrammeni{}
\<'\textomega\ypogegrammeni{}
\~>\textomega\ypogegrammeni{}
\~<\textomega\ypogegrammeni{}
\>\textOmega\ypogegrammeni{}
\<\textOmega\ypogegrammeni{}
\>`\textOmega\ypogegrammeni{}
\<`\textOmega\ypogegrammeni{}
\>'\textOmega\ypogegrammeni{}
\<'\textOmega\ypogegrammeni{}
\~>\textOmega\ypogegrammeni{}
\~<\textOmega\ypogegrammeni{} \\
ᾰ ᾱ ᾲ ᾳ ᾴ ␣ ᾶ ᾷ Ᾰ Ᾱ Ὰ Ά ᾼ ᾽ ι ᾿ \\
\u\textalpha{}
\=\textalpha{}
\`\textalpha\ypogegrammeni{}
\textalpha\ypogegrammeni{}
\'\textalpha\ypogegrammeni{}
␣ \~\textalpha{}
\~\textalpha\ypogegrammeni{}
\u\textAlpha{}
\=\textAlpha{}
\`\textAlpha{}
\'\textAlpha{}
\textAlpha\ypogegrammeni{}
\>{}
\prosgegrammeni{}
\>{} \\
῀ ῁ ῂ ῃ ῄ ␣ ῆ ῇ Ὲ Έ Ὴ Ή ῌ ῍ ῎ ῏ \\
\~{}
\"\~{}
\`\texteta\ypogegrammeni{}
\texteta\ypogegrammeni{}
\'\texteta\ypogegrammeni{}
␣ \~\texteta{}
\~\texteta\ypogegrammeni{}
\`\textEpsilon{}
\'\textEpsilon{}
\`\textEta{}
\'\textEta{}
\textEta\ypogegrammeni{}
\>`{}
\>'{}
\~>{} \\
ῐ ῑ ῒ ΐ ␣ ␣ ῖ ῗ Ῐ Ῑ Ὶ Ί ␣ ῝ ῞ ῟ \\
\u\textiota{}
\=\textiota{}
\`"\textiota{}
\'"\textiota{}
␣ ␣ \~\textiota{}
\~"\textiota{}
\u\textIota{}
\=\textIota{}
\`\textIota{}
\'\textIota{}
␣
\<`{}
\<'{}
\~<{} \\
ῠ ῡ ῢ ΰ ῤ ῥ ῦ ῧ Ῠ Ῡ Ὺ Ύ Ῥ ῭ ΅ ` \\
\u\textupsilon{}
\=\textupsilon{}
\`"\textupsilon{}
\'"\textupsilon{}
\>\textrho{}
\<\textrho{}
\~\textupsilon{}
\~"\textupsilon{}
\u\textUpsilon{}
\=\textUpsilon{}
\`\textUpsilon{}
\'\textUpsilon{}
\<\textRho{}
\`"{}
\'"{}
\`{} \\
␣ ␣ ῲ ῳ ῴ ␣ ῶ ῷ Ὸ Ό Ὼ Ώ ῼ ´ ῾ ␣ \\
␣ ␣ \`\textomega\ypogegrammeni{}
\textomega\ypogegrammeni{}
\'\textomega\ypogegrammeni{}
␣ \~\textomega{}
\~\textomega\ypogegrammeni{}
\`\textOmicron{}
\'\textOmicron{}
\`\textOmega{}
\'\textOmega{}
\textOmega\ypogegrammeni{}
\'{}
\<{} ␣
}
No case change:
\begin{quote}
\selectlanguage{greek}
\GreekExtended
\end{quote}
%
MakeUppercase:
\begin{quote}
\selectlanguage{greek}
\MakeUppercase{\GreekExtended}
\end{quote}
%
MakeLowercase:
\begin{quote}
\selectlanguage{greek}
\MakeLowercase{\GreekExtended}
\end{quote}
\subsection{Hiatus}
Tonos and psili mark a \emph{hiatus} (break-up of a diphthong) if
placed on the first vowel of a diphthong.
A dialytika must be placed on the second vowel if they are dropped, e.g.
%
\newcommand{\HiatusNamed}{\acctonos\textalpha\textiota,
\acctonos\textalpha\textupsilon,
\accpsilioxia\textalpha\textiota,
\accpsili\accoxia\textalpha\textupsilon,
\accpsili\textalpha\textupsilon,
\acctonos\textepsilon\textiota,
\accoxia\textepsilon\textiota}%
\ensuregreek{\HiatusNamed\ $\mapsto$ \MakeUppercase{\HiatusNamed}}.
Some affected words:
\begin{quotation}
\selectlanguage{greek}
\newcommand*{\aylos}{% from teubner.sty: άυλος → ΑΫΛΟΣ
\acctonos\textalpha\textupsilon\textlambda\textomicron\textfinalsigma}
\aylos{} $\mapsto$ \MakeUppercase{\aylos},
\renewcommand*{\aylos}{% polytonic: ἄυλος → ΑΫΛΟΣ
\'>\textalpha\textupsilon\textlambda\textomicron\textfinalsigma}
\aylos{} $\mapsto$ \MakeUppercase{\aylos},
% https://lsj.gr/wiki/ἀυπνία
\newcommand*{\ahypnia}{% ἀυπνία → ΑΫΠΝΙΑ
\accpsili\textalpha\textupsilon\textpi\textnu\acctonos\textiota\textalpha}
\ahypnia{} $\mapsto$ \MakeUppercase{\ahypnia},
% from http://diacritics.typo.cz/index.php?id=69
\newcommand*{\maina}{%μάινα → ΜΑΪΝΑ
\textmu\acctonos\textalpha\textiota\textnu\textalpha}
\maina{} $\mapsto$ \MakeUppercase{\maina},
% from http://de.wikipedia.org/wiki/Neugriechische_Orthographie#Das_Trema
\newcommand*{\keik}{% κέικ → ΚΕΪΚ
\textkappa\acctonos\textepsilon\textiota\textkappa}
\keik{} $\mapsto$ \MakeUppercase{\keik},
% from http://multilingualtypesetting.co.uk/blog/greek-typesetting-tips/
\newcommand*{\romeika}{\textrho\textomega\textmu
\acctonos\textepsilon\textiota\textkappa\textalpha}
\romeika{} $\mapsto$ \MakeUppercase{\romeika}.
\end{quotation}
With the pre-2022/06 \verb|\MakeUppercase|, automatic upcasing of words with
\emph{hiatus} works correctly only if the accents are input as macro and the
letters as macro or via the Latin transliteration.
Hiatus examples with short accent macros and LICR:
\newcommand{\HiatusShort}{\'\textalpha\textiota,
\'\textalpha\textupsilon,
\>'\textalpha\textupsilon,
\'>\textalpha\textupsilon,
\>\textalpha\textupsilon,
\'\textepsilon\textiota,
\>\textalpha\textupsilon,
\>'\textepsilon\textiota,
\'>\textepsilon\textiota
}%
\ensuregreek{\HiatusShort\ $\mapsto$ \MakeUppercase{\HiatusShort}}.
\section{short accent macros + literal character}
This section compares literal Unicode Greek characters to characters input
using accent macros and the literal base character.
\ifdefined \UnicodeEncodingName
\else
\begin{quote} \em
Skipped, as accent macros on a Greek literal Unicode character lead
to errors.
\end{quote}
\end{document}
\fi
\subsection{Greek and Coptic}
\renewcommand{\GreekAndCoptic}{% only characters supported by LGR
\raggedright
␣ ␣ ␣ ␣ ΄ ΅ Ά · Έ Ή Ί ␣ Ό ␣ Ύ Ώ \\
␣ ␣ ␣ ␣ ΄ \"'{ } \'Α · \'Ε \'Η \'Ι ␣ \'Ο ␣ \'Υ \'Ω \\
ΐ Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο \\
\'"ι Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο \\
Π Ρ ␣ Σ Τ Υ Φ Χ Ψ Ω Ϊ Ϋ ά έ ή ί \\
Π Ρ ␣ Σ Τ Υ Φ Χ Ψ Ω \"Ι \"Υ \'α \'ε \'η \'ι \\
ΰ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο \\
\"'υ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο \\
π ρ ς σ τ υ φ χ ψ ω ϊ ϋ ό ύ ώ ␣ \\
π ρ ς σ τ υ φ χ ψ ω \"ι \"υ \'ο \'υ \'ω ␣ \\
}
No case change:
\begin{quote}
\selectlanguage{greek}
\GreekAndCoptic
\end{quote}
%
MakeUppercase:
\begin{quote}
\selectlanguage{greek}
\MakeUppercase{\GreekAndCoptic}
\end{quote}
%
MakeLowercase:
\begin{quote}
\selectlanguage{greek}
\MakeLowercase{\GreekAndCoptic}
\end{quote}
\subsection{Greek extended}
\renewcommand{\GreekExtended}{\raggedright
ἀ ἁ ἂ ἃ ἄ ἅ ἆ ἇ Ἀ Ἁ Ἂ Ἃ Ἄ Ἅ Ἆ Ἇ \\
\>α \<α \`>α \<`α \>'α \<'α \~>α \~<α
\>Α \<Α \>`Α \<`Α \>'Α \<'Α \~>Α \~<Α \\
ἐ ἑ ἒ ἓ ἔ ἕ ␣ ␣ Ἐ Ἑ Ἒ Ἓ Ἔ Ἕ \\
\>ε \<ε \>`ε \<`ε \>'ε \<'ε ␣ ␣
\>Ε \<Ε \>`Ε \<`Ε \>'Ε \<'Ε\\
ἠ ἡ ἢ ἣ ἤ ἥ ἦ ἧ Ἠ Ἡ Ἢ Ἣ Ἤ Ἥ Ἦ Ἧ \\
\>η \<η \>`η \<`η \>'η \<'η \~>η \~<η
\>Η \<Η \>`Η \<`Η \'>Η \<'Η \~>Η \~<Η \\
ἰ ἱ ἲ ἳ ἴ ἵ ἶ ἷ Ἰ Ἱ Ἲ Ἳ Ἴ Ἵ Ἶ Ἷ \\
\>ι \<ι \>`ι \<`ι \>'ι \<'ι \~>ι \~<ι
\>Ι \<Ι \>`Ι \<`Ι \>'Ι \<'Ι \~>Ι \~<Ι \\
ὀ ὁ ὂ ὃ ὄ ὅ ␣ ␣ Ὀ Ὁ Ὂ Ὃ Ὄ Ὅ \\
\>ο \<ο \>`ο \<`ο \>'ο \<'ο ␣ ␣ \>Ο \<Ο \>`Ο \<`Ο \>'Ο \<'Ο \\
ὐ ὑ ὒ ὓ ὔ ὕ ὖ ὗ ␣ Ὑ ␣ Ὓ ␣ Ὕ ␣ Ὗ \\
\>υ \<υ \>`υ \<`υ \>'υ \<'υ \~>υ \~<υ ␣ \<Υ ␣ \<`Υ ␣ \<'Υ ␣ \~<Υ \\
ὠ ὡ ὢ ὣ ὤ ὥ ὦ ὧ Ὠ Ὡ Ὢ Ὣ Ὤ Ὥ Ὦ Ὧ \\
\>ω \<ω \>`ω \<`ω \>'ω \<'ω \~>ω \~<ω
\>Ω \<Ω \>`Ω \<`Ω \>'Ω \<'Ω \~>Ω \~<Ω \\
ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ \\
\`α \'α \`ε \'ε \`η \'η \`ι \'ι \`ο \'ο \`υ \'υ \`ω \'ω \\
ᾀ ᾁ ᾂ ᾃ ᾄ ᾅ ᾆ ᾇ ᾈ ᾉ ᾊ ᾋ ᾌ ᾍ ᾎ ᾏ \\
\>α\ypogegrammeni{}
\<α\ypogegrammeni{}
\>`α\ypogegrammeni{}
\<`α\ypogegrammeni{}
\>'α\ypogegrammeni{}
\<'α\ypogegrammeni{}
\~>α\ypogegrammeni{}
\~<α\ypogegrammeni{}
\>Α\ypogegrammeni{}
\<Α\ypogegrammeni{}
\>`Α\ypogegrammeni{}
\<`Α\ypogegrammeni{}
\>'Α\ypogegrammeni{}
\<'Α\ypogegrammeni{}
\~>Α\ypogegrammeni{}
\~<Α\ypogegrammeni{} \\
ᾐ ᾑ ᾒ ᾓ ᾔ ᾕ ᾖ ᾗ ᾘ ᾙ ᾚ ᾛ ᾜ ᾝ ᾞ ᾟ \\
\>η\ypogegrammeni{}
\<η\ypogegrammeni{}
\>`η\ypogegrammeni{}
\<`η\ypogegrammeni{}
\>'η\ypogegrammeni{}
\<'η\ypogegrammeni{}
\~>η\ypogegrammeni{}
\~<η\ypogegrammeni{}
\>η\ypogegrammeni{}
\<η\ypogegrammeni{}
\>`η\ypogegrammeni{}
\<`η\ypogegrammeni{}
\>'η\ypogegrammeni{}
\<'η\ypogegrammeni{}
\~>η\ypogegrammeni{}
\~<η\ypogegrammeni{} \\
ᾠ ᾡ ᾢ ᾣ ᾤ ᾦ ᾧ ᾥ ᾨ ᾩ ᾪ ᾫ ᾬ ᾭ ᾮ ᾯ \\
\>ω\ypogegrammeni{}
\<ω\ypogegrammeni{}
\>`ω\ypogegrammeni{}
\<`ω\ypogegrammeni{}
\>'ω\ypogegrammeni{}
\<'ω\ypogegrammeni{}
\~>ω\ypogegrammeni{}
\~<ω\ypogegrammeni{}
\>ω\ypogegrammeni{}
\<ω\ypogegrammeni{}
\>`ω\ypogegrammeni{}
\<`ω\ypogegrammeni{}
\>'ω\ypogegrammeni{}
\<'ω\ypogegrammeni{}
\~>ω\ypogegrammeni{}
\~<ω\ypogegrammeni{} \\
ᾰ ᾱ ᾲ ᾳ ᾴ ␣ ᾶ ᾷ Ᾰ Ᾱ Ὰ Ά ᾼ ᾽ ι ᾿ \\
\u{α}
\=α
\`α\ypogegrammeni{}
α\ypogegrammeni{}
\'α\ypogegrammeni{}
␣ \~α
\~α\ypogegrammeni{}
\u{Α}
\=Α
\`Α
\'Α
Α\ypogegrammeni{}
\>{}
\prosgegrammeni{}
\>{} \\
῀ ῁ ῂ ῃ ῄ ␣ ῆ ῇ Ὲ Έ Ὴ Ή ῌ ῍ ῎ ῏ \\
\~{}
\"\~{}
\`η\ypogegrammeni{}
η\ypogegrammeni{}
\'η\ypogegrammeni{}
␣ \~η
\~η\ypogegrammeni{}
\`Ε
\'Ε
\`Η
\'Η
η\ypogegrammeni{}
\>`{}
\>'{}
\~>{} \\
ῐ ῑ ῒ ΐ ␣ ␣ ῖ ῗ Ῐ Ῑ Ὶ Ί ␣ ῝ ῞ ῟ \\
\u{ι} \=ι \`"ι \'"ι ␣ ␣ \~ι \~"ι \u{Ι} \=Ι \`Ι \'Ι ␣ \<`{} \<'{} \~<{} \\
ῠ ῡ ῢ ΰ ῤ ῥ ῦ ῧ Ῠ Ῡ Ὺ Ύ Ῥ ῭ ΅ ` \\
\u{υ} \=υ \`"υ \'"υ \>ρ \<ρ \~υ \~"υ
\u{Υ} \=Υ \`Υ \'Υ \<Ρ \`"{} \'"{} \`{} \\
␣ ␣ ῲ ῳ ῴ ␣ ῶ ῷ Ὸ Ό Ὼ Ώ ῼ ´ ῾ ␣ \\
␣ ␣ \`ω\ypogegrammeni{}
ω\ypogegrammeni{}
\'ω\ypogegrammeni{}
␣ \~ω
\~ω\ypogegrammeni{}
\`Ο \'Ο \`Ω \'Ω
ω\ypogegrammeni{}
\'{} \<{} ␣
}
No case change:
\begin{quote}
\selectlanguage{greek}
\GreekExtended
\end{quote}
%
MakeUppercase:
\begin{quote}
\selectlanguage{greek}
\MakeUppercase{\GreekExtended}
\end{quote}
%
MakeLowercase:
\begin{quote}
\selectlanguage{greek}
\MakeLowercase{\GreekExtended}
\end{quote}
Hiatus examples with short accent macros and literal base character:
\renewcommand{\HiatusShort}{\'αι, \'αυ, \>'αυ, \'>αυ, \>αυ, \'ει, \>αυ,
\>'ει, \'>ει}%
\ensuregreek{\HiatusShort\ $\mapsto$ \MakeUppercase{\HiatusShort}}.
\end{document}
The implementation could be made simpler, more similar to the handling of literal characters, and safer if we had a framework to map a function (similar to \DeclareCaseChangeEquivalent) that is locale-sensitive and distinguishes uppercase, titlecase, and lowercase (similar to \DeclareUppercaseMapping etc.).
Then, I could, e.g., write
\DeclareUppercaseEquivalent[el]{\'}{\accACUTE}
\DeclareUppercaseEquivalent[el]{\`}{\accGRAVE}
\DeclareUppercaseEquivalent[el]{\~}{\accTILDE}
\DeclareUppercaseEquivalent[el]{\>}{\LGR@hiatus}
instead of adding to the \@uclclist.
The main advantage is that document parts that are not Greek will not be affected which lowers the danger of unwanted side-effects.
@gmilde I have an idea that might be less 'heavy' and that uses \CaseSwitch: I'll need to test it out and will report back.
From the description in usrguide.pdf, I got the impression that \CaseSwitch is a user command to be used inside \MakeUppercase.
Now I see that I could replace the hypothetical
\DeclareUppercaseEquivalent[el]{\'}{\accACUTE}
with
\DeclareCaseChangeEquivalent{\'}{%
\CaseSwitch{\'}{\accACUTE}{\'}{\'}
}
This would replace the \@uclclist extension but still not be locale-specific.
If one of \DeclareCaseChangeEquivalent or \CaseSwitch would grow an optional "locale" argument, this combination would become an alternative on par with my suggestion of four new configuration commands.
+1 less change, no new (rarely used) commands
-1 a bit more verbose in usage
I implemented and tested a comprehensive fix for case changing Greek input via LICR macros. See https://codeberg.org/milde/greek-tex and the test document char-list.tex. It used to work fine with TL21 and TL23 (before the latest update) and will hopefully work again after the fix for https://github.com/latex3/latex3/issues/1236. Feedback is welcome.
@gmilde You are the expert here: if it works, then probably I won't make further changes at the expl3 end as this is essentially about 'legacy' input.
The releases of babel-greek 1.14 and greek-fontenc 2.5 implement and test fixes for MakeUppercase with "Greek" diacritics for the LGR, TU, and PU font encodings.
Open issues:
- \MakeLowercase{Σ} correctly downcases to a final sigma (ς) if the Σ is at the end of a word. In LGR fonts, this is handled by an "autosigma" character with a ligature definition, but in Unicode fonts a "normal" σ is currently printed.
  \textSigma macro? (i.e. a function \textautosigma with Unicode fonts).
- For disambiguation, the Greek word or (ή / ἢ) keeps diacritics in UPPERCASE. The 2022 MakeUppercase handles this for literal input. It seems there is a test for whitespace on both sides of the eta (diacritics are dropped in, e.g., \MakeUppercase{ή, Ή. ἢ; Ἢ}) if used in a Greek text part.
  - Is this a correct test, are there corner cases/false positives?
  - The polytonic variant ETA WITH DASIA AND OXIA used in ἢ … ἤ (either … or) drops diacritics! By mistake, omission, or intent?
  - Is there a way to apply the same test to input via LICRs (e.g. a function that can be bound to a TextCompositeCommand for \'\texteta)?
- The polytonic variant ETA WITH DASIA AND OXIA used in ἢ … ἤ (either … or) drops diacritics! By mistake, omission, or intent?
Based on https://icu.unicode.org/design/case/greek-upper, this is by-design; the data there shows
νομικοῦ ἢ διεθνοῦς →
ΝΟΜΙΚΟΥ Ή ΔΙΕΘΝΟΥΣ
so we check for both U+03AE and U+1F22 (and for U+1F2A), and always output U+0389 (Ή) for the isolated letter. If that's a misinterpretation of the rule, could you provide a link to a demo? I really only had that ICU set to go with.
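A minimal test file for this rule might look as follows. It is a sketch, not part of the kernel test suite: the font (FreeSerif) is an assumption taken from the test document earlier in this thread, and the expectation in the comment is simply the ICU sample quoted above.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{FreeSerif}     % assumption: any font with polytonic Greek
\usepackage[greek]{babel}   % the Greek rules apply only in Greek text
\begin{document}
% ICU sample: expected ΝΟΜΙΚΟΥ Ή ΔΙΕΘΝΟΥΣ (isolated eta keeps its tonos)
\MakeUppercase{νομικοῦ ἢ διεθνοῦς}

% isolated eta followed by punctuation instead of a space
\MakeUppercase{ή, Ή. ἢ; Ἢ}
\end{document}
```

The second line exercises the punctuation cases raised in the open issues, where the simple space-based boundary check may not recognise the eta as isolated.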
It seems there is a test for whitespace on both sides of the eta (diacritics are dropped in, e.g., \MakeUppercase{ή, Ή. ἢ; Ἢ}) if used in a Greek text part. Is this a correct test, are there corner cases/false positives?
The current implementation here uses a very simple-minded way to detect word boundaries. At the start of the text, and after every space (charcode 32), there is a 'boundary check' function. For the eta test, the approach is to check if
I've looked at the full Unicode word boundary algorithm: it's complex. What would be a lot easier would be to consider the Unicode class of any following tokens: that would be able to deal with ἢ; for example.
@gmilde Are we OK to close here and open new issues as required for what feel like independent ideas?
Remaining issue from https://github.com/latex3/latex2e/issues/987.
With LICR input:
The "usrguide" states:
However, LICR input and literal input are handled differently regarding the Greek uppercase rules.
After loading babel-greek, e.g. the LICR \'\textalpha is converted to character ά (03AC GREEK SMALL LETTER ALPHA WITH TONOS).
However, with xelatex or lualatex, the minimal example
results in
It seems as if the LICRs for GREEK SMALL LETTER ALPHA WITH TONOS are only partially expanded and the correct upcasing of \acctonos\textalpha is due to greek-fontenc.def, which extends the \@uclclist with the mapping \acctonos\LGR@hiatus (the latter prints its argument without diacritic and adds a dialytika on the second-next vowel if required for disambiguation).
However, a \@uclclist mapping of the standard accent macro \' would also affect Latin and Cyrillic characters. This is OK with 8-bit TeX, where LGR maps Latin to Greek anyway, but not for Unicode fonts (TU).
I experimented with a mapping \'\accACUTE, a default to keep the accent, and Composite definitions dropping it, but did not manage to solve the problem.
I wonder whether there is more detailed documentation on the working of the case-changing code. For configuration, I could imagine
\'), \DeclareUppercaseMapping that works on a macro instead of a character, or \DeclareTextCompositeCommand).