reutenauer / polyglossia

An alternative to Babel for XeLaTeX and LuaLaTeX
http://www.ctan.org/pkg/polyglossia
MIT License
190 stars 51 forks source link

Inconsistent syntax #610

Closed Udi-Fogiel closed 2 months ago

Udi-Fogiel commented 10 months ago

Currently the source contains several methods for various tasks, e.g. for defining commands there are TeX primitives (\def, \let etc.), LaTeX2e commands (e.g. \newcommand), LaTeX3 commands (e.g. \NewDocumentCommand), expl3 functions (e.g. \cs_new:n) and commands from the makecmds package (e.g. \provideenvironment). I think the source would be much more readable if it was more consistent, with well structured syntax rules.

There are several aspects to consider. The first, is what methods to use. For example, the following essentially do the same https://github.com/reutenauer/polyglossia/blob/3737629d973c1b197b9c0342df6877555c4a3954/tex/polyglossia.sty#L2771-L2775 https://github.com/reutenauer/polyglossia/blob/3737629d973c1b197b9c0342df6877555c4a3954/tex/polyglossia.sty#L1436-L1437 but in a first glance they look rather different.

Another aspect is macros (or functions) names. The source have two prefixes for internal commands, xpg and polyglossia. I'm not sure what is the difference, nor what is the meaning of xpg. It does not seem like the naming conventions specified in interface3 (section 1.1) are respected (using @ for function names, using \DeclareDocumentCommand for private functions) and scratch variable syntax can be improved.

IMO the best approach would be to rewrite everything in expl3 and LaTeX3 syntax, and to use l3keys2e only (not xkeyval). This is because most of the code is already dependent on it, and it offers more functionality than the other options. But, the most important thing is that the person who deals with the code most will understand it.

It probably should be done is several stages, while testing after each stage.

Udi-Fogiel commented 9 months ago

that these inherit the initial keys?

If this is the case, I did not understand correctly, sorry... I do agree that it is ok to keep it this way, and not considering as a bug.

Now that everything is clear I will resume the work on the transition to to l3keys, I'll probably just fix the bug while doing it.

Udi-Fogiel commented 9 months ago

(see for instance the otherlanguage environment, where we restore the pre-switch options after the environment has been closed)

I'm wondering why this is needed, as \selectlanguage should handle the restoring of the settings with \aftergroup.

Udi-Fogiel commented 9 months ago

Ok, this is because the only thing that happens at the end of the group with \selectlanguage is basically this: https://github.com/reutenauer/polyglossia/blob/fb7d43c38bc713bae410b7076fc1afb50af5853f/tex/polyglossia.sty#L1904-L1907 And it seems like something is not restoring correctly there. It also means we write to the aux file twice sometimes...

Udi-Fogiel commented 9 months ago

Actually, it seems like nested \selectlanguae are ok,

\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage[numerals=arabic]{hebrew}
\setmainfont{David CLM}
\protected\def\foo#1{}
\begin{document}
\makeatletter
\def\bar#1{\write\@auxout{\foo{#1}}}
\bar{arabic}
{
\selectlanguage[numerals=hebrew]{hebrew}
\bar{hebrew}
{
\selectlanguage[numerals=arabic]{hebrew}
\bar{arabic}
}\bar{hebrew}}
\bar{arabic}
a
\end{document}

Here is the aux

\relax 
\ProvideDocumentCommand \selectlanguage {sO{}m}{}
\@writefile{toc}{\selectlanguage *[numerals=arabic]{hebrew}}
\foo {arabic}
\@writefile{toc}{\selectlanguage *[numerals=hebrew]{hebrew}}
\foo {hebrew}
\@writefile{toc}{\selectlanguage *[numerals=arabic]{hebrew}}
\foo {arabic}
\@writefile{toc}{\selectlanguage *[numerals=hebrew]{hebrew}}
\foo {hebrew}
\@writefile{toc}{\selectlanguage *[numerals=arabic]{hebrew}}
\foo {arabic}
\gdef \@abspage@last{1}

maybe the fix at 1e6f728 related to nested languages is not really needed, and the source of it is the bug with the default values, and it fixed it seems to fix it because it consider default values?

jspitz commented 9 months ago

Please go ahead with l3keys transition. We can address bugs with key restoration later. If the language stacking turns out to be not needed, and grouping works, the better (I didn't manage when I tried back then).

To make my point clear: I think that any \selectlanguage should start off from the initial key settings (as done in \setmainlanguage or \setotherlanguage). We just need to cache those, as we do with variable and babel name. But this doesn't need to be fixed now.

jspitz commented 9 months ago

I'll go on halt with my own rewrite for the time being in order to not step on your toes

jspitz commented 5 months ago

Apart from keyval, we are now basically done. We have used the LyX testcases to check nothing breaks at least in the construct the LyX test files use and the polyglossia languages LyX supports. We have not checked for consistency and issues in the output that do not error or warn.

The polyglossia style file and the glosses that are supported by LyX do not use etoolbox any longer. We'd need to check the remaining newer languages and then we can remove the call of the package.

Of course, any polishing and testing is highly welcome.

jspitz commented 5 months ago

The polyglossia style file and the glosses that are supported by LyX do not use etoolbox any longer. We'd need to check the remaining newer languages and then we can remove the call of the package.

Update on that: a number of gloss files at least employ \ifcsdef and \ifcsunded. For these, we could provide LaTeX2e aliases for the respective l3 checks.

Udi-Fogiel commented 5 months ago

The polyglossia style file and the glosses that are supported by LyX do not use etoolbox any longer. We'd need to check the remaining newer languages and then we can remove the call of the package.

That's great! thank you for working on that, and sorry for disappearing in the last several months.

Update on that: a number of gloss files at least employ \ifcsdef and \ifcsunded. For these, we could provide LaTeX2e aliases for the respective l3 checks.

Well. the LaTeX kernel provides \@ifundefined which is equivalent to \ifcsundef, so do we really need to provide another equivalent? Note that both \ifcsdef and \ifcsundef returns true if the command the equivalent to \relax, I don't know if it matters in the gloss files, from a quick glance it isn't, but replacing \ifcsdef with \@ifundefined should be done carefully.

Udi-Fogiel commented 5 months ago

Apart from keyval, we are now basically done.

Hopefully I will have time to finish the work on keyval in the next few days.

jspitz commented 5 months ago

Well. the LaTeX kernel provides \@ifundefined which is equivalent to \ifcsundef, so do we really need to provide another equivalent?

I remember it didn't work good particularly in the very nested cases. And I would like to avoid too much \if...\fi and \csname...\endcsname. The code is already hard to parse without that.

Udi-Fogiel commented 5 months ago

Well. the LaTeX kernel provides \@ifundefined which is equivalent to \ifcsundef, so do we really need to provide another equivalent?

I remember it didn't work good particularly in the very nested cases. And I would like to avoid too much \if...\fi and \csname...\endcsname. The code is already hard to parse without that.

As I mentioned, \@ifundefined is equivalent to \ifcsundef , i.e. it takes arguments, doesn't use \if ... \fi, it is ok for nesting.

When etoolbox as written, LaTeX did not assume yet that the user has e-TeX, hence the definition of \@ifundefined was different back then.

Udi-Fogiel commented 5 months ago

Some glosses also have \providebool, ifbool etc.

jspitz commented 5 months ago

Some glosses also have \providebool, ifbool etc.

The \providebool cases have been addressed. But there is more complex stuff (such as \patchcmd) also in the glosses that tinker with the headers. I wonder whether we should introduce \RequireEToolBox and let those glosses use that.

Udi-Fogiel commented 5 months ago

That could potentially be a problem if someone is loading a language after the beginning of the document (#614), there the glosses will not be able to load any package.

jspitz commented 5 months ago

Right, but we'd have the same problem with \RequireBidi. So probably the answer to your question in #614 should be: no, languages must not be loaded in the document body.

Udi-Fogiel commented 5 months ago

Yes, but \RequireBidi always had this behavior, while introducing \RequireEToolBox can poteontially affect existing documents. Don't get me wrong, I'm in favor of limiting the loading time to the preamble, it will make our life much easier, and I don't see any disadvantage in that, apart from the possible errors that users can get (and can easily fix).

jspitz commented 5 months ago

Let's see. Getting rid of etoolbox is not the most important thing. I'd just be nice to have a package less in dependencies.

jspitz commented 5 months ago

I have removed the easy etoolbox remainders at 787195f98a766fbe5.

We only have etoolbox commands now in the glosses that patch the headings and therefore employ \patchcmd, namely French, Hungarian, Korean, and Uyghur.

Udi-Fogiel commented 5 months ago

I think I also saw \ifstreq in the Chinese gloss keys definition, I'm currently working on removing xkeyval so I removed that on my local machine, but there might be more instances we missed.

I guess the easiest way to check that would be to remove etoolbox from the core package and see which tests fail (note that currently the tests don't check errors at loading time, I changed that on my local machine as well).

I have 30 glosses more to remove xkeyval from, @jspitz do you prefer I'll push what I did up until now or only when I'll finish?

Udi-Fogiel commented 5 months ago

Actually, would you mind reverting the last two commits, there are conflicts I'm having trouble with.

Udi-Fogiel commented 5 months ago

Actually, would you mind reverting the last two commits, there are conflicts I'm having trouble with.

never mind, I managed to resolve them.

Udi-Fogiel commented 5 months ago

The following are the remaining glosses that need xkeyval to be removed from

tex/gloss-latin.ldf
tex/gloss-german.ldf
tex/gloss-arabic.ldf
tex/gloss-persian.ldf
tex/gloss-armenian.ldf
tex/gloss-korean.ldf
tex/gloss-punjabi.ldf
tex/gloss-welsh.ldf
tex/gloss-odia.ldf
tex/gloss-catalan.ldf
tex/gloss-hebrew.ldf
tex/gloss-malay.ldf
tex/gloss-serbian.ldf
tex/gloss-uyghur.ldf
tex/gloss-english.ldf
tex/gloss-norwegian.ldf
tex/gloss-portuguese.ldf
tex/gloss-italian.ldf
tex/gloss-lao.ldf
tex/gloss-french.ldf
tex/gloss-slovenian.ldf
tex/gloss-syriac.ldf
tex/gloss-ukrainian.ldf
tex/gloss-gaelic.ldf
tex/gloss-thai.ldf
tex/gloss-mongolian.ldf
tex/gloss-dutch.ldf
tex/gloss-russian.ldf
tex/gloss-sanskrit.ldf
tex/gloss-georgian.ldf
jspitz commented 5 months ago

Sorry for the conflict, I won't commit anything now until you are done. I'd prefer to have all in at once to have a functioning package. But you could set up a branch and commit to those subsequently, and merge the branch with master once everything is ready and tested.

Udi-Fogiel commented 5 months ago

There are instances of \docvlist in some glosses as well

Udi-Fogiel commented 5 months ago

I've removed xkeyval from all the glosses. I've noticed that urdu, khmer, kannada, occitan, and piedmontese does not contain \InitializeGlossOptions, is this intentional?

jspitz commented 5 months ago

I've removed xkeyval from all the glosses.

Excellent, thanks.

I've noticed that urdu, khmer, kannada, occitan, and piedmontese does not contain \InitializeGlossOptions, is this intentional?

This should probably be added.

jspitz commented 4 months ago

There are instances of \docvlist in some glosses as well

I have removed these with 24132c6fb549bd

Udi-Fogiel commented 4 months ago

I have removed these with 24132c6

The malay gloss test fails after this commit, the warnings in the log seems strange to me, can you check that?

jspitz commented 4 months ago

Still don't know how to run the test suite, but a simple malay file compiles for me wihout problems.

Udi-Fogiel commented 4 months ago

It compiles, but there there are many warning messages. There are also more differences, I get different hyphenation in the luatex output of test-gloss-dsb.

To run the tests you can use l3build check to run all of them, if there are any fails, there should be .diff files in build/test and build/test-configfiles

Udi-Fogiel commented 4 months ago

To check a specific file from the auto generated file you can use e.g. l3build check -c configfiles/config-autogen test-gloss-dsb or if you want to test only with luatex then l3build check -e luatex -c configfiles/config-autogen test-gloss-dsb

Udi-Fogiel commented 4 months ago

Here is a simple test (compile with luatex):

\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{dsb}
\begin{document}
\day=6
\month=8
\year=2012
\hsize=1cm
\today.
\end{document}

The output before 24132c6 is image

and after the commit is

image

Udi-Fogiel commented 4 months ago

I'll probably won't be available in the following several hours, if you won't find the problem I'll look deeper later.

jspitz commented 4 months ago

Should be fixed with e0518ea5b1c6ff. Thinko.

Udi-Fogiel commented 4 months ago

Looks good now, thanks.

Just a question, I've noticed you added information to the log, is this desirable? (just checking before I update the tests for that)

And with test-gloss-dsb I'm seeing both the lines

Package polyglossia Info: Using hyphenation pattern 'uppersorbian' for usorbian
Package polyglossia Info: Using hyphenation pattern 'usorbian' for lsorbian

which to me looks confusing, but I'm not sure if it is appropriate or not.

The log also contains the following (not related to recent commits)

Package polyglossia Info: Option: sorbian, variant=upper
Package polyglossia Info: Option: sorbian, variant=lower

and the latter appears twice, I'm wondering why we need both lines, and why the second one appears twice. Maybe we should clean the information we write to the log.

jspitz commented 4 months ago

I thought it would be information that might be useful, particularly with cases that fall back to variant patterns (such as english -> newzealand). But I will remove it. On the other hand, I think the tests a quite a bit too pedantic here.

Udi-Fogiel commented 4 months ago

I thought it would be information that might be useful, particularly with cases that fall back to variant patterns (such as english -> newzealand).

Then I'll update the tests. Just to be clear, the fact that I'm seeing this message twice with different information in the log is expected?

But I will remove it. On the other hand, I think the tests a quite a bit too pedantic here.

You don't need to remove that I just wanted to make sure that the current output is the expected one.

I agree that the tests are quite pedantic, but I'm not sure how to filter out such changes without also ignoring regressions like the one we had saw yesterday...

jspitz commented 4 months ago

Just to be clear, the fact that I'm seeing this message twice with different information in the log is expected?

No.

You don't need to remove that I just wanted to make sure that the current output is the expected one.

It's fine. I thought about removing it anyway.

jspitz commented 4 months ago

Just to be clear, the fact that I'm seeing this message twice with different information in the log is expected?

No.

Well, in a way, it is. I think the first message is issued with

\InitializeGlossOptions{sorbian}{variant=upper,olddate=false}

when the gloss is loaded. The others I would need to check. This probably is another argument why we should remove these info messages from the gloss and have it centralized in some function (\SetGlossOptions?).

Udi-Fogiel commented 4 months ago

@jspitz I find the code for for tracing \babelname pretty complex, is there a reason why

\DeclareExpandableDocumentCommand \babelname { }
  {
    \prop_item:Ne  \g_xpg_langsetup_prop { \languagename / babelname }
  }

won't suffice?

Udi-Fogiel commented 4 months ago

And what is the use of \<lang>@gbabelname?

Udi-Fogiel commented 4 months ago

The source of the problem described in #334 is the fact that we store things in \g_xpg_langsetup_prop globally. Since the variant key do these assignment there is a global change when TeX reads the .toc file, it changes the babelname entry globally, since the last line in the .toc file is \selectlanguage *[variant=american]{english}, \babelname is wrongly set.

Is there a reason we do all these global assignments for local properties? I think I'm going to make \g_xpg_langsetup_prop local...

Udi-Fogiel commented 4 months ago

I implemented the idea in 11c1fe9. I tested the examples from #334 and #357 and it looks ok, please test when you have time.

Udi-Fogiel commented 2 months ago

@jspitz Is there anything left to do regarding this ticket?

jspitz commented 2 months ago

No, I think we are done.