latex3 / latex2e

The LaTeX2e kernel
https://www.latex-project.org/
LaTeX Project Public License v1.3c

Undocumented limitation on builtin generic hook names #1237

Closed jlaurens closed 1 week ago

jlaurens commented 10 months ago

Brief outline of the bug

The variable part of builtin generic hook names cannot contain /. With the current implementation, an environment named A/B cannot have working hooks. This limitation is undocumented.

Minimal example showing the bug

\RequirePackage{latexbug}       % <--should be always 
\documentclass{article}
\begin{document}
\NewDocumentEnvironment{!*!}{}{}{}
\AddToHook{env/!*!/before}{!}
\AddToHook{env/!*!/after}{!}
\begin{!*!}XXX\end{!*!}
\NewDocumentEnvironment{/*/}{}{}{}
\AddToHook{env//*//before}{/}
\AddToHook{env//*//after}{/}
\begin{/*/}XXX\end{/*/}
\end{document}

The environment !*! adds ! before and after its body. Replacing ! by /, the environment /*/ is expected to add / before and after its body.

Log file (required) and possibly PDF file

2.log

FrankMittelbach commented 10 months ago

I'm fairly sure that in several places in the LaTeX documentation environment names are documented as restricted to [a-zA-Z]+, most likely in Lamport already. Now we know that this isn't quite true and that you can use further characters with varying degrees of success, but that doesn't mean that this is officially supported behavior.

I therefore don't think that generic hooks should attempt to do anything about it, nor that the hook documentation should call it out specifically. Anybody who attempts to use such a generic hook for environments should be able to realize that it is impossible for the software to tell whether a / is part of an environment name or separates two levels of the hook name.

FrankMittelbach commented 10 months ago

Documentation on environments according to Lamport: the "name" of the environment can be any sequence of letters, numbers, and the character * that does not begin with "end".

Letters in this context are defined as [a-zA-Z], but that should also be clear from the fact that the only other allowed character is called out explicitly: *
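
As a sketch of a name that stays within the rule quoted above (letters, numbers, and *), generic hooks can be attached in the same way as in the minimal example. The environment name demo* and the bracketed marker text are made up for illustration only.

```latex
\documentclass{article}
% Hypothetical environment whose name uses only letters and "*".
\NewDocumentEnvironment{demo*}{}{}{}
% The variable part of the hook name contains no "/", so the name
% splits unambiguously into env / demo* / before (or after).
\AddToHook{env/demo*/before}{[before]}
\AddToHook{env/demo*/after}{[after]}
\begin{document}
\begin{demo*}XXX\end{demo*}
\end{document}
```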

josephwright commented 7 months ago

Document in ltcmd and lthooks.

Udi-Fogiel commented 3 months ago

I'm fairly sure that in several places in the LaTeX documentation environment names are documented as restricted to [a-zA-Z]+, most likely in Lamport already. Now we know that this isn't quite true and that you can use further characters with varying degrees of success, but that doesn't mean that this is officially supported behavior.

Hopefully you will agree that this restriction is quite outdated. I don't mind the restriction on special characters, but forcing users from all over the world to use names only from the English language (or not officially supporting anything else) is quite unfortunate.

u-fischer commented 3 months ago

@Udi-Fogiel

Hopefully you will agree that this restriction is quite outdated. I don't mind the restriction on special characters, but forcing users from all over the world to use names only from the English language (or not officially supporting anything else) is quite unfortunate.

Well, first of all, the issue here is about chars like / in hook names, not about other languages. Besides, using only the Latin letters a-zA-Z does not restrict you to the English language but to languages which use the Latin script. With the Unicode engines you would naturally have a larger set of letters available, but if you also want to support pdftex you have to restrict the letters to the set that is safe in pdftex.

Udi-Fogiel commented 3 months ago

Well, first of all, the issue here is about chars like / in hook names, not about other languages.

Sure, but I wasn't referring to this issue, but to Frank's remark about what is officially supported.

Besides, using only the Latin letters a-zA-Z does not restrict you to the English language but to languages which use the Latin script.

Yes, true, thanks for the correction. Although it is a restriction even if we only consider the Latin script (accents are not listed in [a-zA-Z]).

With the Unicode engines you would naturally have a larger set of letters available, but if you also want to support pdftex you have to restrict the letters to the set that is safe in pdftex.

Ok, but the situation is that sometimes it feels more natural for me to name some environments, or even commands, using letters from my native language (using Unicode engines). Given that the letters I used have category code 11 in the formats I used, I thought there shouldn't be a problem. But Frank's statement can be interpreted as "these documents may stop compiling in the future, no warranties".

It really depends on how to interpret "official support". If the meaning of "unicode letters in environment names not being officially supported" is that there is no guarantee of continued compatibility, then it should really be clarified. If it means that LaTeX does not officially support that because it does not work with certain engines, that's another story.
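
The category code remark above can be checked directly. The following is only a sketch, assuming a UTF-8 encoded file compiled with a Unicode engine (lualatex or xelatex); it writes the current category codes of one non-ASCII letter and of / to the terminal and log. The choice of ü as the test character is arbitrary.

```latex
% Compile with lualatex or xelatex; the file must be UTF-8 encoded.
\documentclass{article}
\begin{document}
% \the\catcode`<char> expands to the category code currently
% assigned to <char>; \typeout writes it to the terminal and log.
\typeout{catcode of ü: \the\catcode`ü}
\typeout{catcode of /: \the\catcode`/}
\end{document}
```

A value of 11 means "letter"; any package that changes the code of such a character later in the preamble would change what this reports.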

u-fischer commented 3 months ago

If the meaning of "unicode letters in environment names not being officially supported" is that there is no guarantee of continued compatibility, then it should really be clarified. If it means that LaTeX does not officially support that because it does not work with certain engines, that's another story.

The letters [a-zA-Z] are used everywhere in LaTeX for command names. So they must be safe in such names and also in key names, environment names, hook names etc. This is quite a core requirement and can be changed only in quite controlled environments. If a document breaks because it (or a package it loads) messes around with these chars it is clearly a user error or package bug.

Other "letters" often can be used in names too, but with restrictions. E.g. with pdflatex \UseHook{grüße} and \begin{grüße}\end{grüße} do work -- as long as your document is utf8 encoded and as long as you don't try to use them in the hyperref bookmarks. And in a Unicode engine \UseHook{grüße} and \begin{grüße}\end{grüße} do work -- but not if the document or some package contains something that makes the ü active and "different", like \usepackage{newunicodechar}\newunicodechar{ü}{hallo}. So if you want to use letters outside [a-zA-Z] in command or environment names, you need to check that nothing in your document (packages or your own code) interferes, as nothing in LaTeX ensures that it always works.
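
The Unicode engine case described above can be tried with a small test file. This is a sketch only: the environment name grüße is taken from the comment, while the generic env hook and the bracketed marker text are added here for illustration, and nothing guarantees that this works in every setup.

```latex
% Compile with lualatex or xelatex; the file must be UTF-8 encoded.
\documentclass{article}
% Uncommenting the next two lines makes ü active, which according to
% the comment above is expected to break the name "grüße":
% \usepackage{newunicodechar}
% \newunicodechar{ü}{hallo}
\NewDocumentEnvironment{grüße}{}{}{}
% Illustrative generic hook on the non-ASCII environment name.
\AddToHook{env/grüße/before}{[before]}
\begin{document}
\begin{grüße}XXX\end{grüße}
\end{document}
```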