sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

TeX-like phi, varphi and friends... #2134

Closed Omikhleia closed 3 weeks ago

Omikhleia commented 1 month ago

Formula: \phi\varphi

Expected: ϕφ

Observed:

Honestly?

diff --git a/packages/math/unicode-symbols.lua b/packages/math/unicode-symbols.lua
index 830a718c..5f140a8a 100644
--- a/packages/math/unicode-symbols.lua
+++ b/packages/math/unicode-symbols.lua
@@ -2583,7 +2583,8 @@ symbols.rho = "ρ"
 symbols.sigma = "σ"
 symbols.tau = "τ"
 symbols.upsilon = "υ"
-symbols.phi = "φ"
+symbols.phi = "ϕ"
+symbols.varphi = "φ"
 symbols.chi = "χ"
 symbols.psi = "ψ"
 symbols.omega = "ω"

Wait... Same kind of issue for epsilon... Our epsilon is ε, which is the wrong one: TeX's \epsilon is ϵ and \varepsilon is ε.

Our theta is the correct one θ... But we don't have the vartheta ϑ... Our rho is correct ρ... But we don't have the varrho ϱ... And there are yet other var-symbols...
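To make the pairs listed so far concrete, here is a minimal sketch of what the TeX-convention assignments could look like. The local `symbols` table is only a stand-in for the one in packages/math/unicode-symbols.lua (an assumption for illustration, not the actual file contents), and only the symbols named in this comment are covered.

```lua
-- Hypothetical stand-in for the `symbols` table in
-- packages/math/unicode-symbols.lua; not the real file contents.
local symbols = {}

-- TeX convention: the plain name and the var- name map to different glyphs.
symbols.epsilon = "ϵ"     -- U+03F5 GREEK LUNATE EPSILON SYMBOL
symbols.varepsilon = "ε"  -- U+03B5 GREEK SMALL LETTER EPSILON
symbols.theta = "θ"       -- U+03B8 GREEK SMALL LETTER THETA
symbols.vartheta = "ϑ"    -- U+03D1 GREEK THETA SYMBOL
symbols.rho = "ρ"         -- U+03C1 GREEK SMALL LETTER RHO
symbols.varrho = "ϱ"      -- U+03F1 GREEK RHO SYMBOL
symbols.phi = "ϕ"         -- U+03D5 GREEK PHI SYMBOL
symbols.varphi = "φ"      -- U+03C6 GREEK SMALL LETTER PHI

-- The formula from the report, \phi\varphi, would then yield two distinct glyphs.
print(symbols.phi .. symbols.varphi)
```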

Surely, it would be so simple to add/change (as shown above), so there must be some rationale behind the current implementation that eludes me?

OlivierNicole commented 1 month ago

Surely, it would be so simple to add/change (as shown above), so there must be some rationale behind the current implementation that eludes me?

Not really. I wrote this list of symbols by hand using the Greek layer of my keyboard distribution, so it shouldn’t be taken as an authority on any grounds.

For what it’s worth, Wikipedia says

Like other Greek letters, lowercase phi (encoded as the Unicode character U+03C6 φ GREEK SMALL LETTER PHI) is used as a mathematical or scientific symbol. Some uses require the old-fashioned 'closed' glyph, which is separately encoded as the Unicode character U+03D5 ϕ GREEK PHI SYMBOL.

And indeed in both the Greek script block and all bold/italic/bold-italic/etc. variants of φ in the Mathematical Alphanumeric Symbols block, φ is at its expected position in the Greek alphabet whereas ϕ is singled out at the end. So it looks like, in a way, \phi is in fact \varphi
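The two code points quoted above are separate characters, not two renderings of one character. The sketch below builds both from their code points; the helper name `utf8_two_bytes` is made up for this illustration and uses plain arithmetic only, so it does not rely on the `utf8` library that only exists in Lua 5.3 and later.

```lua
-- Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes.
-- Hypothetical helper for illustration; plain arithmetic, no utf8 library.
local function utf8_two_bytes(cp)
  assert(cp >= 0x80 and cp <= 0x7FF, "code point outside the two-byte range")
  local lead = 0xC0 + math.floor(cp / 0x40)  -- 110xxxxx: upper five bits
  local trail = 0x80 + cp % 0x40             -- 10xxxxxx: lower six bits
  return string.char(lead, trail)
end

local small_phi = utf8_two_bytes(0x03C6)   -- GREEK SMALL LETTER PHI
local phi_symbol = utf8_two_bytes(0x03D5)  -- GREEK PHI SYMBOL

-- Two different byte sequences, hence two different strings.
print(small_phi, phi_symbol, small_phi == phi_symbol)
```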

Typst’s choice seems to adhere to the Unicode view, which makes sense to me. I don’t see much value in adhering to the old TeX convention except, of course, to ease the transition from TeX.

Omikhleia commented 1 month ago

... to ease the transition from TeX.

I don't think the same way. Most lightweight markup language engines claim support for TeX-like math. I have my Djot and Markdown inputters in mind, but we could also target native MediaWiki for instance (Wikipedia is where I pick most of the math formulas I test with, 'cause I'm lazy). So it's a different point of view.

OlivierNicole commented 1 month ago

I don’t necessarily disagree, but as you probably know the TeX-like input differs from (La)TeX in various ways, because it’s only a façade covering MathML. We can make the surface use look the same, but when macros come into the picture, many corner cases will have different semantics, since TeX is a macro expansion engine and SILE is a modern language with an actual syntax and semantics.

Omikhleia commented 1 month ago

SILE is a modern language with an actual syntax and semantics.

I'm afraid I don't understand what this sentence means.

alerque commented 1 month ago

At first read my gut instinct is to go with whatever is most expected in the MathML-adjacent world for this. If that is LaTeX's interpretation of symbol names then so be it.

SILE is a modern language with an actual syntax and semantics.

I'm afraid I don't understand what this sentence means.

TeX as a language gets parsed and processed bit by bit. Pragmatically this allows packages or macros or whatever to fundamentally change how the input language is parsed for everything after them. This means you can write macros that do things like "eat" text outside of the content passed to the macro. This is part of what makes TeX so confusing and finicky. By contrast SILE's input language has a fixed grammar that is only parsed once, and there is no way for the content of a document to change how the document itself is parsed. You can change how it is processed, but not the grammar that it is parsed with. Of course we do allow the parser to be tampered with before loading a document, but not during.
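A toy illustration of the "parse once, then process" idea, which is emphatically not SILE's actual parser: the grammar (text runs and `\name{arg}` commands), the `parse` and `process` functions, and the `upper` and `redefine` commands are all invented for this sketch. A command may change how later commands are processed, but the tree was fully built before any command ran.

```lua
-- Toy fixed grammar: a document is text runs and \name{arg} commands.
-- All names here are hypothetical; this is not SILE's parser.
local function parse(input)
  local ast, pos = {}, 1
  while pos <= #input do
    local s, e, name, arg = input:find("\\(%a+)(%b{})", pos)
    if not s then
      ast[#ast + 1] = { type = "text", value = input:sub(pos) }
      break
    end
    if s > pos then
      ast[#ast + 1] = { type = "text", value = input:sub(pos, s - 1) }
    end
    -- Strip the outer braces captured by %b{}.
    ast[#ast + 1] = { type = "command", name = name, arg = arg:sub(2, -2) }
    pos = e + 1
  end
  return ast
end

local handlers = {}
handlers.upper = function(arg) return arg:upper() end
-- A command may swap a handler at processing time...
handlers.redefine = function(_)
  handlers.upper = function(arg) return "[" .. arg .. "]" end
  return ""
end

local function process(ast)
  local out = {}
  for _, node in ipairs(ast) do
    if node.type == "text" then
      out[#out + 1] = node.value
    else
      out[#out + 1] = handlers[node.name](node.arg)
    end
  end
  return table.concat(out)
end

-- ...but the tree below is complete before any handler runs.
local ast = parse("\\upper{a}\\redefine{}\\upper{b}")
local result = process(ast)
print(#ast, result)
```

Here the second `\upper` is processed differently from the first, yet both were parsed by the same grammar into the same kind of node.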

Omikhleia commented 1 month ago

But this is totally unrelated to the discussion at stake in this issue, isn't it? Or am I missing the point?

OlivierNicole commented 1 month ago

The link is that you want to have TeX-compatible syntax, and that it is very difficult to mimic what TeX would do in all cases, without reimplementing TeX’s Turing-complete parser, so perfect compatibility is not achievable.

Omikhleia commented 1 month ago

The link is that you want to have TeX-compatible syntax, and that it is very difficult to mimic what TeX would do in all cases, without reimplementing TeX’s Turing-complete parser, so perfect compatibility is not achievable.

Ah, I see where the misunderstanding lies.

I'm not advocating for full TeX-math syntax, which I obviously agree would be impractical to fully replicate. I do know that full TeX-math compatibility, given its complexity and Turing-complete nature, is indeed impossible.

Instead, what we should aim for, guided by the principle of least astonishment, is support for the subset of "common" TeX-math syntax used in the vast majority of papers, Wikipedia, and similar sources. Covering the "most commonly" used subset of TeX-math syntax is both reasonable and sufficient for most users' needs.

In other words, our "TeX-like math" support should be comparable to what is provided by Pandoc (no more, no less).

pandoc -t html --mathml -o ml.html
$$\lim_{n\to\infty} \frac{1}{x}$$ 

$$\epsilon\varepsilon\theta\vartheta\pi\varpi\rho\varrho\sigma\varsigma\phi\varphi$$

$$\sqrt[3]{x+y}$$

$$R^n \overset{A}{→} R^m$$

Pandoc generated MathML rendered in Firefox:

[Screenshot: the Pandoc-generated MathML rendered in Firefox]

If SILE's math package claims to offer a comparable TeX-like math syntax, we should aim for it to be at least as robust as Pandoc’s implementation. Otherwise, users will likely avoid it for valid reasons, and the effort put into implementing it will be wasted.
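One way to track that kind of coverage is to scan a test formula for command names and report those a symbol table does not know. The sketch below is an assumption-laden illustration: the `known` table is deliberately incomplete and made up for the example (it does not reflect the real contents of packages/math/unicode-symbols.lua), and `missing_commands` is a hypothetical helper.

```lua
-- Deliberately incomplete, made-up table; not the real SILE symbol list.
local known = {
  epsilon = "ε", theta = "θ", pi = "π",
  rho = "ρ", sigma = "σ", phi = "φ",
}

-- Collect every \name in the formula that has no entry in the table.
local function missing_commands(formula, symbols)
  local missing = {}
  for name in formula:gmatch("\\(%a+)") do
    if symbols[name] == nil then
      missing[#missing + 1] = name
    end
  end
  return missing
end

-- The Greek test formula from the Pandoc example above.
local formula = "\\epsilon\\varepsilon\\theta\\vartheta\\pi\\varpi"
  .. "\\rho\\varrho\\sigma\\varsigma\\phi\\varphi"

local missing = missing_commands(formula, known)
print(table.concat(missing, ", "))
```

With this made-up table, the report lists exactly the six var- names, which is the kind of gap this issue is about.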

As seen in the screenshot, the rationale for #2131 and #2133 is exactly the same as for those little var-greek symbols at stake here. (EDIT: and #2120, let's not forget it. If \sqrt[3]{...} is the TeX-like math way, then so be it too in our implementation).

Frankly, I had assumed this understanding was shared by all of us, so your remark took me a bit by surprise. Nevertheless, I’m happy to clarify and ensure we're all on the same page.

khaledhosny commented 1 month ago

Re: Unicode, φ is the regular Greek phi, which can have an open or closed design depending on the font, while ϕ is the phi symbol specifically encoded for use in technical text and must be a closed design. So it makes sense for a math input language to default to the symbol phi, not the regular text phi, since math is technical notation and not regular text, and this in no way goes against the Unicode definition. The same goes for the other var* Greek symbols.