ronkok / Temml

TeX-to-MathML conversion library in JavaScript
https://temml.org/
MIT License
162 stars, 12 forks

Using the W3 character list directly instead of the KaTeX symbols.js #32

Closed: fonsp closed this issue 1 year ago

fonsp commented 1 year ago

Hey, awesome work!!

For its supported symbols, Temml currently uses a symbols.js file forked from KaTeX. One suggestion is to use the W3 MathML character list directly, which includes LaTeX commands whenever available:

https://www.w3.org/Math/characters/unicode.xml

Perhaps you could write a script that processes this XML file and generates a .js file with the symbol list?

I suspect that KaTeX needs to maintain its own subset of the MathML character set, because every character added to KaTeX requires a bit of extra work (e.g. additions to the font files). Temml might not have this restriction, so generating the list could save future work adding support for more characters. It would also mean that Temml automatically supports the full overlap of LaTeX and MathML symbols.
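A minimal sketch of what such a script could look like, assuming each entry in unicode.xml is a `<character>` element with a decimal `dec` attribute, a `type` attribute, and an optional `<latex>` child. Those names, the sample records, and the `w3Symbols` export are assumptions made for illustration, not details confirmed in this thread. The sketch uses regular expressions on an inline sample and does not decode XML entities; a real script would more likely read the file from disk and use a proper XML parser.

```javascript
// Hypothetical sample in the assumed layout of unicode.xml.
// The records are illustrative, not copied from the real file.
const sampleXml = `
<charlist>
  <character id="U02264" dec="8804" mode="math" type="relation">
    <latex>\\leq</latex>
    <description>LESS-THAN OR EQUAL TO</description>
  </character>
  <character id="U003B1" dec="945" mode="math" type="normal">
    <latex>\\alpha</latex>
    <description>GREEK SMALL LETTER ALPHA</description>
  </character>
  <character id="U0E000" dec="57344" mode="unknown" type="other">
    <description>ENTRY WITHOUT A LATEX COMMAND</description>
  </character>
</charlist>`;

// Collect every <character> that has both a single code point and a <latex> child.
function extractLatexSymbols(xml) {
  const entries = [];
  const characterPattern = /<character\b([^>]*)>([\s\S]*?)<\/character>/g;
  for (const [, attributes, body] of xml.matchAll(characterPattern)) {
    // Entries whose dec value is not a single integer are skipped.
    const dec = /\bdec="(\d+)"/.exec(attributes);
    const latex = /<latex>([\s\S]*?)<\/latex>/.exec(body);
    if (!dec || !latex) continue;
    const type = /\btype="([^"]*)"/.exec(attributes);
    entries.push({
      char: String.fromCodePoint(Number(dec[1])),
      latex: latex[1].trim(),
      type: type ? type[1] : "unknown"
    });
  }
  return entries;
}

// Serialize the entries as the source text of a .js module.
function toJsModule(entries) {
  return "export const w3Symbols = " + JSON.stringify(entries, null, 2) + ";\n";
}

const symbols = extractLatexSymbols(sampleXml);
console.log(toJsModule(symbols));
```

Keeping the `type` value next to each command leaves room to review or filter entries by category before any of them are adopted.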

ronkok commented 1 year ago

Back in 2015, a PR was submitted to KaTeX with a similar approach. The review of that PR got bogged down in discussion of a few symbols whose meaning is ambiguous. For instance, should one such symbol be a relation or a text ORD?

Those discussions were never resolved and the PR was never approved.

Later, I took a different approach. Instead of trying to adopt all Unicode characters in one PR, I submitted a series of PRs (example 1, example 2). Each PR addressed a much smaller subset of Unicode characters and provided evidence of how each one was used. I avoided the ambiguous characters altogether.

That approach could actually be reviewed, and those PRs were eventually approved and merged into KaTeX. Those characters are all now in Temml as well. Temml also contains many characters that are not in KaTeX, such as upright lower case Greek.

I'm not going to try to adopt the entire W3 MathML character list in one commit, for the same reasons that the early KaTeX PR was not approved. It's a topic that needs character-by-character consideration, not a bulk approach.

Having said that, it will be interesting to browse that list. There may be much in there that will eventually end up in Temml.

fonsp commented 1 year ago

Super interesting! Thanks for the background information, keep up the good work!