ronkok / Temml

TeX-to-MathML conversion library in JavaScript
https://temml.org/
MIT License
162 stars, 12 forks

Benchmark TEMML against wikitexvc #70

Open physikerwelt opened 1 month ago

physikerwelt commented 1 month ago

It would be interesting to benchmark Temml against WikiTexVC.

I developed an evaluation framework, https://github.com/MaRDI4NFDI/mathpipe, almost ten years ago, and I'm a bit afraid of the effort it would take to update it.

Maybe we could cooperate on extending https://temml.org/docs/en/comparison instead?

ronkok commented 1 month ago

It sounds interesting, but there are limits on how much effort I will be willing to contribute. I would, for instance, be willing to write up something that would automate what is currently shown in https://temml.org/docs/en/comparison. I might even be willing to extend that automation with examples from https://temml.org/tests/wiki-tests and https://temml.org/tests/mhchem-tests.

That's about as far as I am willing to go. Comparisons to LaTeXML or WikiTexVC would be useful, but someone else would have to do them.

physikerwelt commented 1 month ago

@ronkok one of the low-hanging tasks would be to name the test cases and make the results machine-readable.

For us

https://gitlab.wikimedia.org/repos/mediawiki/services/mathoid/-/blob/main/test/files/mathjax-texvc/mathjax-texvc.json?ref_type=heads

and

https://github.com/wikimedia/mediawiki-extensions-Math/blob/master/tests/phpunit/unit/WikiTexVC/ParserTest-Ref.json

as well as hash-based identifiers, were a starting point. We have not yet settled on a particular naming convention for test cases, but one would be really helpful for keeping an overview of how the different solutions compare. Do you have a naming convention for the test cases you linked above?
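
To illustrate what a hash-based identifier could look like, the sketch below derives a short, stable ID from the TeX input of a test case. The function name `case_id`, the whitespace normalization, and the 12-character truncation are assumptions made for illustration; they are not a convention used by Temml, Mathoid, or WikiTexVC.

```python
import hashlib


def case_id(tex: str, length: int = 12) -> str:
    """Return a short hex ID derived from the TeX source (hypothetical scheme)."""
    # Assumption: collapse runs of whitespace so that trivially reformatted
    # inputs map to the same ID.
    normalized = " ".join(tex.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    print(case_id(r"\frac{a}{b}"))
    print(case_id(r"  \frac{a}{b}  "))  # same ID as the line above
```

Whitespace is occasionally significant in TeX (for example a trailing control space), so a real scheme might prefer to hash the raw string. A hash also says nothing about what a sample tests, which is why a human-readable name may still be wanted alongside it.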

ronkok commented 1 month ago

> Do you have a naming convention for the test cases you linked above?

No, I wrote the Temml comparisons page to provide people with a way to visually compare various libraries. That page was not meant to serve as a formal set of unit tests. Temml's unit tests are elsewhere.

If we want to do library comparisons more formally, then, yes, I agree with you that a good first step would be to name the tests and perhaps rewrite them in a manner similar to unit tests.

I will caution that the criteria for acceptance will get ambiguous. Temml tries to provide good rendering for all three of the major browser engines and all math fonts. To do that, Temml often applies a class that a CSS rule can select and apply the style corrections needed to overcome one browser's shortcomings or one font's shortcomings. Other libraries will likely have no such class name, and I don't know if that should be considered a failure.

Also, the change in MathML core from attributes to CSS styles has made the MathML more verbose. In particular, Temml's matrices and arrays have a good deal of CSS applied in an effort to match LaTeX spacing and margins. Other libraries will likely output a similar set of elements but will style them differently. Does that constitute a failure? Do we need a sliding scale of criteria?
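
One way to picture a sliding scale is a comparison that reports three levels instead of pass or fail: identical output, the same element structure once `class` and `style` attributes are ignored, or different structure. The sketch below is a minimal illustration under stated assumptions: the level names, the choice of `class` and `style` as the only presentational attributes, and the class name `example-class` are all hypothetical, and the input must be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the only attributes treated as presentational.
PRESENTATIONAL = {"class", "style"}


def skeleton(elem: ET.Element, keep_styling: bool) -> tuple:
    """Reduce an element to nested tuples of (tag, attributes, text, children)."""
    tag = elem.tag.split("}")[-1]  # drop an XML namespace, if present
    attrs = tuple(sorted(
        (key, value) for key, value in elem.attrib.items()
        if keep_styling or key not in PRESENTATIONAL
    ))
    text = (elem.text or "").strip()
    # Tail text between elements is ignored in this sketch.
    children = tuple(skeleton(child, keep_styling) for child in elem)
    return (tag, attrs, text, children)


def compare(mathml_a: str, mathml_b: str) -> str:
    """Classify two MathML strings on a three-level scale."""
    tree_a, tree_b = ET.fromstring(mathml_a), ET.fromstring(mathml_b)
    if skeleton(tree_a, True) == skeleton(tree_b, True):
        return "identical"
    if skeleton(tree_a, False) == skeleton(tree_b, False):
        return "same-structure"
    return "different"


if __name__ == "__main__":
    styled = '<math><mrow class="example-class"><mi>x</mi></mrow></math>'
    plain = "<math><mrow><mi>x</mi></mrow></math>"
    print(compare(styled, plain))
```

A scale like this does not settle whether a missing class name should count as a failure; it only separates styling differences from structural ones so that they can be discussed separately.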

physikerwelt commented 1 month ago

> I will caution that the criteria for acceptance will get ambiguous. Temml tries to provide good rendering for all three of the major browser engines and all math fonts. To do that, Temml often applies a class that a CSS rule can select and apply the style corrections needed to overcome one browser's shortcomings or one font's shortcomings. Other libraries will likely have no such class name, and I don't know if that should be considered a failure.

I see the main value not in passing or failing tests, but in understanding what possibilities exist for converting TeX to MathML. For us as developers, I think it can be very helpful when looking at bugs to see how others have implemented the same thing.

ronkok commented 1 month ago

I'm still trying to understand just what you are looking for. If no acceptance criteria are contemplated, then it seems that you are requesting the MathML of the comparison page to be labeled and written into JSON instead of being written into HTML.

That is certainly doable. Am I beginning to understand you?
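
A labeled JSON record of this kind could look something like the sketch below. The field names (`name`, `tex`, `mathml`), the sample name, and the library keys are hypothetical, and the MathML strings are placeholders rather than real output from any library.

```python
import json


def make_record(name: str, tex: str, outputs: dict[str, str]) -> dict:
    """Bundle one named sample with the MathML string each library produced."""
    # Sorting the library keys keeps the serialized file stable between runs.
    return {"name": name, "tex": tex, "mathml": dict(sorted(outputs.items()))}


def dump_records(records: list[dict]) -> str:
    """Serialize the records as human-readable JSON."""
    # ensure_ascii=False keeps math characters readable instead of \uXXXX escapes.
    return json.dumps(records, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    sample = make_record(
        "fraction-simple",  # hypothetical sample name
        r"\frac{a}{b}",
        {
            "library-b": "<math><!-- placeholder output --></math>",
            "library-a": "<math><!-- placeholder output --></math>",
        },
    )
    print(dump_records([sample]))
```

Keeping the TeX input and every library's output in one record means a later comparison tool only needs this file, not the HTML page.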

physikerwelt commented 1 month ago

Yes.

> That's about as far as I am willing to go. Comparisons to LaTeXML or WikiTexVC would be useful but someone else would have to do it.

To do that, JSON would be helpful.

In addition, naming the samples would be helpful for discussions and comparison. For example

https://github.com/wikimedia/mediawiki-extensions-Math/blob/8d522c40227dd01a10dfe3d0dab8c882b130004d/tests/phpunit/unit/TexVC/ParserTest-Ref.json#L99

is the same example as test 9 at https://temml.org/tests/wiki-tests. I don't have a solution, but I think it would be good to discuss this problem. However, I understand that this was unclear and confusing, so maybe it can be put off to another issue.
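
Overlaps like the one described above, where the same sample appears in two suites under different positions, could in principle be found mechanically by matching on the TeX input. The following is a rough sketch under assumptions: each suite is taken to be a mapping from a label to a TeX string, and the labels and TeX inputs shown are invented for illustration, not taken from either real test file.

```python
from collections import defaultdict


def normalize(tex: str) -> str:
    """Collapse whitespace so formatting differences do not hide a match."""
    return " ".join(tex.split())


def shared_cases(suite_a: dict[str, str], suite_b: dict[str, str]) -> list[tuple[str, str]]:
    """Return (label_a, label_b) pairs whose TeX inputs match after normalization."""
    by_tex = defaultdict(list)
    for label, tex in suite_a.items():
        by_tex[normalize(tex)].append(label)
    pairs = []
    for label_b, tex in suite_b.items():
        for label_a in by_tex.get(normalize(tex), []):
            pairs.append((label_a, label_b))
    return sorted(pairs)


if __name__ == "__main__":
    # Invented data: labels and inputs are placeholders.
    suite_a = {"a/1": r"\alpha + \beta", "a/2": r"\sqrt{x}"}
    suite_b = {"b/7": r"\sqrt{x}", "b/8": r"\sum_i x_i"}
    print(shared_cases(suite_a, suite_b))
```

Exact matching only catches samples whose input is literally the same; two suites that test the same feature with slightly different TeX would still need a shared naming convention.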