[GSoC] `\leqq` and `\leqslant` for less-than-or-equal-to, `\geqq` and `\geqslant` for greater-than-or-equal-to

wermos commented 1 year ago

Background

The current ANTLR LaTeX parser supports \leqq and \leqslant as the less-than-or-equal-to operator, and \geqq and \geqslant as the greater-than-or-equal-to operator.

While I have seen \leqslant and \geqslant used as the less-than-or-equal-to and greater-than-or-equal-to operators, respectively, I've never seen \leqq and \geqq used this way.

Lark Parser

Since I'm rewriting the LaTeX parser in Lark, I wanted to get the community's thoughts on supporting \leqslant, \leqq, \geqslant, and \geqq symbols for denoting these operators. For now, the Lark parser doesn't support any of these extra symbols for less-than-or-equal-to or greater-than-or-equal-to. However, if this is something that people want, it can easily be added.

sylee957 commented 1 year ago

I think that making such things user-configurable is the best option. But how difficult is it to make the tokens user-configurable for the Lark parser?

wermos commented 1 year ago

> I think that making such things user-configurable is the best option.

I'm not really sure what such a user-configurable system would look like. Would we pass options to parse_latex? As of now, we haven't implemented any user-configurable options for the Lark LaTeX parser, so we would need to decide which options to support and how to expose them.

> But how difficult is it to make the tokens user-configurable for the Lark parser?

That's a great question.

One of the main reasons we moved to Lark is that it allows users to extend the grammar dynamically at runtime. In this particular case, a user could write something like

```
%import .latex (LTE)
%extend LTE: "\leqslant" | "\leqq"
```

using Lark's %extend feature.
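To illustrate what `%extend` does to a terminal without relying on Lark internals, here is a minimal stdlib sketch. It assumes a terminal is just a list of LaTeX command spellings compiled into one regex; the `Terminal` class and its `extend` method are hypothetical and not part of Lark or SymPy.

```python
import re


class Terminal:
    """Hypothetical stand-in for a lexer terminal: a named list of LaTeX command spellings.

    Not Lark's API; it only mimics the effect of `%extend` on a terminal,
    i.e. appending new alternatives to an existing definition.
    """

    def __init__(self, name, *spellings):
        self.name = name
        self.spellings = list(spellings)

    def extend(self, *spellings):
        # Like `%extend LTE: ...`: keep the existing spellings and add new ones.
        self.spellings.extend(spellings)
        return self

    def pattern(self):
        # A LaTeX control word ends at the first non-letter, so the lookahead
        # stops \le from matching the start of \leq, \leqslant or \left.
        alternatives = "|".join(re.escape(s) for s in self.spellings)
        return re.compile("(?:" + alternatives + ")(?![A-Za-z])")


LTE = Terminal("LTE", r"\le", r"\leq")
LTE.extend(r"\leqslant", r"\leqq")
print(LTE.pattern().match(r"\leqslant x").group())  # prints \leqslant
```

Before the `extend` call, the same pattern would reject `\leqslant` entirely, which is the situation the hard-coded grammar is in today.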

This isn't supported yet, because the path to the Lark grammar is hard-coded.

That being said, I don't think there is any debate about the definitions of \leqslant and \geqslant, so the issue is mainly about the meanings of \leqq and \geqq.
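Following that split, one possible design is to accept \leqslant and \geqslant by default and keep \leqq and \geqq behind an explicit opt-in, for example as an option passed to parse_latex. The sketch below assumes the aliases are rewritten to \leq and \geq before the string reaches the parser; `normalize_relations` and its `allow_leqq_geqq` flag are hypothetical names, not existing SymPy API.

```python
import re

# Uncontroversial aliases, accepted by default.
DEFAULT_ALIASES = {r"\leqslant": r"\leq", r"\geqslant": r"\geq"}

# Debated aliases, only enabled when the caller opts in.
OPTIONAL_ALIASES = {r"\leqq": r"\leq", r"\geqq": r"\geq"}


def normalize_relations(latex, allow_leqq_geqq=False):
    r"""Rewrite alias commands to \leq / \geq before parsing.

    `allow_leqq_geqq` is a hypothetical option, not a real parse_latex argument.
    """
    aliases = dict(DEFAULT_ALIASES)
    if allow_leqq_geqq:
        aliases.update(OPTIONAL_ALIASES)
    # A LaTeX control word ends at the first non-letter, hence the lookahead.
    pattern = re.compile(
        "(" + "|".join(re.escape(cmd) for cmd in aliases) + r")(?![A-Za-z])"
    )
    # Use a function replacement so backslashes in the target are not reinterpreted.
    return pattern.sub(lambda m: aliases[m.group(1)], latex)
```

For example, `normalize_relations(r"x \leqq y")` leaves the input unchanged, while passing `allow_leqq_geqq=True` turns it into `x \leq y`. A preprocessing pass like this leaves the grammar untouched; handling the aliases in the grammar itself, as with %extend, would avoid a separate text rewrite.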

sylee957 commented 1 year ago

Although it's a bit off-topic, I usually find parser combinators more user-configurable than parser generators. Parser combinator libraries like Parsec allow users to import and extend parsers programmatically, even recursive grammar rules, and they use the host programming language itself as the module system. Parser generators, on the other hand, often have to reinvent their own language for grammar specifications, and they lack many things that come for free with a general-purpose programming language, such as a good module system or type system.

Pyparsing should be the Python equivalent of Parsec. I haven't checked whether it has good type support, since it is quite a seasoned project, but I like using ts-parsec in TypeScript, for example.
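To make the parser-combinator argument concrete, here is a toy sketch in plain Python. It is not Parsec, pyparsing or ts-parsec; `command` and `alt` are hypothetical helpers. The point is that a library's `lte` parser is an ordinary value, so a user can extend it with new spellings without touching any grammar file.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, position) and returns (matched, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[str, int]]]


def command(name: str) -> Parser:
    """Match the LaTeX control word `name` at `pos`, but not a longer word it prefixes."""
    def parse(text: str, pos: int) -> Optional[Tuple[str, int]]:
        end = pos + len(name)
        following = text[end:end + 1]
        if text.startswith(name, pos) and not (following.isascii() and following.isalpha()):
            return name, end
        return None
    return parse


def alt(*parsers: Parser) -> Parser:
    """Ordered choice: return the result of the first parser that succeeds."""
    def parse(text: str, pos: int) -> Optional[Tuple[str, int]]:
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


# What a library might ship: a parser for the standard spellings ...
lte = alt(command(r"\le"), command(r"\leq"))

# ... which a user extends with plain Python, without editing any grammar file.
my_lte = alt(command(r"\leqslant"), command(r"\leqq"), lte)

print(lte(r"\leqslant", 0))     # prints None
print(my_lte(r"\leqslant", 0))  # prints ('\\leqslant', 9)
```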

I just believe that users don't need that many 'features' and should be happy with a single-argument function parse_latex() that does 99% of what they need. If only 1% of use cases actually need configuration, we shouldn't add it, especially if we aren't familiar enough with type theory, combinators, dependency inversion, and so on to build a truly general system.

sympy/sympy issue #25419