Open bollwyvl opened 6 years ago
There should be a bunch of tokens for LaTeX things that don't actually affect the mathematical meaning of an expression, like \left, \right, \,, and so on, and the parser should just ignore them.
I think @scopatz may also have some ideas about this.
Can I work on this? Also, can you suggest how I should approach this problem?
Hi @Code-b0t!
It would be great to have some more folks looking at this, even while high-level architecture discussions are happening back on #13706... since hopefully this is a grammar-only fix, it should be good to go either way!
As mentioned at the top of this issue, there are a number of implementations out there in the forkiverse of latex2sympy.
Likely, what you'd have to do is, assuming you already have a working dev install:
- \left and \right
- LaTeX.g4 and/or _parse_latex_antlr.py... but it might be trivial
- python setup.py antlr
- pytest -k test_latex
If that all looks good, you'd be ready to PR!
Please feel free to put work in progress questions or notes here! I'll try to get back to you in a timely manner!
working dev install
Here's the official info, with links to help.
Completely unofficially, the quickest way to do this, assuming conda, would be:
git clone https://github.com/sympy/sympy
conda create -n sympy-dev -c conda-forge antlr antlr-python-runtime mpmath pytest # [1]
source activate sympy-dev
cd sympy
pip install -e .
python setup.py antlr # just to test install
pytest -k latex # might as well, will give good errors on failure
@bollwyvl I tried setting up the dev install, but when I used this command: conda create -n sympy-dev antlr -c conda-forge antlr-python-runtime mpmath pytest # [1]
an error popped up: conda: error: unrecognized arguments: antlr-python-runtime mpmath pytest
I looked at https://github.com/sympy/sympy/blob/master/.travis.yml#L27 but couldn't find a solution. Please help, as I don't have any clue about this error.
Thanks!
Drats! I just moved antlr in the above instructions after conda-forge.
Hello @bollwyvl !
So I tried 2 approaches: 1) In the first approach I changed the code in LaTeX.g4 to this:
L_PAREN: '\\left' | '(' ;
R_PAREN: '\\right' | ')' ;
L_BRACE: '{';
R_BRACE: '}';
L_BRACKET: '\\left' | '[' ;
R_BRACKET: '\\right' | ']' ;
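and I got the following result: [image: screenshot from 2018-02-05 17-26-10] https://user-images.githubusercontent.com/29678141/35793008-dac6285e-0a75-11e8-93b1-d3c08a136eab.png
2) In my second approach, I changed the code in LaTeX.g4 to: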
L_PAREN: '\\left(';
R_PAREN: '\\right)';
L_BRACE: '{';
R_BRACE: '}';
L_BRACKET: '\\left[';
R_BRACKET: '\\right]';
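and I got the following result: [image: screenshot from 2018-02-05 17-28-49] https://user-images.githubusercontent.com/29678141/35793095-33f3a08c-0a76-11e8-8462-91c54120cb46.png
I am not really sure what to make of these results, so kindly guide me further.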
Thanks!!
Ah. So as mentioned above, \left and \right are content-less and optional, and they always need the actual bracket or paren immediately following. At least one of the existing solutions defined two new tokens, and then used them everywhere as an alternate construction.
Take a look at where the bracket-y tokens are being used, and see if you can extend those rules to accept either the base brackets or base brackets and left/right...
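In plainer terms, the suggestion is that wherever a rule expects an opening bracket, it should accept either the bare bracket or the bracket preceded by \left (and likewise \right for closing brackets). The following is only a rough Python sketch of that idea using regular expressions instead of ANTLR rules; the names L_PAREN, R_PAREN, and match_paren_group are hypothetical stand-ins and are not part of SymPy.

```python
import re

# Hypothetical token patterns, written as regexes rather than ANTLR rules.
# The opening paren accepts an optional "\left" prefix and the closing
# paren accepts an optional "\right" prefix.
L_PAREN = re.compile(r"(?:\\left\s*)?\(")
R_PAREN = re.compile(r"(?:\\right\s*)?\)")


def match_paren_group(text):
    """Return the text between a leading '(' and the last ')', else None."""
    opening = L_PAREN.match(text)
    if opening is None:
        return None
    closing = None
    # Keep the last closing paren found; this sketch ignores nesting.
    for candidate in R_PAREN.finditer(text, opening.end()):
        closing = candidate
    if closing is None:
        return None
    return text[opening.end():closing.start()]
```

Both `(x + 1)` and `\left(x + 1\right)` yield the same inner text here, which is the behaviour the grammar change is after. Note that the mixed form `\left(x + 1)` is accepted too, since each side is matched independently.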
Yes, so as you said, \left and \right do not affect the result, since when I changed the code to this:
L_PAREN: '(';
R_PAREN: ')';
L_BRACE: '{';
R_BRACE: '}';
L_BRACKET: '[';
R_BRACKET: ']';
this had the same result as with \left and \right.
Also, I couldn't understand what you meant by this:
Take a look at where the bracket-y tokens are being used, and see if you can extend those rules to accept either the base brackets or base brackets and left/right
Can you explain this in layman's terms?
Thanks for your help and your fast responses! :+1:
I don't see why we need to be strict about \left and \right. It is true that they have specific rules in the LaTeX grammar, for instance, \left and \right must always match each other, and they can only be followed by certain characters. But the point of the LaTeX parser isn't to detect invalid LaTeX. I don't see any downside to just creating a no-op token that matches \left and \right and just ignoring it. Sure it will parse invalid things like \left ( \left ), but I don't think it's a big deal. Put another way, I don't think existence or nonexistence of \left and \right in an expression ever changes its mathematical meaning, just the way it is displayed.
On the other hand, if we are going to parse them as brackets, we should parse them correctly, according to the actual LaTeX rules.
Yes, I concur with your view that \left and \right do not change the mathematical meaning of the expression, but when displayed properly they help the user understand the expression more easily.
So I think one of the forks out there has this pattern (reconstructed from memory, though it does compile and pass tests):
diff --git a/sympy/parsing/latex/LaTeX.g4 b/sympy/parsing/latex/LaTeX.g4
index 5531df626..9fc52728d 100644
--- a/sympy/parsing/latex/LaTeX.g4
+++ b/sympy/parsing/latex/LaTeX.g4
@@ -25,14 +25,19 @@ SUB: '-';
MUL: '*';
DIV: '/';
-L_PAREN: '(';
-R_PAREN: ')';
-L_BRACE: '{';
-R_BRACE: '}';
-L_BRACKET: '[';
-R_BRACKET: ']';
+RIGHT: '\\right';
+LEFT: '\\left';
+
+L_PAREN: '(' | LEFT '(';
+R_PAREN: ')' | RIGHT ')';
+L_BRACE: '{' | LEFT '{';
+R_BRACE: '}' | RIGHT '}';
+L_BRACKET: '[' | LEFT '[';
+R_BRACKET: ']' | RIGHT ']';
BAR: '|';
+L_BAR: BAR | LEFT BAR;
+R_BAR: BAR | RIGHT BAR;
FUNC_LIM: '\\lim';
LIM_APPROACH_SYM: '\\to' | '\\rightarrow' | '\\Rightarrow' | '\\longrightarrow' | '\\Longrightarrow';
@@ -170,7 +175,7 @@ group:
| L_BRACKET expr R_BRACKET
| L_BRACE expr R_BRACE;
-abs_group: BAR expr BAR;
+abs_group: BAR expr BAR | L_BAR expr R_BAR;
atom: (LETTER | SYMBOL) subexpr? | NUMBER | DIFFERENTIAL | mathit;
This does let you write unbalanced things, e.g. \left (1 + 1), which is probably not what is wanted, even though it's simpler.
To do it that way right, you'd have to look for every line that includes an L_* and an R_*, and add an optional definition of that token that includes LEFT and RIGHT, similar to how the BAR example works. Anecdotally, about the only time that sympy.latex doesn't include \left and \right seems to be on exponents, e.g. x^{y}, but I would need to do some more exhaustive research!
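The search for lines that use a matching L_* and R_* pair could be roughed out with a few lines of Python. This is only a sketch: the helper name lines_with_bracket_pairs and the SAMPLE excerpt are made up for illustration (the excerpt imitates the group rule from the diff above), and a real run would read the contents of LaTeX.g4 instead.

```python
import re

# Matches a line holding L_<NAME> followed later by R_<NAME> with the same NAME.
PAIR = re.compile(r"\bL_([A-Z]+)\b.*\bR_\1\b")


def lines_with_bracket_pairs(grammar_text):
    """Return (line number, stripped line) for lines using an L_*/R_* pair."""
    hits = []
    for number, line in enumerate(grammar_text.splitlines(), start=1):
        if PAIR.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical excerpt in the style of the grammar, not the real file.
SAMPLE = "\n".join([
    "group:",
    "    L_PAREN expr R_PAREN",
    "    | L_BRACKET expr R_BRACKET",
    "    | L_BRACE expr R_BRACE;",
    "abs_group: BAR expr BAR;",
])

for number, line in lines_with_bracket_pairs(SAMPLE):
    print(number, line)
```

The check is line-based, so a rule that puts its opening and closing tokens on different lines would be missed.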
As for actually just dropping it on the floor:
diff --git a/sympy/parsing/latex/LaTeX.g4 b/sympy/parsing/latex/LaTeX.g4
index 5531df626..2b08bdd05 100644
--- a/sympy/parsing/latex/LaTeX.g4
+++ b/sympy/parsing/latex/LaTeX.g4
@@ -25,6 +25,9 @@ SUB: '-';
MUL: '*';
DIV: '/';
+RIGHT: '\\right' -> skip;
+LEFT: '\\left' -> skip;
+
L_PAREN: '(';
R_PAREN: ')';
L_BRACE: '{';
will drop it on the floor for real... skipped tokens can't appear in other rules, so it's definitely either/or. This also works and passes the current tests.
While looking into the matrix stuff on #14007, I did shake the cobwebs off a few more things that, while not presently generated by sympy.latex, might no doubt occur in the wild:
- \middle, \big, \Big, \bigg, and \Bigg, which can modify any of the brackets we handle.
- A . can immediately follow any of the above, e.g. \left., which denotes that the delimiter is not drawn... the example given is a function domain,
\left.\frac{x^3}{3}\right|_0^1
which we certainly don't handle yet!
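To see which delimiter each size modifier is attached to in a given string, a small Python helper can list the (modifier, delimiter) pairs. This is a sketch under assumptions: sized_delimiters is a hypothetical name, and the set of delimiters it recognizes (parens, brackets, escaped braces, bars, and the dot) is a guess at what matters here, not a complete list of LaTeX delimiters.

```python
import re

# A size modifier, optional whitespace, then the delimiter it applies to.
# The lookahead stops \left from matching inside \leftarrow and similar.
MODIFIER_WITH_DELIM = re.compile(
    r"\\(left|right|middle|Bigg|bigg|Big|big)(?![A-Za-z])\s*"
    r"(\\[{}|]|[()\[\]|.])"
)


def sized_delimiters(latex):
    """Return a list of (modifier, delimiter) pairs found in the string."""
    return MODIFIER_WITH_DELIM.findall(latex)


print(sized_delimiters(r"\left.\frac{x^3}{3}\right|_0^1"))
```

For the example above, the dot shows up as the delimiter paired with \left, which is the case the parser would need to treat as "no bracket at all".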
Anyhow, 🍕 for thought!
That's why I think it's simplest if all these size delimiters, as well as space delimiters like \, and things like \right ., are just completely ignored. I can't think of any instances where they would affect the mathematical content of an expression.
There are also spacing commands like \,, \:, \;. I think that it should be possible to treat all of these, as well as \left, \right, etc., as whitespace. They should have no effect on the translated code.
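As a rough illustration of treating spacing commands as whitespace, the following Python sketch replaces \, \: and \; with a plain space before any parsing happens. The function name spacing_to_whitespace is hypothetical, and only the three commands named above are covered; other spacing commands are deliberately left out.

```python
import re

# \, \: and \; only. The lookbehind leaves the row separator "\\" alone
# when it happens to be followed by one of these punctuation characters.
SPACING = re.compile(r"(?<!\\)\\[,:;]")


def spacing_to_whitespace(latex):
    """Replace each spacing command with a single space."""
    return SPACING.sub(" ", latex)


print(spacing_to_whitespace(r"\int x\,dx"))
```

Doing this in the grammar itself (as a skipped or whitespace token) would avoid the extra pass over the string; the regex version is just a quick way to try the idea.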
Great! Perhaps, then, instead of the approaches above, we'd want:
SKIP_BRACKET_MODIFIER:
( '\\big'
| '\\Big'
| '\\bigg'
| '\\Bigg'
| '\\left'
| '\\middle'
| '\\right'
) '.'?
-> skip;
I haven't investigated the whitespace stuff, but yeah, there are some things going on that we should handle, though that could be a separate PR, as I think we'd still want the bracket stuff grouped together. Thoughts?
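The effect of a skip rule like the one above can be tried out without regenerating the parser, using a regex that imitates it. This is a sketch, not the lexer's actual behaviour: skip_bracket_modifiers is a hypothetical helper, and the negative lookahead is an addition that stands in for ANTLR's longest-match rule, so that \rightarrow (used by the limit tokens) is not mangled.

```python
import re

# Imitates SKIP_BRACKET_MODIFIER: a size modifier with an optional dot.
SKIP_BRACKET_MODIFIER = re.compile(
    r"\\(?:Bigg|bigg|Big|big|left|middle|right)(?![A-Za-z])\.?"
)


def skip_bracket_modifiers(latex):
    """Remove size modifiers (and a directly following dot) from the string."""
    return SKIP_BRACKET_MODIFIER.sub("", latex)


print(skip_bracket_modifiers(r"\left.\frac{x^3}{3}\right|_0^1"))
```

One consequence worth noting: because the dot is swallowed together with its modifier, `\left.` disappears entirely while the `|` after `\right` stays behind as an unpaired bar.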
Can the . also follow \big and so on? I've only seen it after \left and \right.
Yes, it appears it can follow any of them: https://nbviewer.jupyter.org/gist/anonymous/e95ab5d42e1ebeeff5fa2aa039dd2cee
When a dot is included, it actually acts as the left (or right) bracket, which might complicate things.
As I understand it, a dot is used to balance the number of \left and \right. So, if you have fewer \left than \right brackets for some reason (multi-line equations, for example), you need to balance the actual number of \left and \right using dots. Not sure how it affects this discussion though.
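The balancing requirement can be checked with a simple counter, sketched below in Python. The helper name left_right_balanced is hypothetical; it only counts \left and \right and does not look at which delimiter follows, so a dot counts the same as any visible bracket.

```python
import re

# Finds \left and \right but not longer commands such as \leftarrow.
LEFT_RIGHT = re.compile(r"\\(left|right)(?![A-Za-z])")


def left_right_balanced(latex):
    r"""True when every \right closes an earlier \left and none stay open."""
    depth = 0
    for match in LEFT_RIGHT.finditer(latex):
        depth += 1 if match.group(1) == "left" else -1
        if depth < 0:
            # A \right appeared with no open \left before it.
            return False
    return depth == 0


print(left_right_balanced(r"\left.\frac{x^3}{3}\right|_0^1"))
```

If \left and \right are simply skipped by the lexer, none of this checking happens, which matches the earlier point that the parser does not need to detect invalid LaTeX.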
Can I work on this issue? I don't have much knowledge, but I am confident I'll be able to learn and tackle it with some guidance. Thanks!
I'd like to remove the easy to fix label because the suggestions are outdated now, due to deprecations of antlr.
Many of the sympy.printing.latex implementations use the full \left and \right notation, which are not parseable as of #13706.
This has been solved on several of the forks of latex2sympy:
We should pick one of the approaches!