sympy / sympy

A computer algebra system written in pure Python
https://sympy.org/

Handle LaTeX parsing \left and \right #14005

Open bollwyvl opened 6 years ago

bollwyvl commented 6 years ago

Many of the sympy.printing.latex implementations use the full \left and \right notation, which are not parseable as of #13706.

This has been solved on several of the forks of latex2sympy:

We should pick one of the approaches!

asmeurer commented 6 years ago

There should be a bunch of tokens for LaTeX things that don't actually affect the mathematical meaning of an expression, like \left, \right, \,, and so on, and the parser should just ignore them.
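To make the idea concrete, here is a minimal sketch in plain Python rather than in the grammar. It assumes that only the three commands named above are layout-only; `strip_layout_commands` is a hypothetical helper, not part of SymPy. The thin space is replaced by an ordinary space instead of being deleted, so that neighbouring tokens such as `2\,3` do not fuse.

```python
import re

# Assumption: \left, \right and \, are the only layout-only commands handled.
# The lookahead (?![A-Za-z]) stops \left from matching inside \leftarrow
# and \right from matching inside \rightarrow.
_LAYOUT_ONLY = re.compile(r"\\(?:left|right)(?![A-Za-z])|\\,")


def strip_layout_commands(latex):
    """Remove \\left and \\right, and turn the thin space \\, into a space."""
    return _LAYOUT_ONLY.sub(
        lambda m: " " if m.group(0) == r"\," else "", latex
    )


print(strip_layout_commands(r"\left(x + 1\right)\,y"))
```

A pre-processing pass like this is only an illustration of "ignore them"; doing the same thing inside the lexer keeps everything in one place.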

asmeurer commented 6 years ago

I think @scopatz may also have some ideas about this.

Code-b0t commented 6 years ago

Can I work on this? Also, can you suggest how I should approach this problem?

bollwyvl commented 6 years ago

Hi @Code-b0t!

It would be great to have some more folks looking at this, even while high-level architecture discussions are happening back on #13706... since hopefully this is a grammar-only fix, it should be good to go either way!

Like up at the head of the document, there are a number of implementations out there in the forkiverse of latex2sympy.

Likely, what you'd have to do is, assuming you already have a working dev install:

If that all looks good, you'd be ready to PR!

Please feel free to put work in progress questions or notes here! I'll try to get back to you in a timely manner!

bollwyvl commented 6 years ago

working dev install

Here's the official info, with links to help.

Completely unofficially, the quickest way to do this, assuming conda, would be:

git clone https://github.com/sympy/sympy
conda create -n sympy-dev -c conda-forge antlr antlr-python-runtime mpmath pytest # [1]
source activate sympy-dev
cd sympy
pip install -e .
python setup.py antlr # just to test install
pytest -k latex # might as well, will give good errors on failure

Code-b0t commented 6 years ago

@bollwyvl I tried setting up the dev environment, but when I used this command: conda create -n sympy-dev antlr -c conda-forge antlr-python-runtime mpmath pytest # [1]

an error popped up: conda: error: unrecognized arguments: antlr-python-runtime mpmath pytest

I looked at https://github.com/sympy/sympy/blob/master/.travis.yml#L27, but couldn't find a solution. Please help, as I don't have any clue about this error.

Thanks!

bollwyvl commented 6 years ago

Drat! I just moved antlr after conda-forge in the instructions above.

Code-b0t commented 6 years ago

Hello @bollwyvl !

So I tried two approaches:

1) In the first approach, I changed the code in LaTeX.g4 to this:

L_PAREN: '\\left' | '(';
R_PAREN: '\\right' | ')';
L_BRACE: '{';
R_BRACE: '}';
L_BRACKET: '\\left' | '[';
R_BRACKET: '\\right' | ']';

and I got the following result: [image: screenshot from 2018-02-05 17-26-10]

2) In my second approach, I changed the code in LaTeX.g4 to:

L_PAREN: '\\left(';
R_PAREN: '\\right)';
L_BRACE: '{';
R_BRACE: '}';
L_BRACKET: '\\left[';
R_BRACKET: '\\right]';

and I got the following result: [image: screenshot from 2018-02-05 17-28-49]

I am not really sure what to make of these results, so kindly guide me further.

Thanks!!

bollwyvl commented 6 years ago

Ah. So as mentioned above, \left and \right are content-less and optional, and they always need the actual bracket or paren immediately following. At least one of the existing solutions defined two new tokens, and then used them everywhere as an alternate construction.

Take a look at where the bracket-y tokens are being used, and see if you can extend those rules to accept either the base brackets or base brackets and left/right...

Code-b0t commented 6 years ago

Yes, so as you said, \left and \right do not affect the result, since when I changed the code to this:

L_PAREN: '('; R_PAREN: ')'; L_BRACE: '{'; R_BRACE: '}'; L_BRACKET: '['; R_BRACKET: ']';

it gave the same result as with \left and \right.

Also, I couldn't understand what you meant by this:

Take a look at where the bracket-y tokens are being used, and see if you can extend those rules to accept either the base brackets or base brackets and left/right

Can you explain this in layman's terms?

Thanks for your help and your fast responses ! :+1:

asmeurer commented 6 years ago

I don't see why we need to be strict about \left and \right. It is true that they have specific rules in the LaTeX grammar, for instance, \left and \right must always match each other, and they can only be followed by certain characters. But the point of the LaTeX parser isn't to detect invalid LaTeX. I don't see any downside to just creating a no-op token that matches \left and \right and just ignoring it. Sure it will parse invalid things like \left ( \left ), but I don't think it's a big deal. Put another way, I don't think existence or nonexistence of \left and \right in an expression ever changes its mathematical meaning, just the way it is displayed.

On the other hand, if we are going to parse them as brackets, we should parse them correctly, according to the actual LaTeX rules.
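For comparison, a rough sketch of what the stricter option would have to check, written in plain Python rather than in the grammar. Everything here is an assumption made for illustration: `check_left_right` is a hypothetical helper, the delimiter set is deliberately small (real LaTeX accepts more, such as \langle), and "." is accepted as an invisible delimiter.

```python
import re

# Assumption: a small subset of the delimiters LaTeX allows after \left/\right.
_DELIMS = r"(?:[()\[\]|./]|\\[{}|])"
# The lookahead keeps \right from matching inside \rightarrow (same for \left).
_SIZED = re.compile(r"\\(left|right)(?![A-Za-z])\s*(" + _DELIMS + r")?")


def check_left_right(latex):
    """Return a list of problems with \\left/\\right usage (empty if none)."""
    problems = []
    depth = 0  # number of \left commands still waiting for a \right
    for m in _SIZED.finditer(latex):
        command, delim = m.group(1), m.group(2)
        if delim is None:
            problems.append(
                f"\\{command} at {m.start()} is not followed by a delimiter"
            )
        if command == "left":
            depth += 1
        elif depth == 0:
            problems.append(f"\\right at {m.start()} has no matching \\left")
        else:
            depth -= 1
    if depth:
        problems.append(f"{depth} \\left without a matching \\right")
    return problems


print(check_left_right(r"\left ( \left )"))
```

Under these assumptions, `\left ( \left )` is reported only for the missing `\right` commands, because any delimiter may follow `\left`. A no-op token needs none of this bookkeeping, which is the trade-off being weighed here.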

Code-b0t commented 6 years ago

Yes, I concur with your view that \left and \right do not change the mathematical meaning of the expression, but when displayed properly they help the user understand the expression more easily.

bollwyvl commented 6 years ago

So I think one of the forks out there has this pattern (reconstructed from memory, though does compile and pass tests):

diff --git a/sympy/parsing/latex/LaTeX.g4 b/sympy/parsing/latex/LaTeX.g4
index 5531df626..9fc52728d 100644
--- a/sympy/parsing/latex/LaTeX.g4
+++ b/sympy/parsing/latex/LaTeX.g4
@@ -25,14 +25,19 @@ SUB: '-';
 MUL: '*';
 DIV: '/';

-L_PAREN: '(';
-R_PAREN: ')';
-L_BRACE: '{';
-R_BRACE: '}';
-L_BRACKET: '[';
-R_BRACKET: ']';
+RIGHT: '\\right';
+LEFT: '\\left';
+
+L_PAREN: '(' | LEFT '(';
+R_PAREN: ')' | RIGHT ')';
+L_BRACE: '{' | LEFT '{';
+R_BRACE: '}' | RIGHT '}';
+L_BRACKET: '[' | LEFT '[';
+R_BRACKET: ']' | RIGHT ']';

 BAR: '|';
+L_BAR: BAR | LEFT BAR;
+R_BAR: BAR | RIGHT BAR;

 FUNC_LIM:  '\\lim';
 LIM_APPROACH_SYM: '\\to' | '\\rightarrow' | '\\Rightarrow' | '\\longrightarrow' | '\\Longrightarrow';
@@ -170,7 +175,7 @@ group:
     | L_BRACKET expr R_BRACKET
     | L_BRACE expr R_BRACE;

-abs_group: BAR expr BAR;
+abs_group: BAR expr BAR | L_BAR expr R_BAR;

 atom: (LETTER | SYMBOL) subexpr? | NUMBER | DIFFERENTIAL | mathit;

This does let you write unbalanced things, i.e. \left (1 + 1) which is probably not what is wanted, even though it's simpler.

To do it that way right, you'd have to look for every line that includes an L_* and a R_*, and add an optional definition of that token that includes LEFT and RIGHT, similar to how the BAR example works. Anecdotally, about the only time that sympy.latex doesn't include \left and \right seems to be on exponents, i.e. x_{y}, but I would need to do some more exhaustive research!

As for actually just dropping it on the floor:

diff --git a/sympy/parsing/latex/LaTeX.g4 b/sympy/parsing/latex/LaTeX.g4
index 5531df626..2b08bdd05 100644
--- a/sympy/parsing/latex/LaTeX.g4
+++ b/sympy/parsing/latex/LaTeX.g4
@@ -25,6 +25,9 @@ SUB: '-';
 MUL: '*';
 DIV: '/';

+RIGHT: '\\right' -> skip;
+LEFT: '\\left' -> skip;
+
 L_PAREN: '(';
 R_PAREN: ')';
 L_BRACE: '{';

will drop it on the floor for real... skipped tokens can't appear in other rules, so it's definitely either/or. This also works and passes the current tests.
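The effect of `-> skip` can be imitated with a toy lexer in plain Python. This is only a sketch under stated assumptions: the token names mirror the grammar, but the token set is cut down to what the example needs, `tokenize` is a hypothetical helper, and none of this reflects how ANTLR is implemented internally.

```python
import re

# A cut-down token table; each pattern is tried in this order.
_TOKEN_SPEC = [
    ("LEFT", r"\\left(?![A-Za-z])"),
    ("RIGHT", r"\\right(?![A-Za-z])"),
    ("WS", r"\s+"),
    ("L_PAREN", r"\("),
    ("R_PAREN", r"\)"),
    ("NUMBER", r"\d+"),
    ("ADD", r"\+"),
]
# The analogue of `-> skip`: matched by the lexer, never handed to the parser.
_SKIPPED = {"LEFT", "RIGHT", "WS"}
_LEXER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in _TOKEN_SPEC)
)


def tokenize(latex):
    """Return (kind, text) pairs, leaving out every skipped kind."""
    tokens = []
    pos = 0
    while pos < len(latex):
        m = _LEXER.match(latex, pos)
        if m is None:
            raise ValueError(f"unexpected character {latex[pos]!r} at {pos}")
        if m.lastgroup not in _SKIPPED:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Both inputs produce the same token stream.
print(tokenize(r"\left(1 + 1\right)") == tokenize("(1 + 1)"))
```

Because the skipped kinds never reach the token list, no later rule could refer to them, which matches the either/or point above.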

bollwyvl commented 6 years ago

While looking into the matrix stuff on #14007, I did shake the cobwebs off a few more things that, while not presently generated by sympy.latex, will no doubt occur in the wild:

Anyhow, 🍕 for thought!

asmeurer commented 6 years ago

That's why I think it's simplest if all these size delimiters, as well as space delimiters like \, and things like \right . are just completely ignored. I can't think of any instances where they would affect the mathematical content of an expression.

jksuom commented 6 years ago

There are also spacing commands like \,, \:, \;. I think that it should be possible to treat all of these, as well as \left, \right, etc. as whitespace. They should have no effect on the translated code.

bollwyvl commented 6 years ago

Great! Perhaps, then, instead of the approaches above, we'd want:

SKIP_BRACKET_MODIFIER:
  ( '\\big'
  | '\\Big'
  | '\\bigg'
  | '\\Bigg'
  | '\\left'
  | '\\middle'
  | '\\right'
  ) '.'?
  -> skip;

I haven't investigated the whitespace stuff, but yeah, there's some things going on that we should handle, though that could be a separate PR, as I think we'd still want the bracket stuff grouped together. Thoughts?
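As a quick way to try the rule outside the grammar, here is a rough Python regex equivalent. It is a sketch, not the grammar itself: `drop_bracket_modifiers` is a hypothetical helper, and the lookahead is an addition that a regex needs so that \right does not match inside \rightarrow; an ANTLR lexer prefers the longest matching token, so the grammar version should not need it.

```python
import re

# Same alternatives as SKIP_BRACKET_MODIFIER, followed by an optional ".".
# The lookahead rejects a match that stops in the middle of a longer command
# name, so \right does not match inside \rightarrow, nor \big inside \bigg.
_BRACKET_MODIFIER = re.compile(
    r"\\(?:big|Big|bigg|Bigg|left|middle|right)(?![A-Za-z])\.?"
)


def drop_bracket_modifiers(latex):
    """Delete size modifiers, including a directly following "."."""
    return _BRACKET_MODIFIER.sub("", latex)


print(drop_bracket_modifiers(r"\bigg( x \bigg)"))
print(drop_bracket_modifiers(r"\left. x \right|"))
```

The second call shows one consequence of swallowing the dot: the remaining `|` no longer has a partner, so whatever rule consumes `|` still has to cope with it.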

asmeurer commented 6 years ago

Can the . also follow \big and so on? I've only seen it after \left and \right.

bollwyvl commented 6 years ago

Yes, it appears it can follow any of them: https://nbviewer.jupyter.org/gist/anonymous/e95ab5d42e1ebeeff5fa2aa039dd2cee

When a dot is included, it actually acts as the left (or right) bracket, which might complicate things.

oscargus commented 5 years ago

As I understand it, a dot is used to balance the number of \left and \right. So, if you have fewer \left than \right brackets for some reason (multi-line equations, for example), you need to balance the number of \left and \right using dots. I'm not sure how it affects this discussion, though.

newsid2024 commented 1 year ago

Can I work on this issue? I don't have much knowledge, but I am confident I'll be able to learn and tackle it with some guidance. Thanks!

sylee957 commented 9 months ago

I'd like to remove the "easy to fix" label because the suggestions are outdated now, due to the deprecation of antlr.