jiggzson / nerdamer

a symbolic math expression evaluator for javascript
http://www.nerdamer.com
MIT License
Add nerdamer.convertFromLaTeX #426

Open Happypig375 opened 5 years ago

Happypig375 commented 5 years ago

So that any of these editors can be used with Nerdamer.

jiggzson commented 5 years ago

Yeah, I was considering adding this as an additional add-on. I may decide to strip Guppy of everything but the parser and go that route. Thanks for the suggestion.

tecosaur commented 5 years ago

I've recently started writing a PR for a vscode extension, LaTeX-Workshop, for which I'd like to add the ability to go from LaTeX math statements → nerdamer evaluation → LaTeX math statement.

Suffice to say the ability to use LaTeX input would be rather handy; I was planning on writing it from scratch, so if there's anything I can do to speed this development along, or improve it, just let me know :grin:

jiggzson commented 5 years ago

@tecosaur, thank you for letting me know. It's always nice to know how the library is being used. I started writing a LaTeX importer utilizing the existing parser and I feel like I've had some fairly decent results but I always welcome any suggestions or contributions. I guess it would be helpful to know which functions or use cases to focus on.

Happypig375 commented 5 years ago

Ideally at least all results from nerdamer.convertToLaTeX should be consumable for nerdamer.convertFromLaTeX.

jiggzson commented 5 years ago

@Happypig375, sounds good. Here's an interesting one. lim(x, x, Infinity)+1 outputs the exact same TeX as lim(x+1,x,Infinity). What am I missing?

tecosaur commented 5 years ago

@jiggzson I just had a quick squiz at your commit from 3 days ago that added basic support, and correct me if I'm wrong, but are you supporting inbuilt nerdamer functions (e.g. \\sin to sin()) and then just writing if statements for latex to ascii substitutions here?

https://github.com/jiggzson/nerdamer/blob/b7bae8427ed1e770a72e5b3028369ab43eb5969c/nerdamer.core.js#L8478-L8489

(this is particularly with regards to the \\cdot line)

If so, I'm liable to rewrite this to take an object representing ascii equivalents to LaTeX codes and loop through all the substitutions instead.

Example Object

{
    '\\times': '*',
    '\\int': 'integral(#1)',
    // etc.
}

On the note of the second line of my example object - what's the current provision for statements such as: \\int x dx? At the moment I see TypeError: dx[0] is undefined
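A minimal sketch of the substitution-table idea, assuming plain string replacement rather than nerdamer's tokenizer. The table entries and the helper names `escapeRegExp` and `applySubstitutions` are hypothetical, and argument-taking entries such as the `#1` placeholder for `\int` are deliberately left out because they need real argument parsing.

```javascript
// Hypothetical substitution table: LaTeX command -> ASCII equivalent.
// These entries are illustrative only, not nerdamer's actual mapping.
const substitutions = {
    '\\times': '*',
    '\\cdot': '*',
    '\\div': '/',
    '\\pi': 'pi'
};

// Escape characters that are special inside a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Loop over the table and replace each command. The negative lookahead
// stops '\pi' from matching the start of a longer command such as '\pitchfork'.
function applySubstitutions(tex, table) {
    return Object.keys(table).reduce(
        (out, cmd) => out.replace(
            new RegExp(escapeRegExp(cmd) + '(?![a-zA-Z])', 'g'),
            () => table[cmd]
        ),
        tex
    );
}

console.log(applySubstitutions('2 \\times 3 \\cdot \\pi', substitutions));
// prints "2 * 3 * pi"
```

Keeping the mapping in one object means new commands can be added without touching the loop, which is the main appeal over a chain of if statements.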

jiggzson commented 5 years ago

@tecosaur, you are correct. Keep in mind that the current implementation is just a very rough draft put together over my lunch break. I was curious to see if it would work and so far it seems promising.

If so, I'm liable to rewrite this to take an object representing ascii equivalents to LaTeX codes and loop through all the substitutions instead.

Go for it. This instance was just a test but looping through objects would definitely be better. This works fine for "keywords" and the majority of functions but not for functions such as integral, limit hence the if statements.

Here's my logic for the current approach. Declare \ as a prefix operator. This way it comes back as [operator, variable]. Another pass can be performed as you mentioned to make substitutions and reorganize arguments of certain functions. The back slash can just be bypassed and the variable substituted. Most functions just become name(arguments) etc. You can use console.log(_.pretty_print(raw_tokens)) at the beginning of LaTeX.parse to see what your input looks like.

This {x}^{2} \cdot \mathrm{max}\left(a,b\right) - 3 outputs this ((x), ^, (2), \, cdot, \, mathrm, (max), \, left, (a, ,, b, \, right), -, 3). The problem becomes functions such as int and limit since the arguments can vary. int is used for both integrals and definite integrals so I don't know if a blind substitution will work.

I'm not married to any particular way of doing things, so if you find another solution more efficient, then let's go that route. I just took a swing at it to see where it goes.

Happypig375 commented 5 years ago

I think my Maths teacher told me that if limits contained addition or subtraction they must be parenthesized. Therefore, lim(x+1,x,Infinity) should be \lim_{x\to\infty}\left(x+1\right).

tecosaur commented 5 years ago

@jiggzson the more I think about it I think the approach should be split up into two-ish major components.

1

A function that reads a string and extracts general LaTeX-y statements.

Would take a form

\\command[optional_argument_1][optional_argument_2][...]{main_argument_1}{main_argument_2}{...}

And probably

\\command[optional_argument]_{lower_limit}^{upper_limit}

2

A method to convert identified LaTeX functions into ascii / nerdamer equivalents.
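Component 1 could be sketched roughly as follows. This is an assumption-laden illustration, not nerdamer code: `readGroup` and `readCommand` are hypothetical helpers that read a single `\command[...]{...}` statement from the start of a string, and the `_{...}^{...}` limit form is not handled.

```javascript
// Read a balanced group that starts at tex[start] === open.
// Returns the inner text and the index just past the closing delimiter,
// or null if there is no balanced group at that position.
function readGroup(tex, start, open, close) {
    if (tex[start] !== open) return null;
    let depth = 0;
    for (let i = start; i < tex.length; i++) {
        if (tex[i] === open) depth++;
        else if (tex[i] === close) depth--;
        if (depth === 0) {
            return { text: tex.slice(start + 1, i), end: i + 1 };
        }
    }
    return null;
}

// Parse one statement of the form \name[opt]...{arg}... at the start of tex.
function readCommand(tex) {
    const head = /^\\([a-zA-Z]+)/.exec(tex);
    if (!head) return null;
    const result = { name: head[1], optional: [], args: [], end: head[0].length };
    let group;
    // Optional arguments come first, in square brackets.
    while ((group = readGroup(tex, result.end, '[', ']'))) {
        result.optional.push(group.text);
        result.end = group.end;
    }
    // Main arguments follow, in braces (nested braces are kept intact).
    while ((group = readGroup(tex, result.end, '{', '}'))) {
        result.args.push(group.text);
        result.end = group.end;
    }
    return result;
}

console.log(readCommand('\\sqrt[3]{8}'));
// { name: 'sqrt', optional: [ '3' ], args: [ '8' ], end: 11 }
```

The structured result is what component 2 would consume: the converter only has to look at `name`, `optional` and `args`, and never at raw LaTeX text.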

jiggzson commented 5 years ago

@Happypig375, I'll have to look and see if the libraries you mentioned at the beginning follow that rule. That would be awesome.

@tecosaur, that's sort of what's happening right now, isn't it? The only difference is that in part 1 the \ acts as a bypass. After that, part 2 happens, which is nothing more than a loop which filters out and replaces commands. The whole thing then gets "stitched" back together. Unless I'm misunderstanding you.

tecosaur commented 5 years ago

@jiggzson I'll describe the example that made me think that point one isn't fully functional at the moment.

Regarding nerdamer.convertFromLatex( ... ).toTeX()

\\sqrt{4} is converted to 2

\\sqrt[3]{8} becomes something like sqrt3 \\cdot 8 when it should be 2

tecosaur commented 5 years ago

In a similar vein, here's another example showing why I think there's a need for a more robust implementation of this.

nerdamer('sum(2x,x,1,5)').toTeX() = "30"
nerdamer.convertFromLaTeX('\\sum_{x=1}^5 2x').toTeX() = "x=1"

If the implementation would get \\sum as a LaTeX statement and extract the limits and text afterwards and pass it to the converter, I imagine we could (once that's in place) fairly easily turn that into sum(2x,x,1,5).

I have an idea for the implementation of the second part which I'll get started on tonight. I'll let you know when I have a working prototype (should be soon).
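As a rough illustration of the \\sum case, here is a regex-based sketch. `SUM_PATTERN` and `convertSum` are hypothetical names; the sketch assumes a single-letter index variable, limits without nested braces, and a summand that is already valid nerdamer input.

```javascript
// Matches \sum_{var=lower}^{upper} body. The upper limit may be braced
// (^{10}) or a single bare character (^5).
const SUM_PATTERN = /^\\sum_\{\s*([a-zA-Z])\s*=\s*([^}]+)\}\^(?:\{([^}]+)\}|(\S))\s*(.+)$/;

// Turn a LaTeX sum into the sum(body,variable,lower,upper) call form.
// Returns null when the input does not look like a supported sum.
function convertSum(tex) {
    const m = SUM_PATTERN.exec(tex.trim());
    if (!m) return null;
    const [, variable, lower, bracedUpper, bareUpper, body] = m;
    const upper = bracedUpper !== undefined ? bracedUpper : bareUpper;
    return `sum(${body.trim()},${variable},${lower.trim()},${upper.trim()})`;
}

console.log(convertSum('\\sum_{x=1}^5 2x'));
// prints "sum(2x,x,1,5)"
```

A regex is enough for this narrow shape, but nested limits such as `^{n^{2}}` would need a balanced-brace reader instead.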

jiggzson commented 5 years ago

\sqrt{4} is converted to 2

That behavior is related to nerdamer. The square root of a perfect square is always evaluated.

\sqrt[3]{8} becomes something like sqrt3 \cdot 8 when it should be 2

I see what you mean.

I have an idea for the implementation of the second part which I'll get started on tonight. I'll let you know when I have a working prototype (should be soon).

Sounds exciting. Keep me posted.

Thanks!

tecosaur commented 5 years ago

\sqrt{4} is converted to 2

That behavior is related to nerdamer. The square root of a perfect square is always evaluated.

To me that's desirable behaviour :grin: I'd much rather 2 than root 4.

I'm also creating a series of tests/assertions to be used as a benchmark/scoring/progress system of sorts. So my attention is divided, but you'll get two things for the price of one :stuck_out_tongue:

tecosaur commented 5 years ago

Update - Testing

I've written up a few tests using the mocha framework (bdd) and here's the current 'report card'.

image

I think we have room for improvement.

I'll add a few more and create a PR with this tomorrow if you like.

jiggzson commented 5 years ago

@tecosaur, sounds good. Will your PR include functions and a fix for the nthroot issue as well? Also, a number of those items are formatting commands. Those don't really map to anything in nerdamer, correct?

tecosaur commented 5 years ago

@jiggzson Thought I'd put what I have so far up.

The reason why I've put 'formatting commands' such as \bigg up is because right now they mess up the output, which IMO - they shouldn't. See the example below.

2 \\bigg(1+1\\bigg)

should give "4" but instead gives

2 \\bigg(1+1\\bigg)

ATM that's just the unit tests (well, more of a draft for some unit tests), but code for even more tests, and making more of those test pass (or at least working toward that) should come along too in due course :)
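One way to keep sizing commands from disturbing the result is to strip them before parsing. The sketch below assumes a plain string pre-pass; `SIZING` and `stripSizing` are hypothetical names, the command list is incomplete, and a null delimiter such as `\left.` would leave a stray dot behind.

```javascript
// Sizing commands that only affect rendering, not meaning (partial list).
const SIZING = ['bigg', 'Bigg', 'big', 'Big', 'left', 'right'];

// Match a backslash plus one of the names, but only when the name is not
// the prefix of a longer command (so '\rightarrow' is left alone).
const SIZING_PATTERN = new RegExp('\\\\(?:' + SIZING.join('|') + ')(?![a-zA-Z])', 'g');

function stripSizing(tex) {
    return tex.replace(SIZING_PATTERN, '');
}

console.log(stripSizing('2 \\bigg(1+1\\bigg)'));
// prints "2 (1+1)"
```

After this pass, `2 (1+1)` is ordinary input, provided implicit multiplication is handled by the parser.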

tecosaur commented 5 years ago

@jiggzson I'm somewhat lost with regards to something and I'm hoping you can tell me what I'm missing.

I'm trying to resolve the first set of test cases I have defined (functions that are currently \\mathrm{func} that should be \\func), so I replaced the current return at the end of parser.toTeX(...) with the following:

            const InbuiltLaTeXFunctions = ['arccos', 'cos', 'csc', 'exp', 'ker', 'limsup', 'min', 'sinh', 'arcsin', 'cosh', 'deg', 'gcd', 'lg', 'ln', 'Pr', 'sup', 'arctan', 'cot', 'det', 'hom', 'lim', 'log', 'sec', 'tan', 'arg', 'coth', 'dim', 'inf', 'liminf', 'max', 'sin', 'tanh']
            const replaceInbuiltFunctions = (s) => s.replace(new RegExp('\\\\'+`mathrm{(${InbuiltLaTeXFunctions.join('|')})}`, 'gm'), '\\$1')

            return replaceInbuiltFunctions(TeX.join(' '));

However, this doesn't seem to have changed anything...

Update: I've discovered it's not parser.toTeX I wanted but LaTeX.value.

tecosaur commented 5 years ago

Other changes I'm making are successful. I'll create separate pull requests so that different sets of changes can be reviewed separately.

tecosaur commented 5 years ago

Previous State of Affairs

Current State of Affairs

This is with the three PRs above being accepted image

tecosaur commented 5 years ago

@jiggzson A question.

With LaTeX new operators and commands can be defined via methods such as

\DeclareMathOperator{\sech}{sech}

and

\newcommand{\dd}{\ensuremath{\mathrm{d}}}

In my vscode extension I plan on trying to identify such lines and add them to some sort of config. I would imagine some users would also find this quite useful, as they would be able to add their own commonly used substitutions.

I'm trying to locate the relevant section for this sort of config and thought it could be a good idea to ask you.
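A sketch of how such definition lines might be collected into a config object. `collectDefinitions` is a hypothetical helper, not part of nerdamer or the extension; it only handles `\DeclareMathOperator` and argument-free `\newcommand` definitions.

```javascript
// Collect user-defined operators and argument-free macros from LaTeX source.
// Returns a plain object keyed by command name (without the backslash).
function collectDefinitions(source) {
    const definitions = {};

    // \DeclareMathOperator{\name}{text} (the starred form is accepted too).
    const operator = /\\DeclareMathOperator\*?\{\\([a-zA-Z]+)\}\{([^}]*)\}/g;
    for (const m of source.matchAll(operator)) {
        definitions[m[1]] = { kind: 'operator', text: m[2] };
    }

    // \newcommand{\name}{body}; the body may contain nested braces,
    // so it is read by counting brace depth rather than by regex.
    const macro = /\\newcommand\{\\([a-zA-Z]+)\}\{/g;
    for (const m of source.matchAll(macro)) {
        const start = m.index + m[0].length;
        let depth = 1;
        let i = start;
        while (i < source.length && depth > 0) {
            if (source[i] === '{') depth++;
            else if (source[i] === '}') depth--;
            i++;
        }
        if (depth === 0) {
            definitions[m[1]] = { kind: 'macro', text: source.slice(start, i - 1) };
        }
    }
    return definitions;
}

const preamble = [
    '\\DeclareMathOperator{\\sech}{sech}',
    '\\newcommand{\\dd}{\\ensuremath{\\mathrm{d}}}'
].join('\n');

console.log(collectDefinitions(preamble));
```

The resulting object could then be merged into whatever substitution table the converter uses, so user-defined commands are treated like built-in ones.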

tecosaur commented 5 years ago

Also another issue I've encountered (can be seen in this image)

image

I don't think this is an issue with 'times', as 2 times 2 producing 4 doesn't seem too outlandish. However, other than LaTeX not doing this, there are far more commands such as big and left and pm where I doubt it would be desired behaviour.

I imagine the best thing to do would be to add a 'flag' of sorts and have it so that it only performs the replacement if it is set.

https://github.com/jiggzson/nerdamer/blob/b7bae8427ed1e770a72e5b3028369ab43eb5969c/nerdamer.core.js#L8467-L8505

jiggzson commented 5 years ago

@tecosaur, do you think it's related to this (7bd13d2) commit? With that commit I added support for word operators but I forgot to remove my test operator times.

tecosaur commented 5 years ago

Hmmm. I'm not sure what would be easiest to implement.

  1. Determine if slash was beforehand
  2. Add flags
  3. Concatenate slash with command

@jiggzson Since you've written this, I thought you might have some ideas regarding this.

Notes

I don't think it's related to a times specific commit, here's why:

image

image (screenshot from 2019-01-15 22-03-41)

jiggzson commented 5 years ago

@tecosaur, I'm in the same boat as you. I'm taking ideas as well. My only suggestion was to cut time by reusing the existing tokenizer. I figured we can do this since quite a bit of the LaTeX is just formatting and can be discarded. As I mentioned before the idea is then to declare \ as an operator and then feed the string to the Parser.tokenize, apply a filter pass to re-arrange the tokens, and then glue it back. Let Parser.parse worry about precedence etc.

Example

nerdamer.convertToLaTeX('integrate(x,x)');
// '\int {x}\, dx'

Parser.tokenize will then generate (\, int, (x), dx). The brackets denote an array. The slash can be discarded, int can be substituted for integrate, after int comes the function neatly in an array which can be fed back to LaTeX.parse to make sure that's formatted correctly, and dx can be stripped of the d.

If we look at \\frac{1}{2}, this produces (\, frac, (1), (2)). When encountering frac we know that the following two arrays are just divided by each other.

I don't know if it's more efficient to just write everything from scratch or to go the proposed route. It just seems like starting from scratch is a lot of work for a method that really just needs to be able to handle a few cases.
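The rearranging pass described above can be sketched on a simplified token shape. This assumes tokens are plain strings with nested arrays for bracketed groups, which is not nerdamer's real token object; `rebuild` is a hypothetical name, the input is assumed to be well formed, and the `\,` spacing token is omitted.

```javascript
// Rebuild an ASCII expression from a simplified token list.
// Strings are tokens; nested arrays are bracketed groups.
function rebuild(tokens) {
    const out = [];
    for (let i = 0; i < tokens.length; i++) {
        const token = tokens[i];
        if (Array.isArray(token)) {
            // A bare group is parenthesised and rebuilt recursively.
            out.push('(' + rebuild(token) + ')');
        } else if (token === '\\') {
            continue; // bypass the slash
        } else if (token === 'frac') {
            // The next two groups are numerator and denominator.
            const numerator = rebuild(tokens[i + 1]);
            const denominator = rebuild(tokens[i + 2]);
            out.push('((' + numerator + ')/(' + denominator + '))');
            i += 2;
        } else if (token === 'int') {
            // The next group is the integrand, then 'dx' names the variable.
            const integrand = rebuild(tokens[i + 1]);
            const differential = tokens[i + 2];
            out.push('integrate(' + integrand + ',' + differential.slice(1) + ')');
            i += 2;
        } else {
            out.push(token);
        }
    }
    return out.join('');
}

console.log(rebuild(['\\', 'frac', ['1'], ['2']])); // prints "((1)/(2))"
console.log(rebuild(['\\', 'int', ['x'], 'dx']));   // prints "integrate(x,x)"
```

The output strings are ordinary expressions, so precedence and evaluation can still be left to the existing parser.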

tecosaur commented 5 years ago

I have an idea. I'll get back to you in a few minutes.

tecosaur commented 5 years ago

Here's my idea.

            // add slash info
            for (let tokenIndex = 0; tokenIndex < raw_tokens.length - 1; tokenIndex++) {
                const token = raw_tokens[tokenIndex];
                if (raw_tokens[tokenIndex+1].type === 'VARIABLE_OR_LITERAL') {
                    if (token.type === 'OPERATOR' && token.value === '\\') {
                        raw_tokens[tokenIndex+1].command = true;
                    } else {
                        raw_tokens[tokenIndex+1].command = false;
                    }
                }
            }

The idea works, but it introduces issues due to it running before the filter function. I'll investigate putting it into the filter.

jiggzson commented 5 years ago

Will you then be handling the commands on another pass? Another risk you run is that sometimes the \ can denote a space or operator but you can test for that as well.

The way the filter pass currently works is by rebuilding the tokens array. If it sees big or \ or left etc., it just doesn't add it and continues. If it encounters a function that needs reformatting, it looks ahead and adds it in the correct order on the new stack. If you find your idea easier then play around with it and see what results you get. If it works, go for it.

tecosaur commented 5 years ago

I had an issue with nested commands, however that issue is now fixed, and I think I have a robust solution.

[in filterTokens]

if (typeof token.command === 'undefined' &&
                        i>0 && ['VARIABLE_OR_LITERAL', 'FUNCTION'].includes(token.type)) {
                            if (tokens[i-1].type === 'OPERATOR' && tokens[i-1].value === '\\') {
                                token.command = true;
                            } else {
                                token.command = false;
                            }
                        }

Let me know if you can see any issues.
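To show what the flag buys, here is a standalone sketch that tags tokens and then drops formatting commands only when they were preceded by a slash. The token types match the snippet above, but `tagCommands`, `filterFormatting` and the `FORMATTING` list are hypothetical, and the tokens are plain objects rather than nerdamer's own token class.

```javascript
// Formatting commands that should vanish, but only when written as \name.
const FORMATTING = ['big', 'bigg', 'left', 'right'];

const isSlash = (token) => token.type === 'OPERATOR' && token.value === '\\';

// Return copies of the tokens with a boolean `command` flag on every
// variable or function token, true when the previous token is a slash.
function tagCommands(tokens) {
    return tokens.map((token, i) => {
        if (!['VARIABLE_OR_LITERAL', 'FUNCTION'].includes(token.type)) return token;
        const previous = tokens[i - 1];
        return { ...token, command: previous !== undefined && isSlash(previous) };
    });
}

// Drop slashes and flagged formatting commands; keep everything else.
function filterFormatting(tokens) {
    return tagCommands(tokens).filter(
        (token) => !isSlash(token) && !(token.command && FORMATTING.includes(token.value))
    );
}

const sample = [
    { type: 'OPERATOR', value: '\\' },
    { type: 'VARIABLE_OR_LITERAL', value: 'left' },  // \left  -> dropped
    { type: 'VARIABLE_OR_LITERAL', value: 'left' }   // a variable named left -> kept
];

console.log(filterFormatting(sample));
// [ { type: 'VARIABLE_OR_LITERAL', value: 'left', command: false } ]
```

Because the flag is computed before any token is removed, a plain word such as `left` or `times` is never mistaken for its slash-prefixed counterpart.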

tecosaur commented 5 years ago

A few things.

Once again did something funny with the last PR, so you got a bonus change.

The Travis CI error with the second commit seems to be something funny with Travis? I did the full test suite on my machine and none of the tests (except for the in-progress latex import functionality tests I added) failed. Edit: Found some git-added text, removed it, everything fine now

Current State of Affairs

image image

If you were beginning to think it looked like we were almost there:

image

If you (or anyone else who's interested - @Happypig375 ?) have any ideas for good test cases, please send me a PR at https://github.com/tecosaur/nerdamer/tree/dev and I'll be sure to include them :)

jiggzson commented 5 years ago

@tecosaur, sorry for the late reply. I've been on the go since yesterday. That scorecard is looking better and better. I haven't had a chance to look at the PR's just yet but just a thought. Would you want to combine them in one PR? This way I can review and merge everything at once. As far as test cases, let me give that one a little thought. I'll go down the list of supported nerdamer functions and operations. As @Happypig375 mentioned, that should be the absolute minimum requirement for convertFromLaTeX.

tecosaur commented 5 years ago

If you could provide, or at least direct me to a list of nerdamer commands and accepted input format, that could be quite handy :smiley:

If you'd prefer, I can do one PR, but I'm also happy continuing as is. Let me know what you'd like. Edit: I'll change to one big PR.

Also

  1. Since the number of functions which take arguments through _{} and ^{} are few (sum, prod, int) I'm considering just writing special cases for all of those.

  2. Then with regards to operations based on matched pairs (e.g. \int and dX for integrals, \begin{bmatrix} and \end{bmatrix}, and \lfloor and \rfloor), do you have any good ideas for dealing with those?

  3. I'm not sure how to best go about adding a natural log function. I considered just doing something along the lines of ln = n => log(n, Settings.E), but I'm concerned that would mess up the symbolic math, particularly calculus involving natural logs. Would you have any ideas?

tecosaur commented 5 years ago

@jiggzson Re: 'The Big PR', let me know if I've done anything I'd want to undo/change to nerdamer.core.js due to changes I only just merged from you. I think there may be one or two things but I haven't checked properly yet.

jiggzson commented 5 years ago

@tecosaur, thanks. I'll take a look at it and let you know. You can get the list of currently supported functions by running.

Object.keys(nerdamer.getCore().PARSER.functions)

The ln instead of log issue seems to come up a lot and I'm strongly considering implementing the option to override this. Maybe something like nerdamer.set('USE_LN', true) or something along those lines.

tecosaur commented 5 years ago

Regarding ln vs log I have some thoughts.

  1. ln should be a function that represents log with base e
  2. log should also be a function and should represent log to either base e or 10
  3. log_b/log( ,b) should also be a function that represents log with base b

Specifically, say that log(value, base) gives the obvious result

ln(v) = log(v, 2.71828...)
log(v) = log(v, 2.71828...) or log(v, 10) (there are arguments for both; this could be a configuration option)
log_b(v) = log(v, b)

My concern is with the special symbolic behaviour of ln with integrals and derivatives.
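The numeric side of this proposal can be sketched with a small factory. `makeLog` and its `defaultBase` option are hypothetical and purely numeric; they say nothing about how nerdamer would treat ln symbolically, which is the open concern.

```javascript
// Build log and ln functions with a configurable default base for log(v).
function makeLog({ defaultBase = Math.E } = {}) {
    // log(value, base): logarithm of value in the given base (change of base).
    const log = (value, base = defaultBase) => Math.log(value) / Math.log(base);
    // ln(value): always base e, regardless of the configured default.
    const ln = (value) => log(value, Math.E);
    return { log, ln };
}

const natural = makeLog();                    // log(v) means base e
const common = makeLog({ defaultBase: 10 });  // log(v) means base 10

console.log(natural.log(Math.E)); // 1
console.log(common.log(1000));    // approximately 3 (floating point)
console.log(common.ln(Math.E));   // 1
```

Only the one-argument form of log changes with the option; ln and the explicit two-argument form behave the same under either setting.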

tecosaur commented 5 years ago

Separately, I'd be interested to know the purpose of this code in the LaTeX parser function:

            parser.setOperator({
                precedence: 8,
                operator: '\\,',
                action: 'slash_comma',
                prefix: true,
                postfix: false,
                leftAssoc: true,
                operation: function(e) {
                    return e; //bypass the slash
                }
            });

Also, regarding 9ecaabfd9d0a9c9c49c7bc34462891b6901d1f69. I think that may need reverting as now I see this:

image and image and image and image

While it is definitely possible that something I've done has not 'mixed well' with your commit, I know all those tests were passing before I pulled your changes.

jiggzson commented 5 years ago

@tecosaur, you can leave it as-is. I think I know what I might have broken. I'll try to get to it tomorrow, latest this weekend.

jiggzson commented 5 years ago

@tecosaur, I'm trying to find the last commit on your dev branch where all the tests were passing so I can isolate what I did that broke your tests. I've gone back as far as f1b706f but the tests appear to be failing at that point still. Can you recall the commit? If not it's no biggie. It would just make the problem easier to isolate.

Regarding the code snippet you referenced above: I'm trying to separate the expression parser from the LaTeX parser. We'll have a little more freedom to make big changes. That snippet just makes it so the tokenizer will recognize the slash as an operator.

tecosaur commented 5 years ago

Here's what I've done (for my fork, in console log form) hopefully this helps.

$ git checkout dev
$ npm test

> nerdamer@0.8.4 test <path>/nerdamer
> jasmine

184 specs, 34 failures, 2 pending specs

$ git checkout cb2ff745e1431d907cfb879c0027bf53ba02b5ee
$ npm test

> nerdamer@0.8.4 test <path>/nerdamer
> jasmine

180 specs, 16 failures, 2 pending specs

tecosaur commented 5 years ago

Also regarding my question from earlier, I'm particularly asking about the purpose of these lines

                operator: '\\,',
                action: 'slash_comma',

And why \\, is its own command, as all it's responsible for is a space 3/18 of a \\quad (1em) wide.

jiggzson commented 5 years ago

The purpose of these lines is to get the tokenizer to recognize \ as an operator. The slash_comma can be removed.

And why \, is its own command, as all it's responsible for is a space 3/18 of a \quad (1em) wide.

I'm not sure I understand what you mean

I'm going to merge #436 and let's try to get those LaTeX_twoWay.spec tests to pass again. I can't seem to find any commit at which all the tests pass. I see some minor modifications to the core but the majority of the changes seem to be with the spec file.

tecosaur commented 5 years ago

I'm not sure I understand what you mean

\, is a LaTeX horizontal spacing command, that's all. So I was confused to see it there. Sorry for the badly written sentence.

I can't seem to find any commit at which all the tests pass

That's because I wrote the test as a checklist-in-progress of features that should be supported. They don't all pass because we're not there yet. We were closer in 44ebd7927bd0e4bcda3c7adbd8528bf01912007f than 9ecaabfd9d0a9c9c49c7bc34462891b6901d1f69 though :stuck_out_tongue:

tecosaur commented 5 years ago

@jiggzson Any luck fixing it?

Also, a thought regarding brackets/parenthesis and vectors - why not (behind the scenes) treat all scalars as 1d vectors and vice versa (yes, really).

I know this might sound crazy but I think it should:

jiggzson commented 5 years ago

No I haven't had a chance to work on this.

why not (behind the scenes) treat all scalars as 1d vectors and vice versa (yes, really).

Please enlighten. I have no problem going the unorthodox route at all. I have many times in the past. Can you give me an example of what you mean?

tecosaur commented 5 years ago

Here's an example (SageMath) image

Quite a few functions should function identically, i.e. addition, multiplication etc. However, I can imagine this could cause a lot of 'if 1d' type statements to appear within function definitions that don't naturally take vectors.

An alternative that should behave similarly is implicitly converting all 1d vectors to scalars. It would be good to get other people's thoughts, but I can't think of any potential issues with that currently.

E.g. of method

  1. User enters 3[50-(14-9)!]
  2. (14-9) is evaluated to the 1D vector (5)
  3. (5) is converted to 5 since 1D
  4. 5! is evaluated to 120
  5. [50-120] is evaluated to the 1D vector (-70)
  6. (-70) is converted to -70 since 1D
  7. 3*-70 is evaluated to -210

The only extra thing I'd see that's needed over what's outlined above is getting implicit multiplication working properly.
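The steps above can be traced with a small sketch in which arrays stand in for vectors. `collapse` and `factorial` are hypothetical helpers, and the expression is evaluated by hand rather than parsed, so this only illustrates the 1D-to-scalar rule.

```javascript
// Factorial for non-negative integers.
const factorial = (n) => (n <= 1 ? 1 : n * factorial(n - 1));

// Convert a 1D vector with a single entry into a scalar; leave anything
// else (scalars, longer vectors) unchanged.
const collapse = (v) => (Array.isArray(v) && v.length === 1 ? collapse(v[0]) : v);

// 3[50-(14-9)!], following the listed steps:
const inner = collapse([14 - 9]);        // (5) is converted to 5
const fact = factorial(inner);           // 5! = 120
const bracket = collapse([50 - fact]);   // (-70) is converted to -70
const result = 3 * bracket;              // 3 * -70

console.log(result); // -210
```

Vectors with more than one entry pass through `collapse` untouched, so genuine vector input would not be affected by the rule.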

jiggzson commented 5 years ago

@tecosaur & @Happypig375, would this tex2max be worth looking into? With the exception of a few functions, nerdamer functions map almost directly to Maxima functions. I ran across this but haven't played around with it yet.

tecosaur commented 5 years ago

Definitely looks worth a look! I'll see if I can try it out tomorrow.

tecosaur commented 5 years ago

tomorrow = now

jiggzson commented 5 years ago

@tecosaur, any luck?

tecosaur commented 5 years ago

Unfortunately, I didn't get it to work too well. It helped with some things but broke others. Also, when it broke it didn't just return the input; it raised an exception, which makes it look like there wouldn't be much room for post-processing. :disappointed: