Open stefnotch opened 1 year ago
Examples
| makes sense
Sometimes $0xe$ is a hexadecimal number, other times it's $0 \cdot x \cdot e$
The heck is my poor editor supposed to do when a user types in something like this?
Picking $a(bc)$ is not ideal, since that might imply that
$p:=7$
$i:=\sqrt{-1}$
$pi:=3.14$
$f(x) = pi$
would be parsed as $p \cdot i$
=> Simply stopping as soon as I encounter something valid isn't a good approach.
When the user starts typing $a$ and then accepts the $ab$ autocomplete, then it's pretty darn obvious that they meant "ab". The tricky bit is that a user won't really use the autocomplete when typing $ax$ (meaning $a \cdot x$).
=> Autocomplete should generate something that the parser can definitely and confidently parse
Or we could apply the parsers in reverse order.
That way, if $ln$ is a predefined rule (logarithm) and the user then defines $l := 1$ and $n := 3$, then... $ln$ would be parsed as $l \cdot n$
We would, however, need to pick a syntax for saying "wait, I actually mean the logarithm ln"
=> Applying parsers in reverse order seems like a legit strategy
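A rough sketch of what "reverse order" could look like, assuming each rule is just a name plus a meaning. The `Rule` type and both function names are made up for illustration and are not part of the editor.

```typescript
// Hypothetical rule shape: a name the parser recognizes and what it means.
type Rule = { name: string; meaning: string };

// Try rules from the most recently defined to the oldest and return the
// first one that matches at `pos`. Rule names are assumed to be non-empty.
function matchNewestFirst(rules: Rule[], input: string, pos: number): Rule | undefined {
  for (let i = rules.length - 1; i >= 0; i--) {
    const rule = rules[i];
    if (rule !== undefined && input.startsWith(rule.name, pos)) return rule;
  }
  return undefined;
}

// Split the whole input by repeatedly taking the newest matching rule.
function tokenizeNewestFirst(rules: Rule[], input: string): string[] {
  const out: string[] = [];
  let pos = 0;
  while (pos < input.length) {
    const rule = matchNewestFirst(rules, input, pos);
    if (rule === undefined) throw new Error(`no rule matches at position ${pos}`);
    out.push(rule.meaning);
    pos += rule.name.length;
  }
  return out;
}

const builtins: Rule[] = [{ name: "ln", meaning: "logarithm" }];
const userRules: Rule[] = [
  { name: "l", meaning: "variable l" },
  { name: "n", meaning: "variable n" },
];

// User definitions come later in the list, so they shadow the built-in "ln".
const tokens = tokenizeNewestFirst([...builtins, ...userRules], "ln");
// tokens is ["variable l", "variable n"]
```

With only the built-ins, the same input gives `["logarithm"]`, so the result depends purely on what the user has defined so far.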
The whole "writing two variables next to each other" deal happens quite frequently in mathematics
And it can make sense to look ahead as much as possible
When we're at any point in the parsing stage, we want to figure out what the next token for our Pratt parser is. This also means that we have to parse symbol tokens, and operator tokens, and the two can't overlap.
=> Simply taking the "next token", separated by spaces or a multiplication sign, seems like a non-ideal strategy.
=> A greedy parser (which runs all parsers and takes the longest result) is a valid strategy for finding the next token
If we use a greedy parsing approach, and the user types $a := 3$ $bc:=1$ $ab:=7$ and then writes $abc$ we'll parse $ab \cdot c$, with $c$ being an unknown variable. Here, the greedy parser clearly fails.
=> The greedy parser won't always be able to figure out the user's intent.
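The failure case above can be reproduced with a minimal longest-match sketch. The helper names are hypothetical, and falling back to a single-character variable for unknown text is an assumption of this sketch.

```typescript
// Return the longest known name that starts at `pos`, if any.
// Names are assumed to be non-empty.
function longestMatch(names: string[], input: string, pos: number): string | undefined {
  let best: string | undefined = undefined;
  for (const name of names) {
    if (input.startsWith(name, pos) && (best === undefined || name.length > best.length)) {
      best = name;
    }
  }
  return best;
}

// Greedy split: always take the longest known name.
function greedySplit(names: string[], input: string): string[] {
  const out: string[] = [];
  let pos = 0;
  while (pos < input.length) {
    // Assumption: unknown text becomes a one-character variable.
    const token = longestMatch(names, input, pos) ?? input.charAt(pos);
    out.push(token);
    pos += token.length;
  }
  return out;
}

const defined = ["a", "bc", "ab"];
const greedyResult = greedySplit(defined, "abc");
// greedyResult is ["ab", "c"], even though ["a", "bc"] would only use defined names
```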
Multi letter names can happen naturally when
Assignment makes parsing harder
$x := 1$ $i := 2$ $\displaystyle \sum_{xi := 0} xi ^2$
Here's a compromise option: The parser is nice and straightforward, such that when someone writes $abc$, then that's definitely "abc" as one variable.
However, the editor is also smart™ and will suggest autocomplete results. Like suggesting $ab \cdot c$ when you type in $abc$. And you can accept those autocomplete results with tab or enter.
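One way the editor side could come up with such suggestions is to enumerate every split of the typed text into already-known names. This is only a sketch: `suggestSplits` is a hypothetical helper, and the set of known names (including $c$) is assumed for the example.

```typescript
// Enumerate every way to split `input` into known variable names.
// Worst case is exponential, which should be fine for short identifiers.
function suggestSplits(known: Set<string>, input: string): string[][] {
  if (input.length === 0) return [[]];
  const results: string[][] = [];
  for (let end = 1; end <= input.length; end++) {
    const head = input.slice(0, end);
    if (!known.has(head)) continue;
    for (const rest of suggestSplits(known, input.slice(end))) {
      results.push([head, ...rest]);
    }
  }
  return results;
}

// Assumed definitions for the example.
const knownNames = new Set(["a", "bc", "ab", "c"]);
const suggestions = suggestSplits(knownNames, "abc").map((parts) => parts.join(" \\cdot "));
// suggestions contains "a \cdot bc" and "ab \cdot c"
```

The parser itself stays simple and treats $abc$ as one variable. The splits only show up as autocomplete entries that the user can accept.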
And other things like parsing derivatives, or parsing hexadecimal numbers, can be done with the simple "try out parsers until one works" approach. Or we could whip out the "try all parsers and take the longest result" option.
Other approaches to the issue above might also be viable; this warrants further investigation.
For getting a single token (like a lim sup token when you have $\limsup_{x \to 0}$), we could whip out multiple approaches
something | something else (or) patterns, like https://stackoverflow.com/questions/14676833/combining-deterministic-finite-automata
Parsing chains of < and <= doesn't need special treatment, since it's fine if I parse $1 < x < 3$ as $1 < (x < 3)$. And then I treat $x < 3$ as a "domain restriction".
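To illustrate why comparison chains need no special casing: in a Pratt parser, a right-associative binary operator already produces the $1 < (x < 3)$ shape. The sketch below is a toy parser over pre-split tokens. The `Expr` type, the binding power table, and `parseChain` are all hypothetical.

```typescript
type Expr = string | { op: string; left: Expr; right: Expr };

// Binding powers as [left, right]. The right power is lower than the left
// power, which makes the operator right-associative.
const COMPARISON_POWER: Record<string, [number, number]> = {
  "<": [2, 1],
  "<=": [2, 1],
};

// Parse alternating operand and operator tokens, e.g. ["1", "<", "x", "<", "3"].
function parseChain(tokens: string[]): Expr {
  let pos = 0;
  function parseExpr(minPower: number): Expr {
    const first = tokens[pos++];
    if (first === undefined) throw new Error("unexpected end of input");
    let left: Expr = first;
    while (pos < tokens.length) {
      const op = tokens[pos];
      const power = op === undefined ? undefined : COMPARISON_POWER[op];
      if (op === undefined || power === undefined) {
        throw new Error(`expected an operator at token ${pos}`);
      }
      const [leftPower, rightPower] = power;
      if (leftPower < minPower) break;
      pos++;
      const right = parseExpr(rightPower);
      left = { op, left, right };
    }
    return left;
  }
  return parseExpr(0);
}

const chain = parseChain(["1", "<", "x", "<", "3"]);
// chain is { op: "<", left: "1", right: { op: "<", left: "x", right: "3" } }
```

The inner `{ op: "<", left: "x", right: "3" }` node is the part that could then be treated as a domain restriction.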
Changing syntax should not be as expensive later on. It should just be "change an imported library version and you're done"
Regarding plugins #48
Here's a bit of interesting info regarding Pratt parsing https://github.com/zesterer/chumsky/pull/515#issuecomment-1718173403
There are entirely too many areas of mathematics for a semantics-aware editor to be able to parse them all. So parsers should be extensible, at runtime. This means defining rules on the TypeScript side and passing them to Rust.
As in, someone else should be able to define a few custom additions to the default grammar and add them. And it should be possible to add multiple different custom grammars at the same time.
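A very rough sketch of what the TypeScript side could look like, assuming rules are plain data that can be serialized and handed over. None of these shapes are decided: `TokenRule`, `GrammarExtension`, the `pattern` strings, and `combineGrammars` are placeholders for illustration.

```typescript
// Hypothetical shapes; the real rule format is not specified here.
type TokenRule = { name: string; pattern: string };
type GrammarExtension = { id: string; version: string; rules: TokenRule[] };

// Combine the default grammar with any number of extensions.
// Assumption: when two rules share a name, the later one wins.
function combineGrammars(base: TokenRule[], extensions: GrammarExtension[]): TokenRule[] {
  const byName = new Map<string, TokenRule>();
  for (const rule of base) byName.set(rule.name, rule);
  for (const ext of extensions) {
    for (const rule of ext.rules) byName.set(rule.name, rule);
  }
  return Array.from(byName.values());
}

const combined = combineGrammars(
  [{ name: "ln", pattern: "ln" }],
  [
    { id: "hex-numbers", version: "1.0.0", rules: [{ name: "hex", pattern: "0x[0-9a-f]+" }] },
    { id: "limits", version: "0.2.0", rules: [{ name: "limsup", pattern: "\\limsup" }] },
  ],
);

// Plain data like this could be serialized and passed to the Rust side.
const payload = JSON.stringify(combined);
```

Keeping the rules as plain data means several custom grammars can be active at once, and each extension can carry its own version.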
And library imports should always have a semantic version. (Plus encourage writing upgrading scripts.) Otherwise if I pick a bad alias for a symbol, it'll be in there forever.
Good parser libraries:
(One big requirement is that I need to parse non-text objects. And the second big requirement is that I want to dynamically build parsers at runtime.)