Parsing and syntax-highlighted rendering

stefnotch commented 1 year ago

Parse the MathLayout into another tree (not directly into MathJson, because I can't always parse everything)

Should be context-free (as much as possible)
Probably use a Pratt parser
And maybe Marpa: https://lukasatkinson.de/2015/marpa-overview/
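A minimal Pratt parser sketch, in the spirit of the matklad article linked in the parsing tasks. Assumptions: tokens are single characters instead of MathLayout rows, and the binding powers for prefix `-`, postfix `!`, `+`, `*`, `/` and `^` are made up for illustration. It prints S-expressions so the resulting precedence is easy to check.

```rust
use std::fmt;

/// Parse tree for this sketch: atoms are single characters, operators hold their operands.
#[derive(Debug, PartialEq)]
enum Expr {
    Atom(char),
    Op(char, Vec<Expr>),
}

impl fmt::Display for Expr {
    // Prints S-expressions, e.g. (+ 1 (* 2 3)), so the precedence is visible.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Atom(c) => write!(f, "{}", c),
            Expr::Op(op, args) => {
                write!(f, "({}", op)?;
                for arg in args {
                    write!(f, " {}", arg)?;
                }
                write!(f, ")")
            }
        }
    }
}

// Hypothetical binding powers; a higher number binds more tightly.
fn prefix_bp(op: char) -> Option<u8> {
    match op {
        '-' => Some(5),
        _ => None,
    }
}

fn postfix_bp(op: char) -> Option<u8> {
    match op {
        '!' => Some(9),
        _ => None,
    }
}

fn infix_bp(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        '^' => Some((8, 7)), // right-associative: a^b^c = a^(b^c)
        _ => None,
    }
}

struct Parser {
    tokens: Vec<char>,
    pos: usize,
}

impl Parser {
    // Panics on malformed input such as "1+"; a real parser would report an error.
    fn expr_bp(&mut self, min_bp: u8) -> Expr {
        let first = self.tokens[self.pos];
        self.pos += 1;
        // A token in prefix position is either a prefix operator or an atom.
        let mut lhs = match prefix_bp(first) {
            Some(r_bp) => Expr::Op(first, vec![self.expr_bp(r_bp)]),
            None => Expr::Atom(first),
        };
        while let Some(&op) = self.tokens.get(self.pos) {
            if let Some(l_bp) = postfix_bp(op) {
                if l_bp < min_bp {
                    break;
                }
                self.pos += 1;
                lhs = Expr::Op(op, vec![lhs]);
                continue;
            }
            let Some((l_bp, r_bp)) = infix_bp(op) else { break };
            if l_bp < min_bp {
                break;
            }
            self.pos += 1;
            lhs = Expr::Op(op, vec![lhs, self.expr_bp(r_bp)]);
        }
        lhs
    }
}

fn parse(input: &str) -> Expr {
    let tokens = input.chars().filter(|c| !c.is_whitespace()).collect();
    Parser { tokens, pos: 0 }.expr_bp(0)
}
```

For example, `parse("1+2*3")` prints as `(+ 1 (* 2 3))` and `parse("a^b^c")` as `(^ a (^ b c))`. Prefix and postfix operators only need their own binding-power tables, which is what makes Pratt parsing a good fit for a user-extensible set of operators.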

And the closely related task:

Rendering with the parsed info

Note that selections and the caret are explicitly kept separate from this. I don't want to re-render the formula every time I move the caret.

Semantic annotations

Which definition a symbol uses: Euler's e vs. a variable e, ...
Info about the operator precedence when inserting certain symbols (e.g. typing / after lim should create a fraction with the entire lim in the numerator)
Info about what can validly be inserted, and placeholders (e.g. typing _ after lim places something below the entire lim instead of creating a normal msub!)
- Note: integrals don't automatically get an msubsup afterwards, because that would break the "insert integral, hit backspace" flow.
Hover info for a symbol (plus means...) or a range of symbols (sin means...)
Digit grouping (range of symbols that should be rendered in one tag and also insert thin spaces)
Implicit multiplication (insert an mo between existing elements when rendering; it cannot fall inside an already grouped range, i.e. these ranges are non-overlapping)
Brackets and operators with a higher precedence (range from bracket start to end, or range including the operands, and then create mrows)
Stretch for brackets (like for the brackets around a matrix)
Alignment, e.g. aligning equals signs below each other. Use mtable for that: https://www.hawkeslearning.com/Accessibility/guides/mathml_content.html
Less ambiguous power tower rendering
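A minimal sketch of the digit grouping item: the whole number is rendered as one `<mn>`, with thin spaces between groups. Assumptions not in the notes above: groups of three, `.` as the decimal separator, the Unicode thin space U+2009 rather than an `<mspace>`, and the integer part grouped from the right while the fractional part is grouped from the left. `group_digits` and `render_number` are hypothetical names.

```rust
/// Splits a run of digits into groups of three. Integer parts are grouped from
/// the right ("1234" -> "1", "234"), fractional parts from the left.
fn group_digits(digits: &str, from_right: bool) -> Vec<String> {
    let chars: Vec<char> = digits.chars().collect();
    // When grouping from the right, the leftmost group takes the remainder.
    let head = if from_right { chars.len() % 3 } else { 0 };
    let mut groups: Vec<String> = Vec::new();
    if head > 0 {
        groups.push(chars[..head].iter().collect());
    }
    for chunk in chars[head..].chunks(3) {
        groups.push(chunk.iter().collect());
    }
    groups
}

/// Renders a whole number as one <mn>, with thin spaces (U+2009) between groups.
fn render_number(number: &str) -> String {
    const THIN_SPACE: &str = "\u{2009}";
    let (int_part, frac_part) = match number.split_once('.') {
        Some((int_part, frac_part)) => (int_part, Some(frac_part)),
        None => (number, None),
    };
    let mut text = group_digits(int_part, true).join(THIN_SPACE);
    if let Some(frac) = frac_part {
        text.push('.');
        text.push_str(&group_digits(frac, false).join(THIN_SPACE));
    }
    format!("<mn>{}</mn>", text)
}
```

Because the spaces are only inserted at render time, the underlying MathLayout row still holds just the digits, so caret offsets are unaffected.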

Should have a clear mapping back to the MathLayout (specifically, to the rows and offsets; with caret navigation, one can reach every offset)

Non-semantic annotations

Bold, italics, underline for text
Colored backgrounds
Arrows and some text pointing at a formula
Annotations like https://www.reddit.com/r/LaTeX/comments/v7o21b/how_to_annotate_equations_in_beamer/ would be cool. The tricky part is that the usual way of rendering this, with a bunch of absolutely positioned elements, would not survive a reflow/line wrap. Also tricky: two arrows can start from the same element and end at two different elements (not tree-like). At least screen readers are easy, because I can just generate a separate representation for them.

Should follow the MathLayout tree. Should never have overlapping ranges (but smaller ranges inside larger ranges are fine). Multiple nested, equally large ranges are also fine, like "annotation border > annotation blue background > annotation padding > actual math formula".
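The "nested but never overlapping" rule can be checked with a sort and a stack. A minimal sketch, assuming annotation ranges are half-open offsets `start..end` within a single row; `AnnotationRange` and `is_properly_nested` are hypothetical names.

```rust
/// A hypothetical annotation range over offsets in one MathLayout row (half-open).
#[derive(Clone, Copy, Debug)]
struct AnnotationRange {
    start: usize,
    end: usize,
}

/// Returns true if every pair of ranges is either disjoint or nested
/// (equally large ranges count as nested).
fn is_properly_nested(ranges: &[AnnotationRange]) -> bool {
    let mut sorted = ranges.to_vec();
    // Outer ranges first: earlier start, and for equal starts the longer range.
    sorted.sort_by(|a, b| a.start.cmp(&b.start).then(b.end.cmp(&a.end)));
    let mut open: Vec<AnnotationRange> = Vec::new();
    for r in sorted {
        // Close every range that ends before this one starts.
        while open.last().map_or(false, |top| top.end <= r.start) {
            open.pop();
        }
        // The innermost still-open range must fully contain this one.
        if let Some(top) = open.last() {
            if r.end > top.end {
                return false;
            }
        }
        open.push(r);
    }
    true
}
```

Sorting costs O(n log n) and each range is pushed and popped at most once, so the check is cheap enough to run after every edit.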

Annotation example

<math display="block">
  <mover>
    <mrow>
      <mi>x</mi>
      <mo>+</mo>
      <mo>...</mo>
      <mo>+</mo>
      <mi>x</mi>
    </mrow>
    <mover>
      <mo>⏞</mo>
      <mrow>
        <mi>k</mi>
        <mspace width="0.1111111111111111em"></mspace>
        <mtext>times</mtext>
      </mrow>
    </mover>
  </mover>
</math>

Open question: writing 123.45 and then highlighting .45 (i.e. adding an annotation to it) should be possible. Since digit grouping renders the whole number in one tag, do I slap the annotation into the rendered MathML, and if so, how?

Special annotations

Readonly (entire range becomes read-only)
Anti-readonly (entire range becomes writeable again)
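One way the two special annotations could interact, assuming the ranges are properly nested and the innermost enclosing annotation wins (so an anti-readonly range inside a readonly range is writeable again, and no annotation means writeable). `Access`, `AccessAnnotation` and `access_at` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    ReadOnly,
    /// The "anti-readonly" case.
    Writable,
}

/// Hypothetical special annotation over the half-open offset range start..end.
struct AccessAnnotation {
    start: usize,
    end: usize,
    access: Access,
}

/// Returns the access for the symbol at `index`: the smallest enclosing
/// annotation wins. Assumes annotations are listed outermost first, so among
/// equally large ranges the last one is the innermost (hence the `rev`,
/// because `min_by_key` returns the first minimum it sees).
fn access_at(annotations: &[AccessAnnotation], index: usize) -> Access {
    annotations
        .iter()
        .rev()
        .filter(|a| a.start <= index && index < a.end)
        .min_by_key(|a| a.end - a.start)
        .map_or(Access::Writable, |a| a.access)
}
```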

stefnotch commented 1 year ago

Since passing a tree-like structure to a parser doesn't seem to work out, we'll opt for the "Copium" option and settle for "parsing every row separately".

Still, when parsing a row and encountering an element (like an under), we do get to choose which parser to call. In some contexts, it makes sense to use a parser that knows what a lim is; in other contexts, it doesn't.

stefnotch commented 1 year ago

Also interesting: https://www.brics.dk/metafront/metafront.pdf

stefnotch commented 1 year ago

Parsing tasks

[ ] Pass definitions to Rust parse context
[x] Postfix operators https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html#Bells-and-Whistles
[x] Prefix operators
[ ] Brackets
- [x] Normal brackets
- [ ] Abs
[x] Quotes - strings https://github.com/stefnotch/mathml-editor/issues/21#issuecomment-1422725139
[x] Numbers
[x] Fractions
[x] Sub, Sup
[x] Table
[x] Root, Under, Over, ...
[ ] Ranges should be filled
[ ] Inequalities could be flattened, or we could ignore it https://github.com/cortex-js/compute-engine/issues/25 https://github.com/stefnotch/aftermath-editor/issues/41#issuecomment-1494824857
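If inequalities do get flattened, a left-associative parser would first produce `(a < b) ≤ c`, which can then be rewritten into a single chain node. A minimal sketch; the `Expr` type with `Compare` and `Chain` variants is hypothetical, and it assumes explicitly bracketed comparisons are a different node type, so they are not flattened.

```rust
/// Hypothetical expression type for this sketch.
#[derive(Debug, PartialEq)]
enum Expr {
    Var(String),
    /// A binary comparison as a left-associative Pratt parser would produce it.
    Compare(char, Box<Expr>, Box<Expr>),
    /// A flattened chain: operands[0] ops[0] operands[1] ops[1] operands[2] ...
    Chain { operands: Vec<Expr>, ops: Vec<char> },
}

/// Flattens ((a < b) ≤ c) into Chain { operands: [a, b, c], ops: ['<', '≤'] }.
fn flatten(expr: Expr) -> Expr {
    match expr {
        Expr::Compare(op, lhs, rhs) => {
            // Recursively flatten the left side, then append this comparison.
            let (mut operands, mut ops) = match flatten(*lhs) {
                Expr::Chain { operands, ops } => (operands, ops),
                other => (vec![other], Vec::new()),
            };
            ops.push(op);
            operands.push(*rhs);
            Expr::Chain { operands, ops }
        }
        other => other,
    }
}
```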

stefnotch commented 1 year ago

Greedy parser for multi-character symbols

Would a greedy parsing approach work well enough for all special constructs? That is, when I encounter a symbol, I call all possible parsers for it and take the longest result.

That greedy parser is used to get the next token, and then a Pratt parser handles the standard operator precedence, associativity, etc.

For example, with d/dx:

The symbol parser says that the d is just a plain symbol.
Meanwhile, the d/dx parser parses the entire thing and wins, because it returns a longer parse result. After getting the entire d/dx, I treat it as a single operator token: a unary operator that binds more weakly than multiplication but more strongly than addition.
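A minimal sketch of the greedy "longest match wins" step, assuming the row is flattened to plain characters (in the real MathLayout, d/dx is a fraction element, not four symbols). `tokenize` is a hypothetical name and `definitions` stands in for the multi-symbol parsers; anything unmatched falls back to a single-symbol token, like the plain symbol parser.

```rust
/// Greedy ("longest match") tokenizer sketch. At every position all
/// definitions are tried and the longest match wins; if none matches,
/// the symbol becomes a plain single-symbol token.
fn tokenize(symbols: &[char], definitions: &[&str]) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < symbols.len() {
        let longest = definitions
            .iter()
            .map(|d| d.chars().collect::<Vec<char>>())
            // Empty definitions would never advance `pos`, so skip them.
            .filter(|d| !d.is_empty() && symbols[pos..].starts_with(d))
            .map(|d| d.len())
            .max()
            .unwrap_or(1);
        tokens.push(symbols[pos..pos + longest].iter().collect());
        pos += longest;
    }
    tokens
}
```

With the definitions `["sin", "sinh", "lim"]`, the input `sinhx` becomes `["sinh", "x"]` rather than `["sin", "h", "x"]`. Each resulting token would then be handed to the Pratt parser as a single atom or operator.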

Optimisation

For the greedy parsing, we introduce the concept of a "token hash". We then construct a trie like this:

use std::collections::HashMap;

// TODO: Immutable variant of this, aka a HAMT
/**
 * This trie takes you to an approximate position; however, multiple definitions might have overlapping token hashes.
 * So once we've found something, we need to check all MathDefinition.tokens again to make sure they're exact matches.
 * Nested tokens are flattened for the purposes of checking the hashes.
 */
struct Trie {
    values: Vec<TokenDefinition>, // Token definitions plus an actual parser to go with them
    children: HashMap<u64, Trie>, // if the next token has a matching hash, we go down the trie
}
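A runnable variant of this trie, under stated assumptions: tokens are plain `char`s, the token hash is the standard library's `DefaultHasher` output for a single token, and `TokenDefinition` is a hypothetical stand-in that holds only the exact tokens and a name (no parser). The lookup walks down by hash and then re-checks the exact tokens, as the doc comment describes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in: the exact token sequence plus a name.
#[derive(Debug, Clone)]
struct TokenDefinition {
    tokens: Vec<char>,
    name: &'static str,
}

fn token_hash(token: &char) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
struct Trie {
    values: Vec<TokenDefinition>,
    children: HashMap<u64, Trie>,
}

impl Trie {
    fn insert(&mut self, definition: TokenDefinition) {
        let mut node = self;
        for token in &definition.tokens {
            node = node.children.entry(token_hash(token)).or_default();
        }
        node.values.push(definition);
    }

    /// Returns the longest definition that exactly matches a prefix of `input`.
    fn longest_match(&self, input: &[char]) -> Option<&TokenDefinition> {
        let mut node = self;
        let mut best = None;
        for (depth, token) in input.iter().enumerate() {
            match node.children.get(&token_hash(token)) {
                Some(child) => node = child,
                None => break,
            }
            // Hashes only get us close, so re-check the exact tokens.
            let prefix = &input[..=depth];
            if let Some(def) = node.values.iter().find(|d| d.tokens == prefix) {
                best = Some(def);
            }
        }
        best
    }
}
```

The lookup only touches the trie nodes along the input, instead of trying every definition at every position.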

stefnotch commented 1 year ago

\ could act as an escape character, e.g.

\" for when you don't want a quote to start a string
\| for when you really don't want an abs bracket
...
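A minimal sketch of `\` as an escape character, assuming the row is flattened to characters and a trailing lone backslash is kept as a literal `\`. `Token` and `lex` are hypothetical names; the later parsing stages would treat `Literal` tokens as plain symbols and only give `Symbol` tokens their special meaning.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    /// An escaped symbol that must be taken literally.
    Literal(char),
    /// A symbol that keeps its special meaning (e.g. " starts a string, | an abs bracket).
    Symbol(char),
}

/// The symbol after a backslash is always emitted as a literal.
fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some(escaped) => tokens.push(Token::Literal(escaped)),
                // Nothing left to escape, so keep the backslash itself.
                None => tokens.push(Token::Literal('\\')),
            }
        } else {
            tokens.push(Token::Symbol(c));
        }
    }
    tokens
}
```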

stefnotch commented 1 year ago

Maybe relevant: https://levelup.gitconnected.com/writing-a-custom-typescript-ast-transformer-731e2b0b66e6

stefnotch commented 1 year ago

Recursive renderer: takes a slice of math elements and a single math semantic, and returns an array of MathML DOM nodes

Even better: a virtual DOM
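A minimal sketch of that recursive renderer, with strings standing in for MathML DOM nodes. Assumptions: the row's elements are plain `&str` symbols, and `MathSemantic` is a hypothetical three-variant enum (a leaf that renders a range as one tag, a wrapping group, and an unwrapped sequence). Text is not XML-escaped here.

```rust
/// Hypothetical semantic tree over offsets into one row.
enum MathSemantic {
    /// Renders the symbols in start..end as one token element, e.g. <mn>12</mn>.
    Leaf { tag: &'static str, start: usize, end: usize },
    /// Wraps the rendered children in an element such as <mrow>.
    Group { tag: &'static str, children: Vec<MathSemantic> },
    /// Renders the children side by side without a wrapper.
    Sequence(Vec<MathSemantic>),
}

/// Takes a slice of elements and a single semantic, returns a list of "nodes".
fn render(elements: &[&str], semantic: &MathSemantic) -> Vec<String> {
    match semantic {
        MathSemantic::Leaf { tag, start, end } => {
            vec![format!("<{0}>{1}</{0}>", tag, elements[*start..*end].concat())]
        }
        MathSemantic::Group { tag, children } => {
            let inner: String = children.iter().flat_map(|c| render(elements, c)).collect();
            vec![format!("<{0}>{1}</{0}>", tag, inner)]
        }
        MathSemantic::Sequence(children) => {
            children.iter().flat_map(|c| render(elements, c)).collect()
        }
    }
}
```

Returning a list rather than a single node lets a `Sequence` splice its children directly into the parent, which is also roughly the shape a virtual DOM diff would work with.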

stefnotch commented 1 year ago

Okay, this has been split into future issues

stefnotch / aftermath-editor