posit-dev / ark

Ark, an R kernel
MIT License

Detect boundaries of parse inputs #522

Closed lionel- closed 2 weeks ago

lionel- commented 2 weeks ago

Progress towards https://github.com/posit-dev/positron/issues/1326.

parse_boundaries() uses the R parser to detect the line boundaries of complete, incomplete, and invalid inputs.

The boundaries delimit lines of input rather than expressions. For instance, foo; bar is one input, whereas foo\nbar is two inputs.
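To make the input/expression distinction concrete, here is a minimal Rust sketch of how per-expression line ranges could be folded into per-input line ranges. It is not the code from this PR: the function name inputs_from_expressions and the (start_line, end_line) tuple representation are assumptions made for illustration, and the sketch ignores whitespace-only and comment-only lines.

```rust
/// Merge per-expression line ranges into per-input line ranges.
///
/// `exprs` holds inclusive, zero-based (start_line, end_line) pairs, sorted by
/// position, such as one might read off source references. This representation
/// is hypothetical and chosen only for the example.
fn inputs_from_expressions(exprs: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut inputs: Vec<(usize, usize)> = Vec::new();

    for &(start, end) in exprs {
        if let Some(last) = inputs.last_mut() {
            // The expression starts on a line already covered by the previous
            // input (e.g. `foo; bar`), so it belongs to that same input.
            if start <= last.1 {
                last.1 = last.1.max(end);
                continue;
            }
        }
        // Otherwise the expression opens a new input.
        inputs.push((start, end));
    }

    inputs
}
```

With this sketch, the two expressions of foo; bar (both on line 0) collapse into a single input, while the two expressions of foo\nbar (lines 0 and 1) stay separate.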

Invariants:

Approach:

I originally planned to use the parse data artifacts created by side effects to detect the boundaries of complete expressions when the input is incomplete or invalid (see the infrastructure implemented in #508). The goal was to avoid non-linear performance as the number of lines increases. I didn't end up doing that because the parse data turned out to be unusable for cases more complex than the ones I tried during my exploration.

Instead, I had to resort to parsing line by line. I start with the entire set of lines and back up one line at a time until the parse fully completes, keeping track of the error and incomplete sections of the input along the way. In the most common cases (valid inputs, or a short incomplete input at the end), this should require only one or a few iterations. The boundaries of complete expressions are retrieved from the source references of the parsed expressions (using infrastructure from #482) and then transformed into boundaries of complete inputs.
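The backing-up loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: toy_parse is a hypothetical stand-in for the R parser that only checks bracket balance, and the Status and Sections types and the sections function are invented for the example.

```rust
use std::ops::Range;

/// Outcome of one parse attempt (toy stand-in for the R parser's status).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Complete,
    Incomplete,
    Error,
}

/// Hypothetical toy parser: it only balances (), {} and [].
/// An unmatched closer is an error, an unclosed opener is incomplete.
fn toy_parse(lines: &[&str]) -> Status {
    let mut stack: Vec<char> = Vec::new();
    for ch in lines.iter().flat_map(|line| line.chars()) {
        match ch {
            '(' | '{' | '[' => stack.push(ch),
            ')' | '}' | ']' => {
                let open = match ch {
                    ')' => '(',
                    '}' => '{',
                    _ => '[',
                };
                if stack.pop() != Some(open) {
                    return Status::Error;
                }
            }
            _ => {}
        }
    }
    if stack.is_empty() {
        Status::Complete
    } else {
        Status::Incomplete
    }
}

/// Line ranges (end-exclusive) of the complete, incomplete and error sections.
#[derive(Debug, PartialEq)]
struct Sections {
    complete: Range<usize>,
    incomplete: Option<Range<usize>>,
    error: Option<Range<usize>>,
}

/// Start with all lines and drop the last line until the prefix parses.
fn sections(lines: &[&str]) -> Sections {
    let n = lines.len();
    let mut end = n;
    // End of the longest prefix that parsed as incomplete, if any.
    let mut incomplete_end: Option<usize> = None;

    loop {
        match toy_parse(&lines[..end]) {
            // The empty prefix is always complete, so `end` never underflows.
            Status::Complete => break,
            Status::Incomplete => {
                if incomplete_end.is_none() {
                    incomplete_end = Some(end);
                }
            }
            Status::Error => {}
        }
        end -= 1;
    }

    // Whatever follows the incomplete section (or the complete one) is in error.
    let error_start = incomplete_end.unwrap_or(end);
    Sections {
        complete: 0..end,
        incomplete: incomplete_end.map(|stop| end..stop),
        error: if error_start < n { Some(error_start..n) } else { None },
    }
}
```

For fully valid input the loop stops after a single parse, and a short incomplete tail costs one extra parse per trailing line, which matches the "one or a few iterations" expectation. In the worst case the sketch reparses a prefix once per line.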

Supporting infrastructure:

lionel- commented 2 weeks ago

TODO:

Error section should contain line number and error message

Whitespace/comment inputs should be tagged with a boolean