Add code folding event.

rayanamal commented 3 months ago

TL;DR: This new event, events.FOLDED, is required in order to manually adjust code folding via buffer.fold_level when the built-in code folding facilities aren't sufficient. The specific case that led to this PR is Markdown frontmatter, but there may be many more.

Hope you're well Mitchell!

I studied the 150-line lexer.fold function over the course of 3 days in order to add support for Markdown headings, which it was utterly incapable of handling. I explored numerous solutions. In the end, I was able to modify it minimally and elegantly, without resorting to a workaround like creating an entirely custom function and falling back to the original one outside of Markdown. I hope to upstream this implementation soon.

But folding the frontmatter separator with the built-in code folding utilities is another beast. I gave up after many attempts. Now I have a custom function to handle that, which manually manipulates line folding states. It runs after the lexer is done folding code; otherwise the lexer inevitably interferes and resets fold points and states to what it thinks they should be.

This fork was created a while ago so I merged all the commits you created since then. I hope this change can make it to v12.5.

rayanamal commented 3 months ago

Hey, please don't merge this yet. When I tested there seemed to be an issue: I didn't assign a value to _G.events.FOLDED like _G.events.FOLDED = 'folded'. So TA errors saying it's nil when it should be string. I'll add the assignment just above where I emit the event. Lmk if that's not the proper place to assign to the event.
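
A minimal sketch of the fix described above, assuming the event name only needs to be a string defined before it is emitted. The `events` table here is a tiny hypothetical stand-in so the snippet runs in plain Lua; it is not Textadept's implementation, and only the name events.FOLDED and the value 'folded' come from the comment above.

```lua
-- Minimal stand-in for an events module (hypothetical, for illustration only).
local events = {handlers = {}}

function events.connect(name, f)
  assert(type(name) == 'string', 'string expected, got ' .. type(name))
  events.handlers[name] = events.handlers[name] or {}
  table.insert(events.handlers[name], f)
end

function events.emit(name, ...)
  assert(type(name) == 'string', 'string expected, got ' .. type(name))
  for _, f in ipairs(events.handlers[name] or {}) do f(...) end
end

-- Without this assignment, events.FOLDED is nil and the calls below raise an error.
events.FOLDED = 'folded'

-- A handler that counts how often folding finished.
local calls = 0
events.connect(events.FOLDED, function() calls = calls + 1 end)
events.emit(events.FOLDED)
```

With the assignment in place the handler runs once; with it missing, both connect() and emit() in this stand-in reject the nil name, which mirrors the error described above.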

orbitalquark commented 3 months ago

Kudos to you for spending the time trying to decipher that fold function. It's difficult for me to follow in places...

Before I consider adding this event, I need to understand what the benefit is. It sounds like you are using a stock lexer that has some pre-existing folding behavior, and you want to modify those folds after the fact. I'm wondering if modifying the lexer itself is the better way to go, so folds do not have to be changed after the fact.

Basically, if I add events.FOLDED, I may need to add events.LEXED for symmetry, but one should really just be editing the lexer in question to do the expected work the first time.

Perhaps you can share more information when you are able. There's no hurry though. I don't expect 12.5 to be out until September at the earliest.

rayanamal commented 2 months ago

Well, I tried using the provided lexer.add_fold_point() in a custom lexer. In fact I’m already using a custom lexer to lex the frontmatter, among other Markdown features I added. But it’s not possible to use lexer.add_fold_point() to fold the frontmatter. In fact it’s not possible to use lexer.fold() without modification even for Markdown headings, but let’s put that one aside for now. Here is the spec for Markdown frontmatter:

Frontmatter starts only when the first line of the buffer consists of three dashes.
Lex it with the YAML lexer.
It ends at the next line that consists of nothing but three dashes.
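
The spec above can be sketched in plain Lua as follows, under the assumption that the whole buffer text is available and that the tags are exactly `---` with no trailing whitespace. The names `split_lines` and `find_frontmatter` are hypothetical and not part of any lexer API.

```lua
-- Split text into lines (without line endings).
local function split_lines(text)
  local lines = {}
  for line in (text .. '\n'):gmatch('(.-)\r?\n') do lines[#lines + 1] = line end
  return lines
end

-- Returns the first and last line numbers of the frontmatter, or nil if there
-- is none. Needs the entire buffer text, since only line 1 can open it.
local function find_frontmatter(text)
  local lines = split_lines(text)
  if lines[1] ~= '---' then return nil end
  for i = 2, #lines do
    if lines[i] == '---' then return 1, i end
  end
  return nil -- opening tag without a closing tag
end

-- The later '---' is a horizontal divider and is ignored.
local first, last = find_frontmatter('---\ntitle: x\n---\n# Heading\n\n---\n')
```

This works only because the function sees the buffer from line 1 onward, which is the crux of the problem described next.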

The challenge is that frontmatter start tag only matches at a certain location (line number 1) unlike all other folded symbols in other languages. This requires getting the current line number through the usual little dance:

local pos = buffer.current_pos -- caret position (a property in Textadept's buffer API)
local line = buffer:line_from_position(pos) -- line containing that position
-- Why is there no function to get it directly?

But of course, buffer and all other APIs are not available in the lexers. You can’t externalize it to an init.lua function either because lexers are fully isolated (nothing wrong with that, I think it’s the correct design). So it’s literally impossible. Furthermore, three dashes also mean a horizontal divider in markdown.

You may think that it can be solved by matching a line that consists of three dashes, with no lines behind it. Which is still impossible, because it’ll work like this:

In our code, we supply lexer.add_fold_point() with the frontmatter opening tag --- and our hypothetical custom frontmatter folding function. lexer.add_fold_point() in turn stores these two in an internal table that lexer.fold() uses to determine fold points. As per the API docs, our custom function will be called for every line in the text to fold. It should return 1 to indicate that the line is a starting fold point, -1 for an ending fold point, and 0 if it is not a fold point.
When a new buffer is created, lexing starts. The local function highlight() in core/lexer.lua gets the current buffer text and, after lexing it, passes it to lexer.fold() as the text to fold. lexer.fold() sees that there is a frontmatter opening tag on the first line and passes that text, which is the entire buffer, to our hypothetical frontmatter folding function.
Our function successfully detects the frontmatter opening and closing tags, judging by their line numbers in the given text. The opening tag is the one on the first line, the closing tag is the next one, and any later ones are horizontal dividers. Our function is technically called several times (once for every three dashes detected), but it can judge correctly every time since the entire buffer text is passed every time.
After lexing of the buffer ends, we are on standby until the user presses a key. When they do, highlight() does the logical thing: it passes only the 28 lines following the changed line, in order to avoid lexing the entire buffer on every keypress.
Suppose the user has typed in valid frontmatter and finally adds the opening tag on the first line. The first 28 lines are sent to our function. If the frontmatter is longer than 28 lines, we can’t detect the closing tag. The opening tag will instead be recognized as a horizontal divider, which it isn’t. Fail.
If the frontmatter is shorter than 28 lines, great. The opening and closing tags are recognized correctly, and all is good.
If the user is now editing frontmatter that is shorter than 28 lines, the caret will be on a line between the opening and closing tags. Our function will get the 28 lines after the caret, including the line the caret is on. Our function will see the closing tag: three dashes alone on the 7th line. It will think this might be a closing tag, since it’s preceded by some lines. But it may be a horizontal divider too. Our function has no way of knowing, because it can’t peek behind the given chunk and find the frontmatter opening tag. If our function (incorrectly) assumes there are no fold points and returns 0 to indicate that, the end fold point that was added when the closing tag was recognized at buffer creation will be removed. Now the entire buffer is under the fold of the frontmatter opening tag at line 1. In fact the GTK version, even with my current implementation involving manual folding using the new events.FOLD event, still exhibits this behavior when I try to auto-fold the frontmatter with a timeout at startup. Qt has no such issue, and I have no idea why. Fail either way.
After another keypress further down in the buffer, a 28-line chunk happens to start with three dashes. Our function thinks this is an opening frontmatter tag, while that line is the 500th line and those dashes are a horizontal divider. Fail.

Just in case it isn't clear: even though I described this in a theoretical tone, it is the behavior I observed when I tried to do it.
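
The failure modes walked through above can be reproduced with a small plain-Lua simulation. It assumes a classifier that only sees the chunk it is handed, which is the situation described for the fold function; `classify_dashes` and `slice` are hypothetical helpers, and the 7-line buffer stands in for the 28-line chunks.

```lua
-- Classify each '---' line using only the chunk itself, the way a fold
-- function without access to earlier lines would have to.
-- Returns a table mapping chunk-relative line numbers to 'open', 'close' or 'rule'.
local function classify_dashes(chunk_lines)
  local result, in_frontmatter = {}, false
  for i, line in ipairs(chunk_lines) do
    if line == '---' then
      if i == 1 then
        result[i], in_frontmatter = 'open', true
      elseif in_frontmatter then
        result[i], in_frontmatter = 'close', false
      else
        result[i] = 'rule'
      end
    end
  end
  return result
end

-- Copy lines[first..#lines] into a new table.
local function slice(lines, first)
  local out = {}
  for i = first, #lines do out[#out + 1] = lines[i] end
  return out
end

local buffer_lines = {'---', 'title: x', 'tags: y', '---', '# Heading', '---', 'text'}

-- Whole buffer: line 1 opens, line 4 closes, line 6 is a horizontal divider.
local full = classify_dashes(buffer_lines)
-- Chunk starting at line 2 (caret inside the frontmatter): the closing tag,
-- now chunk line 3, is misread as a horizontal divider.
local from_caret = classify_dashes(slice(buffer_lines, 2))
-- Chunk starting at line 6: the horizontal divider is misread as an opening tag.
local from_rule = classify_dashes(slice(buffer_lines, 6))
```

The same function gives three different answers for the same dashes depending on where the chunk starts, so no chunk-local rule can be correct in all cases.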

I believe adding an event to indicate the end of the lexer-based folding to let the user handle such edge cases in user configuration is the correct decision here. These edge cases may be arbitrarily complex, I don't think markdown frontmatter is the only construct not fitting in with the current folding paradigm. Typesetting languages come to mind.

Re my comment above saying “Don’t merge yet”, I now implemented events.FOLD properly and it’s tested and working. Will open a new PR soon.

orbitalquark commented 2 months ago

Thanks for writing. I will read, digest, and respond when I have some time. Thanks for your patience!

rayanamal commented 2 months ago

No problem, no hurry.

orbitalquark commented 3 weeks ago

Sorry for the delayed response, and thanks for your very well thought out, and well-articulated description of events.

I believe this is precisely the reason why Scintilla allows lexers to set line states: https://scintilla.org/ScintillaDoc.html#SCI_SETLINESTATE

I think when your folder identifies and processes a line of interest, it can mark that line with a persistent state (an integer, presumably with bit-flags) so that it knows what to do when encountering it again, or it knows what to do when encountering a subsequent line of interest (e.g. look back up the line states and see what it should do with the current line).
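
As a rough illustration of that idea (not a working lexer), the sketch below simulates persistent per-line states with a plain Lua table. It assumes the folder knows the absolute line number at which its chunk starts, and it uses simple integer states rather than bit flags. The names `line_state`, `fold_chunk`, and the state constants are hypothetical stand-ins, not Scintilla or Scintillua API, and stale states after edits are not handled.

```lua
-- Hypothetical per-line states.
local NONE, INSIDE, CLOSED = 0, 1, 2

-- Stand-in for persistent line states; survives across fold passes.
local line_state = {}

-- Fold a chunk whose first line is buffer line `start_line`.
-- Returns fold deltas (1 = start, -1 = end, 0 = none) keyed by absolute line.
local function fold_chunk(chunk_lines, start_line)
  local deltas = {}
  for i, text in ipairs(chunk_lines) do
    local line = start_line + i - 1
    local prev = line_state[line - 1] or NONE -- look back up the line states
    local state, delta = NONE, 0
    if text == '---' and line == 1 then
      state, delta = INSIDE, 1 -- opening tag: only valid on line 1
    elseif prev == INSIDE then
      if text == '---' then state, delta = CLOSED, -1 else state = INSIDE end
    end -- any other '---' is a horizontal divider
    line_state[line], deltas[line] = state, delta
  end
  return deltas
end

-- Copy lines[first..#lines] into a new table.
local function slice(lines, first)
  local out = {}
  for i = first, #lines do out[#out + 1] = lines[i] end
  return out
end

local buffer_lines = {'---', 'title: x', 'tags: y', '---', '# Heading', '---', 'text'}
local first = fold_chunk(buffer_lines, 1)             -- initial full pass
local partial = fold_chunk(slice(buffer_lines, 3), 3) -- caret inside frontmatter
local late = fold_chunk(slice(buffer_lines, 6), 6)    -- chunk starts at a divider
```

Because each pass consults the state of the line before its chunk, the partial passes reach the same verdicts as the full pass: line 4 still closes the frontmatter and line 6 stays a divider.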

I don't have any working examples of this, but the output.lua lexer marks recognized warning and error lines for process output (e.g. compiler output): https://github.com/orbitalquark/scintillua/blob/f1375d323c19c6264c3ad56abb1c24f52f3a28b9/lexers/output.lua#L27-L41

Textadept (outside the lexer) looks up line states in order to add colored margin markers (e.g. red for errors, yellow for warnings): https://github.com/orbitalquark/textadept/blob/32c1c80678d7c97589e7913aed05f63aef95b3f8/modules/textadept/run.lua#L98-L106

As to your question about why there is no lexer.get_current_line() or similar function, the short answer is that Scintilla doesn't provide one :) (See https://scintilla.org/ScintillaDoc.html#LexerObjects and scroll down to the IDocument interface.)

Prior to Textadept 12, Textadept used a C++ version of its LPeg lexer (Scintillua), which was bound to Scintilla's IDocument interface. Scintillua supports both versions, so its Lua State doesn't provide any extra functions (there hasn't really been a need for this, honestly).

I'm not saying all of this to shut down the idea of adding another event. I'm merely interested in exploring Scintilla's existing facilities for handling your use case.

Add code folding event. #547