Report heading depth - Githubissues

stsewd / tree-sitter-rst

reStructuredText grammar for tree-sitter

https://stsewd.dev/tree-sitter-rst/

MIT License

49 stars, 7 forks

Report heading depth #43

Open pfheatwole opened 1 year ago

pfheatwole commented 1 year ago

First up, thanks for the plugin! It's very helpful when combined with aerial.

Would it be possible to report the child/parent relationship between headings? Here's aerial showing an outline for a Markdown document:

[Image: aerial outline of the Markdown document]

And here's the same outline for a reST document:

[Image: aerial outline of the reST document]

stsewd commented 1 year ago

Hi, nested headings would be hard/impossible to put in the grammar itself, since RST doesn't have a fixed mapping from symbols to header levels (like = is h1, - is h2, and so on). There are conventions, but they are only that, conventions. RST decides which symbol corresponds to which level only after parsing the whole document.

But having just one level could be possible, something like (section (header) (content)), but I'm not sure how useful that would be.

pfheatwole commented 1 year ago

I'm not at all familiar with TreeSitter, but VOoM uses a bit of Python to do it in a single pass, accumulating a list of header depths as it encounters them: https://github.com/vim-voom/VOoM/blob/423600d0ab98254bae6fa2ca60d06057ecfb748a/autoload/voom/voom_vimplugin2657/voom_mode_rest.py#L56
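For illustration, here is a minimal sketch of that single-pass idea. It is not VOoM's actual code: the function name `assign_levels` is made up for this example, and it assumes the adornment symbol alone identifies a heading style (reST also distinguishes overlined titles from underline-only ones, which this sketch ignores).

```python
def assign_levels(symbols):
    """Return one depth per heading, numbering styles in order of first appearance."""
    seen = []    # adornment symbols, in the order they were first encountered
    levels = []  # depth of each heading, in document order
    for symbol in symbols:
        if symbol not in seen:
            seen.append(symbol)  # a new symbol opens the next deeper level
        levels.append(seen.index(symbol))
    return levels


print(assign_levels(["#", "=", "-", "-", "=", "="]))  # [0, 1, 2, 2, 1, 1]
```

Each heading gets its depth as soon as it is seen, so no second pass over the document is needed.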

I totally understand if the TreeSitter grammar is different enough to make this approach not worth the effort though.

stsewd commented 1 year ago

Hmm, yeah, keeping a list of headers could work (it would be kept together with the indentation level list).

Some ideas for future me:

Max header level would be 6 (as in HTML); any deeper headers would still be recognized, but as level 6 headers.
The last/first 6 bytes of the scanner state can be used to keep track of the header levels.
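
A rough sketch of how those two ideas could fit together, written in Python only to show the bookkeeping. This is not the scanner's actual code: `MAX_LEVELS`, `level_for`, and the "zero byte means unused slot" convention are all assumptions made for this example.

```python
MAX_LEVELS = 6  # cap borrowed from HTML's h1..h6, as suggested above


def level_for(state: bytearray, symbol: str) -> int:
    """Return the 1-based level for `symbol`, recording it in `state` if it is new.

    `state` is a fixed block of MAX_LEVELS bytes; a zero byte marks an unused slot.
    """
    code = ord(symbol)
    for i in range(MAX_LEVELS):
        if state[i] == code:  # symbol already has a level
            return i + 1
        if state[i] == 0:     # first free slot: claim it for this symbol
            state[i] = code
            return i + 1
    return MAX_LEVELS         # all slots taken: report any further style as level 6


state = bytearray(MAX_LEVELS)
symbols = ["#", "=", "-", "~", "^", '"', "'", "+"]
levels = [level_for(state, s) for s in symbols]
print(levels)        # [1, 2, 3, 4, 5, 6, 6, 6]
print(bytes(state))  # b'#=-~^"'
```

Because the whole table is six bytes, it can be copied in and out of a serialized state buffer as-is, and restoring it gives back the same symbol-to-level mapping.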

pfheatwole commented 1 year ago

It seems to me you've already done the hard part: generating the list of headings.

Sorry I can't be more help with the implementation, but in case it helps "future you" here's a shorter version of the VOoM solution I linked (but using a two-step procedure that determines the section level after finding all the section headings):

Given a list of (line_number, heading_symbol) from the example I showed above:

sections: list[tuple[int, str]] = [
    (1, "#"),  # TOP LEVEL
    (2, "="),  # First
    (3, "-"),  # Dog
    (4, "-"),  # Cat
    (5, "="),  # Second
    (6, "="),  # Third
]

Then section levels can be calculated with:

symbol_levels: dict[str, int] = {}
for (line_number, symbol) in sections:
    if symbol not in symbol_levels:
        symbol_levels[symbol] = len(symbol_levels)
print(symbol_levels)  # Output: {'#': 0, '=': 1, '-': 2}
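
To get from those levels to the parent/child outline requested at the top of this issue, one possible sketch keeps a stack of currently open sections. The names `build_outline` and `show` are hypothetical, and the titles are taken from the comments in the example list above.

```python
def build_outline(sections):
    """Nest (line_number, symbol, title) headings into a tree of dicts."""
    symbol_levels = {}
    root = {"title": None, "children": []}
    stack = [(-1, root)]  # (level, node) pairs for the currently open sections
    for line_number, symbol, title in sections:
        # Levels are assigned in order of first appearance, as above.
        level = symbol_levels.setdefault(symbol, len(symbol_levels))
        # Close every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        node = {"title": title, "line": line_number, "children": []}
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


def show(nodes, depth=0):
    """Render the outline as indented lines, two spaces per level."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node["title"])
        lines.extend(show(node["children"], depth + 1))
    return lines


titled_sections = [
    (1, "#", "TOP LEVEL"),
    (2, "=", "First"),
    (3, "-", "Dog"),
    (4, "-", "Cat"),
    (5, "=", "Second"),
    (6, "=", "Third"),
]
print("\n".join(show(build_outline(titled_sections))))
```

This prints "TOP LEVEL" at the left margin, "First", "Second" and "Third" indented once, and "Dog" and "Cat" indented twice under "First".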

keewis commented 1 year ago

"But having just one level could be possible, something like (section (header) (content)), but not sure how useful would that be."

Not sure about syntax highlighting, but when analyzing parsed documents I think this would make retrieving the section content much easier (unless I'm missing something?): right now, this would be done by iterating over the section's siblings until either end-of-document or another section is found.
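
As a sketch of that sibling iteration, here is one way to group a flat list into sections. Plain `(kind, text)` tuples stand in for real syntax nodes, and the kind names `"section"` and `"paragraph"` as well as the function `group_sections` are assumptions for this example, not the grammar's actual node types.

```python
def group_sections(nodes):
    """Group a flat list of (kind, text) siblings into (title, content) pairs."""
    groups = []
    for kind, text in nodes:
        if kind == "section":
            groups.append((text, []))   # a heading starts a new group
        elif groups:
            groups[-1][1].append(text)  # content belongs to the latest heading
        # Content before the first heading is skipped in this sketch.
    return groups


flat = [
    ("section", "First"),
    ("paragraph", "a"),
    ("paragraph", "b"),
    ("section", "Second"),
    ("paragraph", "c"),
]
print(group_sections(flat))  # [('First', ['a', 'b']), ('Second', ['c'])]
```

With a (section (header) (content)) node in the tree, this grouping step would not be needed, since each section's content would already be its child.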