Work out what to do about syntax highlighting

sminez commented 2 months ago

Please describe the change / addition you'd like to see made

Natively supporting tree sitter for driving syntax highlighting might be worthwhile looking into but it will need a bit of investigation into how that interacts with the rest of the functionality within ad.

https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_rust

Other options for syntax highlighting are of course available (syntect being a big one) but they all have the same issue of how they fit in with the ad model of "everything is just text" and all text being treated equivalently within the editor. Littering the code with special case handling of buffer content depending on the filetype is something I really want to avoid. In the case of LSP things are a little different as it's about determining things based on the file and then running external tooling, rather than modifying how the contents of the buffer itself are interpreted.

Is this a feature you have seen in other text editors?

SomeGuyNamedMay commented 1 month ago

It might be worth looking into integrating tree sitter nodes into the command language, maybe taking inspiration from something like https://github.com/ast-grep/ast-grep if we're planning on implementing tree sitter support directly into the program.

sminez commented 1 month ago

Possibly, but how it composes with the rest of the functionality in ad is what I'm not sure about. Tree sitter is incredibly powerful but it really wants to be the source of truth for a lot of things which doesn't sit well within the existing feature set of ad. A core aspect of ad is that everything is just text within a buffer, but with treesitter everything is an AST node. That quite fundamentally changes the semantics of how everything works.

rakoo commented 1 month ago

Just for reference, and I know I've already said it I'm sorry, but there's the ki editor (https://ki-editor.github.io/ki-editor/) that natively integrates tree-sitter. It uses it to natively define movements in a new way.

I think, to understand whether it's worth it, knowing what functionality it enables might be interesting? And if that functionality can be achieved with scripting, that might be a good exercise of the interface.

sminez commented 1 month ago

Like I said in the other issue where ki came up, it's a very interesting idea and worth looking into if you find it interesting, but the goal of ad is not to add every feature available in other editors. It is deliberately opinionated and minimal, which means there are some things that it just doesn't support, I'm afraid 🙂

For clarity here, the reason I opened this issue originally was to look into using treesitter for syntax highlighting, though I realise that is not made clear at all in the issue itself. Using treesitter to manipulate the buffer state is not the goal here.

lobre commented 6 days ago

Hey just in case you haven't seen it, vis (https://github.com/martanne/vis) uses LPeg and grammars from https://github.com/orbitalquark/scintillua.

Don't know if that would work with your design but at least it seems to suit vis well (simple enough).

sminez commented 5 days ago

@lobre thanks 🙂

My concern is mostly around wanting to keep things minimal and applicable to all ad buffers. Having to maintain multiple ASTs for different purposes and keeping them all in sync is something I want to avoid if at all possible. A big part of why I started writing ad is seeing how much code gets executed on every key press inside of most modern text editors and IDEs 😅

That's not to say that I don't want to add more features to ad, but I want to make sure things are kept as minimal and composable as possible. I'm preferring to move relatively slowly with adding new built-in functionality rather than throwing everything in at once and then figuring out how to make it work.

lobre commented 5 days ago

I was proposing LPeg because I thought the grammars were "simpler" and well suited for syntax highlighting.

I am not sure what kind of solution you are looking for if you don't want to have to depend on "syntax rules", "syntax trees" or "syntax regex". To me, you will always have maintenance, except if you find some kind of simple universal rules that could be applied to all languages.

But if you are asking me, the simplest is no syntax highlighting 😅. I personally got rid of them a few years ago in Kakoune, and I am now a happy man. I don't miss it at all!

bbarker commented 5 days ago

I have to admit, I personally enjoy a lot of what acme has to offer, but I also like syntax highlighting and LSPs. Rather than building it in, maybe leaving it to user discretion would be appropriate (i.e., building extensions similar to how it is done in penrose)?

sminez commented 5 days ago

@lobre it's not that I don't want to have to depend on syntax rules / tree / regex: that's a given if you want to actually parse and highlight the contents of a given buffer. It's more that if I'm going to do that then I'd like that parse tree to be the only one required for a given buffer (re-using it where needed) rather than only being for syntax highlighting.

I'm kind of with you on just getting rid of highlighting altogether :sweat_smile: Whenever I try it I'm pleasantly surprised with how much I like it, though I think I would like to still have comments in a muted colour (and maybe strings highlighted as well given that you need to tokenize them anyway to correctly identify comments).

That said, I can also see that there are a bunch of useful things that you can do with an AST of the buffer content, as has been pointed out about the ki editor above. My concern there is that the AST is then a core part of how the editor thinks about and manages a buffer, which isn't quite what I'm after. I want to keep the core of ad as simple as:

ad contains buffers
buffers are utf8 encoded text
you can select, read, edit, load and execute sub-regions of a buffer

With that as the base layer API you can then layer on more complicated things like AST parsing, highlighting, querying etc, but I don't want the editor itself to bake those concepts in as guarantees about every buffer. At the end of the day it's all just bytes anyway, so it's about where you draw an artificial line and make guarantees about certain semantics.
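A minimal sketch of that layering, assuming a hypothetical `Buffer` type (the names `Buffer`, `read`, `edit` and `find_all` are illustrative and not taken from ad's source). The core only knows about UTF-8 text and byte ranges, and the query function sits on top using nothing but the public read operation:

```rust
use std::ops::Range;

/// Hypothetical core type: a buffer is just UTF-8 text.
pub struct Buffer {
    text: String,
}

impl Buffer {
    pub fn new(text: &str) -> Self {
        Buffer { text: text.to_string() }
    }

    pub fn len(&self) -> usize {
        self.text.len()
    }

    /// Read a sub-region by byte range.
    /// Returns None if the range is out of bounds or splits a UTF-8 character.
    pub fn read(&self, r: Range<usize>) -> Option<&str> {
        self.text.get(r)
    }

    /// Replace a sub-region, returning false if the range is not valid.
    pub fn edit(&mut self, r: Range<usize>, new: &str) -> bool {
        if self.text.get(r.clone()).is_none() {
            return false;
        }
        self.text.replace_range(r, new);
        true
    }
}

/// A "layered" feature: it lives outside the core and only uses `read`,
/// so the buffer makes no guarantees about syntax or structure.
pub fn find_all(buf: &Buffer, needle: &str) -> Vec<Range<usize>> {
    let body = buf.read(0..buf.len()).unwrap_or("");
    body.match_indices(needle)
        .map(|(start, m)| start..start + m.len())
        .collect()
}
```

Anything richer (an AST, highlighting spans, query results) would follow the same shape as `find_all`: derived from the text on demand rather than stored as part of what a buffer is.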

@bbarker there's some chat about LSP support in #46 and it's actually something I've been looking into over the last couple of days :slightly_smiling_face: I'm quite keen on the idea of the editor being extensible in a way similar to penrose, though I'm not sure the model of providing it as a library is quite the right fit this time? For me personally at least, I feel like it's far more likely that you want to make changes to your editor setup than your window manager. I've been toying with the idea of how you might register extensions to the virtual filesystem (adding ad/buffers/$bufid/lsp files from an external process for example) but I'll need to have more of a think about how to do that in a nice way. That would make writing certain kinds of extensions a lot nicer I suspect. In the case of the AST stuff we're talking about here you could expose the full tree and the results of queries for example (taking tree-sitter as an example):

$ 9p read ad/buffers/1/body
pub fn foo() {
  1
}

$ 9p read ad/buffers/1/tree-sitter/ast
(source_file
  (function_item
    (visibility_modifier)
    (identifier)
    (parameters)
    (block
      (number_literal)
    )
  )
)

Just a thought :wink:

sminez commented 3 days ago

https://github.com/sminez/ad/tree/minimal-syntax is some in-progress work on supporting only highlighting strings and comments.

lobre commented 3 days ago

Just parsing a few things such as comments and strings is a good idea. Though, it is not 100% accurate as the delimiters are different across languages. Most languages use // and /*, but bash for example uses # and vim uses ". If this is not added as a configurable option, it will only work for "most cases" (which might be fine if ad wants to stay simple).

This principle of extension of the 9p filesystem is interesting! It keeps the core simple.

However, there is one concern that I have with 9p in general. It might be better to discuss it elsewhere, but just to briefly touch on it, I wonder how concurrent extensions/plugins can rely heavily on the filesystem when it is an asynchronous mechanism.

This concern was raised on a draft pull request that was created back in the day for Kakoune (and that never got merged). But discussions about why it was abandoned started again a few weeks ago. The project author raised this:

I do not remember where the discussion happened, but the issue was that 9p is fundamentally asynchronous, so any information you get from the filesystem might be out-of-date when you try to act on it.

https://github.com/mawww/kakoune/pull/3116#issuecomment-2448891637

If you want more context, I suggest that you read the rest of the discussion there.

I would be interested in having your opinion about that, as ad fully relies on 9p. Do you consider this to be a manageable issue?

sminez commented 3 days ago

My current idea for highlighting strings and comments would be that it would be configurable by file extension (if the branch linked above ends up being merged) :slightly_smiling_face:
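To make the idea concrete, here is a rough sketch of a strings-and-comments scanner driven by a per-extension table. It is not the code on the minimal-syntax branch: the `Syntax` struct, the `syntax_for` table and the `scan_line` function are all hypothetical, and it only handles line comments and single-line strings with backslash escapes.

```rust
/// The two kinds of span this scanner reports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Comment,
    Str,
}

/// Hypothetical per-language settings (not ad's actual config format).
#[derive(Debug, Clone, Copy)]
pub struct Syntax {
    pub line_comment: &'static str,
    pub string_delims: &'static [char],
}

/// Illustrative lookup by file extension; unknown extensions get no highlighting.
pub fn syntax_for(ext: &str) -> Option<Syntax> {
    match ext {
        "rs" | "c" | "go" => Some(Syntax { line_comment: "//", string_delims: &['"'] }),
        "sh" | "py" => Some(Syntax { line_comment: "#", string_delims: &['"', '\''] }),
        _ => None,
    }
}

/// Scan one line and return (kind, start_byte, end_byte) spans.
/// Strings have to be tokenized so that comment markers inside them are ignored.
pub fn scan_line(line: &str, syn: &Syntax) -> Vec<(Kind, usize, usize)> {
    let mut spans = Vec::new();
    let mut i = 0;
    while i < line.len() {
        let rest = &line[i..];
        if rest.starts_with(syn.line_comment) {
            // A line comment runs to the end of the line.
            spans.push((Kind::Comment, i, line.len()));
            break;
        }
        let c = rest.chars().next().unwrap();
        if syn.string_delims.contains(&c) {
            // An unterminated string is treated as running to the end of the line.
            let mut end = line.len();
            let mut escaped = false;
            for (off, ch) in rest.char_indices().skip(1) {
                if escaped {
                    escaped = false;
                } else if ch == '\\' {
                    escaped = true;
                } else if ch == c {
                    end = i + off + ch.len_utf8();
                    break;
                }
            }
            spans.push((Kind::Str, i, end));
            i = end;
        } else {
            i += c.len_utf8();
        }
    }
    spans
}
```

Block comments and multi-line strings would need state carried between lines, which is where a "most cases" approach starts to cost more.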

Thanks for linking to the 9p discussion in kakoune, it's an interesting read! I think the issue around 9p being fundamentally asynchronous as an API is a valid one when you want to integrate it with a pre-existing system where there are semantics that conflict with that approach. ad uses the same events file approach as acme, which effectively acts as a lock around a given buffer, allowing an external process to own the event stream. Internally ad is a single-threaded event loop and any external process that has that events file open is treated as a filter for inputs to the buffer.
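A toy model of that arrangement, purely for illustration: the `Event`, `Filter` and `Editor` names are hypothetical, and the external process is stood in for by an in-process closure rather than anything reading a 9p events file. The point is only that events are handled one at a time, and an attached filter sees each one before the editor does.

```rust
use std::collections::VecDeque;

/// Hypothetical input events for a single buffer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    Insert(String),
    Execute(String),
}

/// Stand-in for an external process holding the events file open:
/// it may pass an event through, rewrite it, or swallow it (None).
pub type Filter = Box<dyn FnMut(Event) -> Option<Event>>;

pub struct Editor {
    pub body: String,
    pub executed: Vec<String>,
    filter: Option<Filter>,
}

impl Editor {
    pub fn new() -> Self {
        Editor { body: String::new(), executed: Vec::new(), filter: None }
    }

    /// Only one filter can own the event stream at a time.
    pub fn attach(&mut self, f: Filter) {
        self.filter = Some(f);
    }

    /// Single-threaded loop: each event is fully handled before the next,
    /// so the filter never observes a half-applied change.
    pub fn run(&mut self, mut queue: VecDeque<Event>) {
        while let Some(ev) = queue.pop_front() {
            let ev = match self.filter.as_mut() {
                Some(f) => match f(ev) {
                    Some(ev) => ev,
                    None => continue, // swallowed by the filter
                },
                None => ev,
            };
            match ev {
                Event::Insert(s) => self.body.push_str(&s),
                Event::Execute(cmd) => self.executed.push(cmd),
            }
        }
    }
}
```

In a real setup the filter would be a separate process and the round trip would happen over the filesystem, so this only captures the ordering guarantee, not the transport.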

I don't know enough about the internals and full functionality of kakoune to say anything definitive, but I suspect the extensibility of ad/acme is strictly less rich than what you can do in kakoune?

Meanwhile, in Kakoune land, there can be a bazillion hooks at global, buffer, and window scopes, they can all launch processes in parallel that do computations and try to manipulate the editor's state. Kakoune is just a whole lot more interactive than Acme.

I think this is why things work in acme but not in a way that would play nicely with the way kakoune works. It doesn't necessarily mean that the approach doesn't work at all, but it does place limitations on how you can implement things. For ad the plan, as I've said previously, is to be really quite minimal, so I'm hoping this should all work in a similar way to acme. But that is very much a hope / hunch rather than a strong technical opinion.

sminez / ad

Work out what to do about syntax highlighting #19
