pulsar-edit / pulsar

A Community-led Hyper-Hackable Text Editor
https://pulsar-edit.dev

[Tree-sitter] Create SQL grammar #1061

Open savetheclocktower opened 1 month ago

savetheclocktower commented 1 month ago

Have you checked for existing feature requests?

Summary

This recent comment reminds me of the painful constraint that injections don't work across different types of grammar: a Tree-sitter grammar can't inject a TextMate grammar, or the other way around.

We don't experience this much because so many of the built-in packages now have modern Tree-sitter grammars. But there are a few notable ones that don't:

As well as some oddballs:

Why do we have these gaps? In short: the most reliable Tree-sitter grammars out there (especially 18 months ago when I began this insane project) were the first-party grammars. If the tree-sitter GitHub organization provided a parser that matched one of our built-in languages, I used it. For more obscure languages, there weren't many obvious candidates for up-to-date, actively-maintained parsers from third parties.

I feel no urgency on the oddballs and not much urgency on the more notable ones. But with Tree-sitter adoption increasing over time (thanks, Neovim!) it's worth checking every few months to see if we can support more languages.

SQL is the best example; it looks like there are a few tree-sitter-sql repositories out there. We should investigate them and see about integrating one.

What benefits does this feature provide?

The main benefit (apart from theoretically improved syntax highlighting, code folding, etc.) is the ability to inject SQL into other grammars. The commenter mentioned above is justifiably annoyed that they can't get their SQL highlighted when they write Markdown and include a SQL code block.
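To make the injection constraint concrete, here is a minimal Python sketch. It is not Pulsar's API: `find_injections`, `injectable`, and the `GRAMMAR_KINDS` table are all hypothetical names invented for illustration, and the grammar kinds listed are assumptions based on this thread. The sketch finds tagged fenced blocks in Markdown and then checks whether the host and injected grammars are of the same kind.

```python
import re

TICKS = "`" * 3  # a Markdown code fence marker
_FENCE = re.escape(TICKS)

# Opening fence with a language tag, a lazily matched body, then a closing fence.
FENCE_BLOCK = re.compile(
    rf"^{_FENCE}(?P<lang>[\w+-]+)[ \t]*\n(?P<body>.*?)^{_FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

# Hypothetical registry: which kind of grammar backs each language.
GRAMMAR_KINDS = {
    "markdown": "tree-sitter",
    "javascript": "tree-sitter",
    "sql": "textmate",  # assumption: SQL has no Tree-sitter grammar yet
}


def find_injections(markdown: str) -> list[tuple[str, int, int]]:
    """Return (language, start, end) for the body of each tagged fenced block."""
    return [
        (m.group("lang").lower(), m.start("body"), m.end("body"))
        for m in FENCE_BLOCK.finditer(markdown)
    ]


def injectable(host: str, injected: str, kinds: dict[str, str]) -> bool:
    """An injection is honored only when both grammars are of the same kind."""
    return host in kinds and injected in kinds and kinds[host] == kinds[injected]
```

Under these assumptions, a JavaScript block inside Markdown would be highlighted, while a SQL block would be found but skipped, which is the behavior the commenter ran into.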

Any alternatives?

The only real alternative is the theoretical ability to inject TextMate grammars into Tree-sitter grammars (and maybe vice versa). That's something I floated early on, but it'd be so hard that I don't think it's worth the effort. Better to put that work into modernizing more languages.

Other examples:

No response

savetheclocktower commented 1 month ago

There are four SQL parsers linked from the Tree-sitter homepage:

I spent about 45 minutes this morning testing the SQL (General) parser with some SQL dumps I had lying around (mainly from a backup of a WordPress site), some code snippets for various SQL features, and so on.

Some observations about this parser:

For instance:

I think it’s time to put this one aside. I’ll try the SQL (PostgreSQL) grammar next.

But if that one doesn’t fare any better, I wonder if there’s a better approach: a Tree-sitter grammar that defaults to total permissiveness for SQL:

This sort of parser would function much more like a TextMate grammar: decreasing its ambitions and gaining some robustness in return. The only problem is that I'd have to write it.
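As a rough illustration of that tradeoff, here is a minimal Python sketch of a permissive SQL tokenizer. It is not a Tree-sitter grammar and not the design proposed above; the `tokenize` function, the token categories, and the short keyword list are all assumptions made for the example. The point is only that every character falls into some category, so malformed or dialect-specific input never makes the scan fail.

```python
import re

# A deliberately small, arbitrary keyword list; a real grammar would need far more.
KEYWORDS = {
    "select", "from", "where", "insert", "into", "values", "update",
    "set", "delete", "create", "table", "join", "on", "and", "or", "not",
}

# Alternatives are tried in order; the final catch-all means nothing is rejected.
TOKEN = re.compile(
    r"""
    (?P<comment>--[^\n]*|/\*.*?\*/)
  | (?P<string>'(?:[^']|'')*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<space>\s+)
  | (?P<other>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(sql: str) -> list[tuple[str, str]]:
    """Classify every non-whitespace span of the input without ever raising."""
    tokens = []
    for match in TOKEN.finditer(sql):
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue
        if kind == "word":
            kind = "keyword" if text.lower() in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens
```

An unterminated string such as `SELEC 'oops` still tokenizes: the stray quote is classed as `other` and the rest as identifiers, which is roughly how a TextMate grammar degrades instead of producing a broken tree.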

confused-Techie commented 1 month ago

@savetheclocktower Appreciate the write up you've done here.

I assume that writing a SQL Tree-sitter parser would add a lot to your workload, considering what you're already propping up. If you'd like, I'd be more than happy to take on a parser for this purpose. Tree-sitter parsing is where I got most in depth when you first started this, even if my acumen fell off somewhat when it came to getting it properly integrated into Pulsar.

And considering how frequently I work with PostgreSQL on the backend, I'd be motivated to keep things in working order.

If you haven't already started something, I'll see what I can do on the parser side. Ideally that just means I'd need some help getting it integrated into Pulsar.

Obviously, if you've already started, I don't mean to step on any toes. I'm just pointing out where I can try to fill in and assist.