Closed tek closed 3 years ago
try again now
@tek Works perfectly! Wow, C++ is really something.
yeah, I had a blast
Question: given a type a (A "a"), I get the tree
(type_apply (type_name (type_variable)) (type_parens (type_apply (type_name (type)) (type_literal (string)))))
Is it helpful to have the wrappers type_name and type_literal? My reasoning behind it was that both can have multiple variants, like a literal can be Symbol or Nat, but a (string) can also occur in expressions etc.
So is it useful to be able to disambiguate those, or should that be done via the parent node, which can be type_apply, type_parens and many more?
I think especially for type_literal, it makes sense to add some wrapping. For (type_name (type)), it's less clear to me that the extra wrapping is needed, but I don't fully understand that part of the grammar.
On another note - with rules like these:
_larrow: _ => '<-',
_carrow: _ => '=>',
_lambda: _ => '\\',
☝️ that will actually make the <- token completely invisible from the tree (as opposed to just showing up as an anonymous node).
The current behavior is that when you give a name to a single token like that, Tree-sitter doesn't create a wrapper node (with the named _larrow node containing the anonymous "<-" node). It unifies them, so that _larrow is the terminal. Since underscore-prefixed nodes are hidden, that will make it impossible to target the <- node with a query for e.g. syntax highlighting purposes. What we usually do is just directly use "<-" in the grammar (in place of having any name like _larrow). I think that'll work better for syntax highlighting, and any use case where you'd want to identify the individual operators.
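The difference between the two styles can be sketched as follows. This is only an illustration, not code from the PR: the rule names bind, pattern and exp are hypothetical, and seq and $ are small stand-ins for what the tree-sitter CLI normally provides to grammar.js, defined here so the snippet runs under plain Node.

```javascript
// Stand-ins for what the tree-sitter CLI normally provides to grammar.js,
// defined here only so the sketch runs under plain Node.
const seq = (...members) => ({ type: 'SEQ', members });
const $ = new Proxy({}, { get: (_, name) => ({ type: 'SYMBOL', name }) });

// Before: `<-` sits behind a hidden, underscore-prefixed rule, so no node
// for the arrow appears in the syntax tree.
const hiddenToken = {
  _larrow: _ => '<-',
  bind: $ => seq($.pattern, $._larrow, $.exp), // hypothetical rule
};

// After: the string is used directly and becomes the anonymous node "<-".
const inlineToken = {
  bind: $ => seq($.pattern, '<-', $.exp), // hypothetical rule
};

// The middle member of each sequence shows the difference:
// a reference to the hidden rule versus the literal string.
const before = hiddenToken.bind($).members[1];
const after = inlineToken.bind($).members[1];
console.log(before, after);
```

With the inline form, a highlighting query can match the token by its text, for example "<-" @operator (the capture name here is an assumption).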
I think especially for type_literal, it makes sense to add some wrapping. For (type_name (type)), it's less clear to me that the extra wrapping is needed, but I don't fully understand that part of the grammar.
So both type and type_variable can occur either in a signature or in their declaration, and I added type_name to signify that the node is in a signature (it's not even consistent, I think, I would have to improve that). I don't know how feasible it would be to query a node based on the wrapping signature and the type_variable name, if the node is nested in other type constructs. Or maybe that isn't necessary at all – as I said, I have no practical experience with querying.
☝️ that will actually make the <- token completely invisible from the tree (as opposed to just showing up as an anonymous node).
oh, good to know, thanks!
I'd say leave the type_name stuff as it is; this PR is already a major improvement, and we can always come back and tweak the structure later.
sounds good!
I do think that before merging, it'd be worth changing those invisible tokens like _larrow though (to just use the strings directly), but let me know if you feel otherwise.
absolutely
I think the way you broke the grammar into distinct files for each section is actually pretty neat and tidy. I might try that on some other big grammars.
Modulo any suggestions @maxbrunsfeld might have, I think this looks good to me. Thank you very much for all your work on this, @tek, especially the lexer, which is delightfully sophisticated. I propose giving you a commit bit to this repo, unless @maxbrunsfeld objects.
very kind, thank you! I'd be happy to keep maintaining the project.
I'll ping you once I'm done with the finishing touches
Also, just curious: at a high level, why is the external scanner's state a vector<vector<uint16_t>>, as opposed to a flat vector<uint16_t>?
@maxbrunsfeld I added that when I was dealing with the preprocessor directives. Since the indentations would have to be reset on an #else, I just pushed a copy onto the stack on an #if. But I only realized afterwards that the same would have to be done with the external parser state. Thanks for reminding me, this can now be reverted!
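For readers wondering what the nested vector was for, here is a rough model of the idea, written in JavaScript rather than the scanner's C++. It is a sketch under assumptions: the class and method names are hypothetical, the #endif handling is a guess, and directives are assumed to be balanced.

```javascript
// Hypothetical model of the scanner state as a stack of indentation stacks
// (the role played by vector<vector<uint16_t>> in the C++ scanner).
class IndentState {
  constructor() {
    this.stacks = [[]]; // the innermost indentation stack is last
  }
  get current() {
    return this.stacks[this.stacks.length - 1];
  }
  pushIndent(column) {
    this.current.push(column);
  }
  popIndent() {
    return this.current.pop();
  }
  // On #if: work on a copy and keep the pre-#if snapshot underneath it.
  enterIf() {
    this.stacks.push([...this.current]);
  }
  // On #else: discard what the #if branch did and restart from the snapshot.
  enterElse() {
    this.stacks.pop();
    this.stacks.push([...this.current]);
  }
  // On #endif (assumption): keep the last branch's result, drop the snapshot.
  endIf() {
    const result = this.stacks.pop();
    this.stacks[this.stacks.length - 1] = result;
  }
}

const state = new IndentState();
state.pushIndent(0);
state.pushIndent(2);
state.enterIf();
state.popIndent(); // the #if branch ends a layout block
state.enterElse(); // indentation is back to what it was before the #if
console.log(state.current);
```

As the comment above notes, restoring the indentation alone is not enough, because the parser's own state would need the same reset; that is why the nested vector could be reverted to a flat one.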
@maxbrunsfeld @patrickt invisible tokens are inlined, double vector is removed, I renamed lots of user-facing nodes and all tests green. if you're satisfied, please merge!
:rocket:
Just FYI, I've been doing squash merges on these grammar repos lately, since they contain generated files, to avoid the repo size growing too fast.
Thanks for the awesome work @tek!
makes sense. it's been a pleasure!
Huge thanks, @tek! This is a real step forward for the Haskell ecosystem at large, since this is (I think) the only working GHC Haskell parser outside of GHC itself!
omg :joy:
@maxbrunsfeld @patrickt Am I supposed to be committing further changes to master?
thanks!
Thank you so much @tek for the amazing rewrite! This is huge for the Haskell community, wonderful work 👏 ❤️
@rewinfrey very kind, thank you! :heart:
@tek Re. master: it’s your call. No one’s consuming this repository as of yet, so I don’t see any huge problem with pushing small fixes directly to master. Bigger features etc. are nice to have as PRs.
@patrickt sure thing, I was mainly asking whether I'm permitted!
Yup! I’ve given you maintainer privileges, so you should be able to do most things. Give me a shout if you need anything.
will do, thanks!
Hello everyone, I have just found this thread after unsuccessfully trying to use the https://www.npmjs.com/package/tree-sitter-haskell package, which had its latest publish 5 years ago. Would it be possible to publish a new version containing the changes in this PR? Or would this involve additional work that is specific to the npm package?
@maxbrunsfeld you wanna add my account to that package's maintainers?
@tek thanks!
For anyone landing here: There is also a prebuilt wasm file in this repo. It can be re-built via tree-sitter build-wasm .
When I build it on my machine, the result is 4.5 MB, while the prebuilt version is 3 MB, so the prebuilt file might not be the newest version?
FYI I built this visualizer: https://felixroos.github.io/haskell-tree-sitter-playground/
very nice!
@felixroos So as per the linked issue above, can we somehow npm install this package these days? Thanks!
hello :wave: I rewrote the grammar and it's working quite nicely. There's still some stuff to do, but at this point I'm opening this PR to get some advice on one specific problem that seems impossible to me.
The issue is preprocessor macros, as in:
Here the block inside of the #ifdef ends the current rule, but in the #else it starts inside of that rule again. I can keep track of the previous state in the scanner, but I don't know how to deal with that for the grammar. Is there some way to reset the parser to a previous state? I looked at the C API but didn't find anything suitable.