Closed Turbo87 closed 5 years ago
Yeah, sorry for the lack of documentation around that. As you could probably tell, scanner.cc is a hand-written source file, unlike parser.c, which is generated from the grammar. It's called an "external scanner", and it's used in Tree-sitter parsers where you need a little bit of logic that can't be expressed in the context-free grammar + regular expression format.
In the case of HTML, we use it to implement tag-name matching, as well as HTML's idiosyncratic logic for which tags can be self-closing, etc.
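To make the tag-name matching point a bit more concrete, here is a small TypeScript sketch of the kind of stateful check involved. It is only an illustration and not the code in scanner.cc: `Tag`, `VOID_ELEMENTS`, and `tagsBalance` are hypothetical names, and the input is assumed to be already split into open and close tags. Checking that a closing tag repeats the name of the element it closes requires remembering the actual names seen so far, which is the part that does not fit into grammar rules plus regexes.

```typescript
// Hypothetical token shape; the real scanner works on raw characters.
interface Tag {
  kind: "open" | "close";
  name: string;
}

// Elements that never take a closing tag (the HTML spec's void elements).
const VOID_ELEMENTS: string[] = [
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
];

// Returns true when every closing tag names the most recently opened,
// still-open element. Void elements are never pushed onto the stack.
function tagsBalance(tags: Tag[]): boolean {
  const stack: string[] = [];
  for (const tag of tags) {
    if (tag.kind === "open") {
      if (VOID_ELEMENTS.indexOf(tag.name) === -1) {
        stack.push(tag.name);
      }
    } else if (stack.pop() !== tag.name) {
      // Closing tag with no open element, or with a different name.
      return false;
    }
  }
  return stack.length === 0;
}

// <div><br></div> balances because <br> is a void element.
console.log(tagsBalance([
  { kind: "open", name: "div" },
  { kind: "open", name: "br" },
  { kind: "close", name: "div" },
]));
```

Real HTML is messier than this (some closing tags may be omitted entirely, for example), so treat the sketch as a picture of why state is needed rather than as a description of what the scanner does.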
For handlebars, I don't think you'd need to modify tree-sitter-html in any way. I think you'd want to create a new parser, somewhat like tree-sitter-embedded-template (which parses the templating language used by EJS and ERB: <% and %> tags, etc.). The new parser (let's call it tree-sitter-handlebars) would just be responsible for parsing handlebars tags, not the underlying HTML.
Then, to parse a handlebars template, you would first parse the file with tree-sitter-handlebars. Then, you would take that syntax tree and find the ranges of all of the content nodes (nodes that represent chunks of text content between the handlebars tags), and parse those ranges using tree-sitter-html.
Tree-sitter's includedRanges API allows you to parse a set of disjoint ranges in a document. That's how we parse things like EJS and ERB in Atom today. Does that make sense?
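As a rough illustration of the "find the content ranges" step, here is a minimal TypeScript sketch. It does not use the real Tree-sitter bindings: the plain `{{ ... }}` scan below is a hypothetical stand-in for walking the syntax tree that tree-sitter-handlebars would produce, and `TextRange` and `contentRanges` are made-up names. With the real bindings, ranges like these are what you would hand to the HTML parser via includedRanges (as far as I know the real range objects also carry row/column positions, which the sketch leaves out).

```typescript
// Hypothetical stand-in for the extent of one `content` node.
interface TextRange {
  startIndex: number;
  endIndex: number; // exclusive
}

// Collect the chunks of text that lie between `{{ ... }}` tags.
// Assumption: only plain double-brace tags occur; triple-brace tags,
// comments, and block helpers would need the real handlebars parser.
function contentRanges(source: string): TextRange[] {
  const ranges: TextRange[] = [];
  let pos = 0;
  while (pos < source.length) {
    const open = source.indexOf("{{", pos);
    const end = open === -1 ? source.length : open;
    if (end > pos) {
      ranges.push({ startIndex: pos, endIndex: end });
    }
    if (open === -1) {
      break;
    }
    const close = source.indexOf("}}", open + 2);
    // An unterminated tag swallows the rest of the input.
    pos = close === -1 ? source.length : close + 2;
  }
  return ranges;
}

const template = "<p>{{name}}</p>";
const htmlChunks = contentRanges(template).map(
  (r) => template.slice(r.startIndex, r.endIndex),
);
console.log(htmlChunks); // the two HTML chunks: "<p>" and "</p>"
```

The HTML parser would then see only those disjoint chunks, as if the handlebars tags were not in the file at all.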
> Then, to parse a handlebars template, you would first parse the file with tree-sitter-handlebars. Then, you would take that syntax tree and find the ranges of all of the content nodes (nodes that represent chunks of text content between the handlebars tags), and parse those ranges using tree-sitter-html.
> Does that make sense?
I understand your proposal, I'm just not sure yet if it's the best way forward. Handlebars is commonly used in two ways:
For 1., a simple tree-sitter-handlebars plugin would probably be sufficient, but for 2. it would be preferable to have a unified AST in the end that supports both HTML and the Handlebars subset used by Ember.js (no partials, etc.).
Just to give you an idea of typical Ember.js template code:
<div class="is-car {{if isFast "zoooom" "putt-putt-putt"}}">
{{car-component car=model}}
</div>
As you can see, it's possible to use Handlebars bindings inside of HTML element attributes, and if used like class={{someBinding}} I would assume the HTML parser would return an error because of the missing attribute value?
Missing attribute values are ok, so we wouldn’t get an error there.
I see your point about modeling Ember’s Handlebars implementation more exactly. It’s definitely doable, but it would require duplicating most of the code in this repo. And it still seems like you’d need a different approach for the more general usage of Handlebars as a template language.
Duplicating some of this logic is not a huge deal, but it might be worth trying the simpler approach first, and seeing if you really need to model it all as one language.
External scanners have been documented: http://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners.
After reading the official tree-sitter docs, I'm now trying to understand the implementation in this project. Unfortunately, the custom scanner doesn't seem to be documented, and I'm wondering what its purpose is, why it's needed, and whether it would need to be extended to support something like Handlebars.