Open progman1 opened 3 years ago
Could you show me the file you're trying to parse?
Or an equivalent file that also breaks like this?
That'd help me see if there's anything that I know is currently unsupported by the Menhir parser or if we need to spend some time digging.
Thanks for opening the issue! 🙌🏼
the link to it is above but here's an excerpt:
%% %CopyrightEnd%
{const_skip, [wxGenericFindReplaceDialog, wxInvalidDateTime, wxLANGUAGE_KHMER]}.
{not_const,
[wxRETAINED,
%% New enums needed for gl contexts not static numbers
{'wx_GL_COMPAT_PROFILE', {test_if, "wxCHECK_VERSION(3,1,0)"}},
]}.
Oh, sorry, I missed the link.
I think the parser will have trouble parsing that since it's built to parse an entire Erlang module. I started the tree-sitter-erlang project to address some of these limitations, but I haven't yet integrated it into the erlang library.
You could try using that tree-sitter parser with something like ocaml-tree-sitter to get up and running. Else I'd be happy to either help you integrate tree-sitter-erlang into the erlang library or rework the Menhir parser, as we just landed a new AST here that is waiting to be used.
I don't fully understand! Terms are part of the erlang language aren't they? What does the newest erl-parsetree.ml have over the old one? I saw that the parser as-is had just the one entry point (very reasonably :). And I imagined that another entry point into the grammar could be added, one directly to a 'Terms' rule. Which may not be true if 'Term' syntax is not part of the erlang language itself....
You have the incremental parser menhir definition - how come you're going after tree-sitter?
FYI, after staring at the format of wxapi.conf for a while I got the impression it may not be a very regular syntax - a sort of lists of lists of lists affair that's ok for Erlang's dynamic typing approach. Which suggested to me that I maybe shouldn't start hacking a yacc grammar for it! It also suggests to me that it isn't part of the erlang language as such, since you already have a menhir grammar for erlang. I can't remember the limitations of LALR/LR grammars unfortunately.
What's your understanding? thanks.
@progman1 let me try to answer your questions :)
> Terms are part of the erlang language aren't they?
Yes, they are.
> And I imagined that another entry point into the grammar could be added, one directly to a 'Terms' rule.
We could make a new parser that reuses the expression language from the main parser, yes. It would have to be a separate parser because Menhir allows only one %start entrypoint.
> how come you're going after tree-sitter?
The Menhir parser is only directly usable within OCaml code, whereas the Tree-sitter parser can be used anywhere with tree-sitter bindings. That includes Rust libraries, Neovim, and GitHub Semantic. The Erlang community benefits more widely from this.
The lowest hanging fruit here would be to refactor erl_parser.mly into 2 parsers: erl_expr_parser.mly and erl_mod_parser.mly. Caramel then continues to rely on Erlang.Parser.module_from_file/1 and you get a new Erlang.Parser.terms_from_file/1 that you can use to lift your config file into an Erlang.Ast.literal list.
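To make the idea of a terms-only entry point concrete, here is a minimal, self-contained sketch of a hand-written reader for a small subset of Erlang terms (atoms, integers, strings, tuples, lists), each terminated by a full stop. It is not the Menhir grammar and not the library's API: the `term` type and `terms_of_string` are hypothetical names invented for this illustration, and `term` only stands in for something like Erlang.Ast.literal. Accepting a comma right before a closing bracket is also an assumption of the sketch, as is ignoring string escapes, floats, and binaries.

```ocaml
(* Hypothetical term type for illustration; not the library's AST. *)
type term =
  | Atom of string
  | Int of int
  | Str of string
  | Tuple of term list
  | List of term list

exception Parse_error of string

(* Parse every "Term." form in [src]. '%' starts a comment to end of line. *)
let terms_of_string (src : string) : term list =
  let n = String.length src in
  let pos = ref 0 in
  let peek () = if !pos < n then Some src.[!pos] else None in
  let fail msg =
    raise (Parse_error (Printf.sprintf "%s at offset %d" msg !pos)) in
  (* Skip whitespace and comments. *)
  let rec skip () =
    match peek () with
    | Some (' ' | '\t' | '\n' | '\r') -> incr pos; skip ()
    | Some '%' ->
      while !pos < n && src.[!pos] <> '\n' do incr pos done;
      skip ()
    | _ -> ()
  in
  let take_while p =
    let start = !pos in
    while !pos < n && p src.[!pos] do incr pos done;
    String.sub src start (!pos - start)
  in
  (* Read up to the matching quote character; escapes are not handled. *)
  let quoted q =
    incr pos;
    let s = take_while (fun c -> c <> q) in
    if !pos >= n then fail "unterminated quote";
    incr pos;
    s
  in
  let is_word c =
    match c with
    | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '@' -> true
    | _ -> false
  in
  let is_digit c = c >= '0' && c <= '9' in
  let rec term () =
    skip ();
    match peek () with
    | Some '{' -> incr pos; Tuple (items '}')
    | Some '[' -> incr pos; List (items ']')
    | Some '"' -> Str (quoted '"')
    | Some '\'' -> Atom (quoted '\'')
    | Some ('0' .. '9') -> Int (int_of_string (take_while is_digit))
    | Some ('a' .. 'z') -> Atom (take_while is_word)
    | _ -> fail "unexpected input"
  (* Comma-separated terms up to [close]; a trailing comma is tolerated
     (an assumption of this sketch). *)
  and items close =
    skip ();
    if peek () = Some close then (incr pos; [])
    else begin
      let t = term () in
      skip ();
      match peek () with
      | Some ',' -> incr pos; t :: items close
      | Some c when c = close -> incr pos; [ t ]
      | _ -> fail "expected ',' or closing bracket"
    end
  in
  let rec forms acc =
    skip ();
    if !pos >= n then List.rev acc
    else begin
      let t = term () in
      skip ();
      if peek () <> Some '.' then fail "expected '.'";
      incr pos;
      forms (t :: acc)
    end
  in
  forms []

let () =
  let src = "%% comment\n{const_skip, [wxInvalidDateTime, wxLANGUAGE_KHMER]}.\n" in
  Printf.printf "%d term(s) read\n" (List.length (terms_of_string src))
```

A separate grammar entry point that reuses the existing expression rules would replace all of this; the sketch only shows the shape of the result, a list of top-level terms rather than a module.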
The strong path forward is to do some work and integrate tree-sitter-erlang back into this repository, to use that as the term parser first. If that works, it'll be easier to start migrating the main parser to it.
thanks for clarifying. I will tackle the low-hanging fruit! I have done some messing with menhir and something might be doable about entry points via converting to ocamlyacc grammar first, for an even lower hang!
I have a parsed file :) Happily, menhir does actually accept more than one start symbol. I had to allow dangling commas in tuples and lists - maybe that isn't valid in the expression language after all? (I don't know if the 'term' language is any different from expressions.) The file also had multi-line strings, which I took to mean they should be stuck back together (macro stringification?), so there's a change there too.
if these are actually valid erlang then I'm happy to send up the patch?
Well I stand corrected! 🙌🏼 I didn't know that, thanks for showing me. Please send a patch 🎉 we can discuss the changes on the PR.
I run Erlang.Parse.from_file on https://github.com/erlang/otp/blob/master/lib/wx/api_gen/wxapi.conf
and get the error
probably because the file defines terms to be read by file:consult/1 and so isn't appropriate for the front door of your parser. But with a different entry point, could it parse terms?