Closed nikomatsakis closed 5 years ago
@nagisa , as it was your RFC, do you have opinions on the strategy we take here? Or should I be thinking about this.
EDIT: @nagisa mentioned on IRC that they are too busy, so I'll come up with aplan.
I’m curious what kind of testing can check that a grammar matches the not-grammar-based parser.
Well, it can show the presence of bugs, but not the absence of them.
https://github.com/rust-lang/rust/blob/master/mk/grammar.mk is what we used to do. Just vestigial at this point :)
I’m curious what kind of testing can check that a grammar matches the not-grammar-based parser.
In intellij-rust, we have regression tests for the parser, which consist of a Rust file and a serialized AST. Then we check that the parser produces the expected AST. We also have some negative "you shall not parse" tests. The same technique can be applied to the reference grammar, if we ensure that the serialized AST is sufficiently abstract and doesn't leak grammar "implementation details".
The question here is what tool should be used for the reference grammar? I think it has to goals:
One option is to continue to use Bison. In my opinion, it doesn't fulfill the second goal: grammar is difficult to read because of large amount of duplication and low level details. Regular expression extensions would be really nice to have in the canonical grammar.
Another extreme option is to build a custom parser generator (LALRPOP ?) :) It would have a nice side effect of making Rust more suitable for efficiently implementing programming languages: a realm currently occupied by C/C++.
I have opinions but not much time. I've been interesting in porting the Rust grammar to LALRPOP -- I think it'd be much cleaner, as LALRPOP has a number of features that should allow us to avoid some of the abuse of precedence declarations and the like. (I found the existing grammar not very useful in evaluating syntactic changes because of those precedence declarations, for example.)
(As an aside, this would probably be a good stress test for LALRPOP, I assume we'd have to fix various bugs or add new features to scale it up that large.)
So this morning I started porting the yacc grammar in the repository to LALRPOP, just to see what would happen. I made some progress. But I was wondering: is that grammar the most up to date version, or is the one in the repo more up to date? Does anyone know what the differences are between them?
I was thinking about working on that as well, but I unfortunately don't have too much time in the next month to work on this. What do you think about creating a repo to collaborate? (I could take care of that as well) On Jan 28, 2016 18:51, "Niko Matsakis" notifications@github.com wrote:
So this morning I started porting the yacc grammar in the repository to LALRPOP, just to see what would happen. I made some progress. But I was wondering: is that grammar the most up to date version, or is the one in the repo more up to date? Does anyone know what the differences are between them?
— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/30942#issuecomment-176303713.
The (parser-lalr) grammar originally comes from https://github.com/bleibig/rust-grammar, which hasn’t been updated in a while.
@fhahn
I was thinking about working on that as well, but I unfortunately don't have too much time in the next month to work on this. What do you think about creating a repo to collaborate? (I could take care of that as well)
I will do this. I am right now just working through some LALRPOP issues that arose (I'm not sure if it's a bug or if LALRPOP just needs optimization; I suspect both :) but once I get that sorted out a bit I will open up a repo and post the link here. Hopefully tomorrow. It seems @TyOverby may be interested as well, and no doubt others. For example, I was talking to @jorendorff, who developed https://github.com/jorendorff/rust-grammar/, as well.
OK, I put up a definite work in progress here: https://github.com/nikomatsakis/rustypop
I also described some conventions that I am aiming towards, but have not yet achieved.
This is hard work to parallelize in the early stages, but if you're interested in hacking on it, let me know over IRC or what have you (or just open some PRs). I'll probably alternate between working on it and improving LALRPOP (for example, I am now highly motivated to get a better printout on shift-reduce errors). =)
If we can generate a parser from the official grammar, and if we can run rustc
in parse-only mode, and if we can make the two of them produce output and errors in identical format, then I wonder if we could use AFL to automatically generate test cases for them. Of course even this could not prove that they match, but it might give a higher level of assurance than a relatively smaller number of ad hoc human-devised test cases.
@glaebhoerl: Well, there's also that the entire formalism the official grammar is based on on is called "generative grammars" - using them to create working parsers came after using them to create exemplars, which is essentially a matter of transforming the grammar into a tree and executing a depth-first search.
For example:
start := possibility start?
possibility := light | neutral | dark
light := "Solitari" | "Gandalf"
neutral := "Lunitari" | "Switzerland"
dark := "Nuitari" | "Chocolate"
We form the following tree:
With that, we then walk the tree, generating each possible string covered by the grammar.
One can do this efficiently, without actually generating the tree, by assigning each optional subrule a bit, treating the collection of those bits as a number, and counting.
Enabling a subrule will sometimes reveal another optional; in that case, push the current "number" onto a queue, and when you've finished counting the "basic optionals", pop items off the queue and count the newly-revealed optionals again, adding them to the queue if they reveal more.
That ensures one will generate every rule, in their shortest exemplars, before continuing on to the next shortest, etc.
OK, so there's probably something horribly wrong with it, but the rustypop crate now builds without any shift/reduce conflicts. I haven't actually tried RUNNING the code it produces, of course, and I have all empty actions, so it will only yield a "true/false" result. Plus I need to adapt the rustc tokenizer. But it seems like progress. :)
Enough progress that it may be possible to start parallelizing the work (debugging shift/reduce conflicts is kind of a serial task...).
Of course, I also expect that as soon as we try using this grammar we'll find that I did some bone-headed things that resolved all conflicts by just not parsing anything at all, or something like that.
Anyway, I plan to write up a blog post about the approach I took, since it made use of a number of LALRPOP features to try and pare down duplication. I also took the liberty of ripping out various bits of obsolete syntax from the existing rust-parser.y
, and made a few arbitrary judgement calls about dubious bits of our grammar. ;)
Oh, I should mention that it generates a 500MB .rs
file. Definitely need to do some work reducing the size of LALRPOP's output (@fhahn has actually been pursuing the most immediately obvious strategy, and I've been meaning to implement various other cool optimizations that I've read about...)
Hmm, now I see some more conflicts. So maybe premature. But still, getting close I think. :)
Another tool which can be used for the canonical grammar is antlr4. I have not used it myself, but I think it should be mentioned in this thread.
I believe we used to use antlr4 before the yacc-based grammar got merged.
Looks like only the lexer was implemented in antlr4: https://github.com/rust-lang/rust/tree/29bd9a06efd2f8c8a7b1102e2203cc0e6ae2dcba/src/grammar
I used antlr4 to make https://github.com/jorendorff/rust-grammar and it has a few drawbacks:
assign_expr
and assign_expr_no_struct
(the production for if-statements is IF expr_no_struct block
to force parentheses around extremely strange conditions like if Range { start: a, stop: b }.count() > 0 { do_something() }
). antlr4's support for arguments to productions is unusable, which is why my grammar has a second complete copy of the expression grammar tagged with _no_struct
.@jorendorff I suggest using a macro preprocessor or build step to reduce some of the code duplication.
cc https://github.com/rust-lang/rust/issues/15880 (we'll want a bot for this)
Hey, and what about lexer / parser split?
Perhaps we should create a canonical lexical structure grammar before jumping onto the grammar for the whole language?
cc @dns2utf8
@matklad I’d consider formalising lexer an inherent part of grammar formalisation.
That being said, unlike the grammar, which has been extended significantly over time, lexer has stayed considerably constant (i.e. is a different problem space) and the reference is still pretty good at capturing the lexical structure of the language.
I’d consider formalising lexer an inherent part of grammar formalisation.
The point is that lexer can be formalized before the rest of the grammar, so it is a good independent first step. Having an executable semi declarative specification of the lexer would help for technical reasons:
is still pretty good at capturing the lexical structure of the language.
It can be better though! There are some corner cases like 16 >> 2
vs collect::<Vec<_>>
, raw string literals, self::foo::bar
vs self::foo ::bar
vs self ::foo::bar
. Also, the reference is not executable, so it must have some bugs (like this one)
I would like to create a complete grammar first.
If the src/grammar/*.g4
files covered everything we were able to run a generated lexer against the tests and spot cases where one of them accepts where the other does not and find differences.
I am currently at RustFest.eu if somebody is here too and would like to talk a little about it.
@matklad: There are a couple of factors that make doing a lexer separately from the grammar less meaningful.
Point (1) is mostly a question of usefulness, and I admit it may not be compelling. Point (2), however, is a problem.
Contextual keywords aren't (necessarily) a problem for the parser; carefully-chosen, I suspect they may be able to dodge making the grammar context-sensitive rather than context-free (though I don't have any proof of it; I just don't want to feel discouraged that Rust's grammar became a CSG).
However they do conflict with the entire concept of a lexer, which in a deterministic and entirely local manner turns text into tokens. Antlr4 handles this problem with a concept of "lexical modes" which the grammar can drive switching of, but a standalone lexer couldn't use that.
@eternaleye The contextual keyword problem can be somewhat avoided by having the grammar use identifier tokens with specific values, instead of separate keyword tokens. That is how the implementation actually works, as keywords are merely pre-interned identifiers.
@eddyb: Mm, that makes sense. Sadly, it also makes the lexer less clear/useful independent of the grammar :/
@eternaleye It's what macros interact with, so it's still useful for describing those.
@eternaleye
I don't see why. You can equivalently have the lexer have a special token for each keyword, and add a parser production such as
identifier = IDENTIFIER | UNION
This may require some odd finite-ambiguity-resolution hacks in the grammar, but it's not like we don't already have them.
Hey, I've created a pre pre canonical lexer rfc: https://internals.rust-lang.org/t/pre-pre-rfc-canonical-lexer-specification/4099
On Sat, Sep 17, 2016 at 09:19:21AM -0700, eternaleye wrote:
@matklad: There are a couple of factors that make doing a lexer separately from the grammar less meaningful.
- Lexers aren't an inherent part of grammar formalisms; though most tooling does use them. There are also various advantages to scannerless parsing.
- The whole lexer/parser thing works really well for languages that have keywords. It works considerably less well for languages with contextual keywords. Rust, since not too long ago, does.
Point (1) is mostly a question of usefulness, and I admit it may not be compelling. Point (2), however, is a problem.
Contextual keywords aren't (necessarily) a problem for the parser; carefully-chosen, I suspect they may be able to dodge making the grammar context-sensitive rather than context-free (though I don't have any proof of it; I just don't want to feel discouraged that Rust's grammar became a CSG).
However they do conflict with the entire concept of a lexer, which in a deterministic and entirely local manner turns text into tokens. Antlr4 handles this problem with a concept of "lexer modes" which the grammar can drive switching of, but a standalone lexer couldn't use that.
You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/rust/issues/30942#issuecomment-247785740
One solution is to always emit special tokens for contextual keywords, and have grammar rules that allow them to be used in identifier context in certain places.
If anyone is interested, I'd like to push more on the rustypop grammar. It handles a number of issues (e.g., the <
vs <<
lookahead problem; contextual keywords; subsets of expressions) using pretty clean techniques (i.e., no precedence hacks or stuff that steps outside standard theory, mostly just macros and careful definition of the set of tokens). I've not had time to actually make it executable and start fixing the various bugs that i'm sure are to be found, however. If that interests you, please contact me on IRC or e-mail!
One quick thing that I'd like to flag up as it occurred to me in the context of working on https://github.com/rust-lang-nursery/reference/issues/9 (aka "Document all the things!"): when this eventually becomes true, we should link it in the reference and other docs as appropriate.
I've been developing a Haskell library for parsing and pretty printing Rust source. In the process, I wrote an LALR grammar for Rust. The grammar is meant to be consumed by Haskell's Happy parser generator. AFAIK it is pretty complete - I think it handles pretty much everything on stable and beta (and a lot of stuff on nightly too)
It is based on
rustc
parser source codeNote I'm still in the process of tweaking and testing it. Right now, I've only got a couple hundred manually written test cases.
If this is in any way useful or there is anything I could apply to help with this issue, I'd be glad to - I've been looking for a way to get involved with rustc
.
@harpocrates cool! I've been making slow progress on rustypop as well, though I've not yet taken the time to implement the testing harness. It'd be good to compare notes.
@nikomatsakis Sure thing. I'm currently writing what I think is my last set of tests for the parser - basically diff-ing out the output of my parser with the JSON output of rustc -Z ast-json-noexpand
. Once I get that done, I'll try my parser on as much Rust code as I can get my hands on. Once that is complete, I'll report back here!
Hello
Each open source jungle is difficult in the beginning. Therefore, the following posts may be placed in the wrong place. Someone can certainly correct this to the right forum.
Well well
I lack a grammar for Rust. For users. There are fragments in some documents but no cohesive. The grammar does not have to be the basis for compilation but should help to understand the structure of the language RUST.
I am 70 years old and have used RUST for a few months and have done a tool
https://github.com/willy610/bnf2railroad
who reads a grammar in EBNF style and produces so-called railroad views. The views are raw TTY, html with anchors and svg.
I have always thought that these railroad views have been a good complement to other documentation such as examples and small projects. My long experience of C, Smalltalk, SQL, Mathematica, JS and Java has of course helped me to get RUST relatively quickly. RUST is a very ambitious and radical approach to programming.
Upon learning, I have interpreted the grammar files contained in RUST documentation. They are a bit incomplete and do not have the quality that other languages offer. Much is undefined.
Together with the tool there are grammar for Pascal, Smalltalk, Lua, Json, rege and fragments of RUST.
The working method could be:
I have also used the tool to generate better help information for using the tool (parameters of the program)
Perhaps the tool could be useful internally for developers of RUST for, for example, specification, bug report etc
@harpocrates I'm curious, what is the status of your rust parser? @matklad and I have been talking about trying to get a better effort going here. The rough plan is to start with your grammar (probably converted to LALRPOP and then perhaps clean it up). Also, would you be interested in being involved in any such effort?
@nikomatsakis my rust parser is complete. I would be interested in helping out with any effort to convert to LALRPOP. What would be the best way to proceed?
Note that converting to LALRPOP may not be completely trivial. My grammar is currently written for Happy (a Haskell parser generator) and makes use of a handful of Happy features, namely:
>>
token in something like Vec<Vec<i32>>
, gets split into >
and >
)@harpocrates looking at your parser, I was curious, how have you tested it? I see a few tests in the repository, but I was wondering if you tested against e.g. the sources of the Rust compiler, or crates.io, or something like that.
@harpocrates
Note that converting to LALRPOP may not be completely trivial. My grammar is currently written for Happy (a Haskell parser generator) and makes use of a handful of Happy features, namely:
LALRPOP supports parameterized productions. It does not support pushing back tokens, but there is an alternative way to handle >>
in any case. (You distinguish ">
followed by >
" as one token and ">
followed by something else" as another kind of token.)
The trickier thing will be precedence, since LALRPOP does not support those sorts of declarations. We ought to be able to refactor the grammar though (I actually converted the existing .y
grammar once already, in rustypop, but I figured I would rather start with your grammar since it seems far more complete and better tested.)
The first thing I had planned to do, in any case, is to write a tool to convert your grammar into LALRPOP syntax, and see how LALRPOP feels about it. The next priority would be a testing harness. It's rather late here so I'll have to write more later but I'd love to collaborate, particularly since my personal time is quite limited.
Separately, @matklad and I have been talking about extending LALRPOP with a way to generate default actions that will fire off events suitable for building a generic parse tree, roughly as described in their RFC.
Hej Niko The status of my product is stable and no bigger issues more than proper UNICOE/UTF processing remains. Today only the ASCII subset is supported.
I really will - and can!? contribute - as I got a lot of time. I’m am retired but works at the computer at least 8 hour a day.
As of writing I have a very tiny end expensive connection to the internet; just via phone as the wlan supplier in this big city of Gothenburg have trouble with just my connection for more than a week. Crazy.
So please pass the git(s) to the most covering project on LALRPOP so I don’t have the surf around too much.
And also I would very much like some kind of use case or scenarios how the function/tool should be used
Looking forward to the work and the result!
Kindly, Willy
On 2018-03-29, at 18:27, Niko Matsakis notifications@github.com wrote:
@harpocrates I'm curious, what is the status of your rust parser? @matklad and I have been talking about trying to get a better effort going here. The rough plan is to start with your grammar (probably converted to LALRPOP and then perhaps clean it up). Also, would you be interested in being involved in any such effort?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
Willy Svenningsson Johannas Vag 28 S-425 42 Hisings Karra Sweden +46 0768 22 20 26 willy@fager.st
Hej I copy and pasted grammar rules from https://doc.rust-lang.org/grammar.html for RUST I did the same for Lua, for Smalltalk I read from the 'Smalltalk-80 The language’ and for Pascal of read the Nicklaus Wirth ’Algorithms + Data Structures = Program’
I have not locked into the RUST compiler source or any projects in crates.io. My vision! with my work was to support writers of language documentation giving them tools to work with - reference - subsets of proper/versioned grammar rules as inclusions of both BNF rules and RailRoad graphs.
I assume that there are no explicit language rules in the compilers sources but the rules appear in some pre-step for generating part of the complier. That why I didn’t digger into the compiler source. So from some versioned preprocess rule definitions there might a possibility to extract rule snippets to be converted to BNF and RailRoad.
But I will look into both the source of RUST compiler and some crates.io and the LALRPOP too.
Kindly, Willy
On 2018-03-29, at 23:25, Niko Matsakis notifications@github.com wrote:
@harpocrates looking at your parser, I was curious, how have you tested it? I see a few tests in the repository, but I was wondering if you tested against e.g. the sources of the Rust compiler, or crates.io, or something like that.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
Willy Svenningsson Johannas Vag 28 S-425 42 Hisings Karra Sweden +46 0768 22 20 26 willy@fager.st
You may also be interested in following rust-lang-nursery/reference#221. The people working on the reference have been making great progress, and the up-to-date grammar (at https://brauliobz.github.io/rust-reference/grammar.html) appears to be getting closer to complete.
@willy610
The status of my product is stable and no bigger issues more than proper UNICOE/UTF processing remains.
Sorry for disappearing! After writing those messages, I got totally overwhelmed (and this week i'm actually on vacation, so I'll probably be slow to reply again.) I'm a bit confused about which project you are referring to -- do you mean the railroad diagrams you referenced here? If so, that seems like a cool visualization technique, but presumably it requires an EBNF grammar to start? At the moment, that last part is what I am most interesting in obtaining; note though that if we had a working LALRPOP grammar, it "desugars down" to a plain CFG internally, so we ought to be able to use your tool to visualize it.
@ehuss
The people working on the reference have been making great progress, and the up-to-date grammar...appears to be getting closer to complete
That's great! Is it presently being tested? And, if so, how?
Hej Niko
The project I’m referencing is my project https://github.com/rust-lang/rust/issues/30942#issuecomment-359103426
My focus is to keep the possibility to generate railroad view from grammars of kind EBNF. And have the verification function there (Unused, missing rules etc)
At the moment I’m investing some RUST grammar approaches
https://github.com/rust-lang/rust/issues/30942#issuecomment-359103426
https://github.com/rust-lang-nursery/reference/tree/master/src
https://github.com/jorendorff/rust-grammar/blob/master/Rust.g4
https://github.com/nikomatsakis/rustypop/blob/master/src/rusty.lalrpop
https://github.com/lalrpop/lalrpop
A. In most sources I struggled with character set descriptions and all the regeexpr. So I introduced, in my EBNF, 'Character Set Expressions' with set, union, difference and range. It will show up soon.
B. I think Mark Down ’marking’ is insufficient for a BNF. Perhaps generating plain HTML snippets from rules. They can be styled looking like program source. Color, font etc
C. EBNF decorated with types is a must. I’m not there yet
D. Actions in the rules like {} in yacc and => in LARPOP could
perhaps be specified in the EBNF as a (parametric) reference to other source
like
pub Expr: Box
Clean up grammar from actions and relate them using ’@' instead. The intersection of EBNF and LALRPOP could be ...
E. Or perhaps the best. Have a tool exporting from LALRPOP as EBNF (with type) And the continue with generation railroad and markdown files from that source.
I’m really sorry for working mostly off road.
I will probably not update the above bnf2railroad but add a heavy refactored one as an other project
On 2018-04-17, at 12:35, Niko Matsakis notifications@github.com wrote:
@willy610
The status of my product is stable and no bigger issues more than proper UNICOE/UTF processing remains.
Sorry for disappearing! After writing those messages, I got totally overwhelmed (and this week i'm actually on vacation, so I'll probably be slow to reply again.) I'm a bit confused about which project you are referring to -- do you mean the railroad diagrams you referenced here? If so, that seems like a cool visualization technique, but presumably it requires an EBNF grammar to start? At the moment, that last part is what I am most interesting in obtaining; note though that if we had a working LALRPOP grammar, it "desugars down" to a plain CFG internally, so we ought to be able to use your tool to visualize it.
@ehuss
The people working on the reference have been making great progress, and the up-to-date grammar...appears to be getting closer to complete
That's great! Is it presently being tested? And, if so, how?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Willy Svenningsson Johannas Vag 28 S-425 42 Hisings Karra Sweden +46 0768 22 20 26 willy@fager.st
The grammar that's within reference (the markdown files) is way overdue for an update and you are not the first to notice it. In fact -- this RFC happened specifically because of it.
Replacing the grammar definitions in the reference markdown files with something else (such as railroads) would be super awesome, but before we can do that we need a complete grammar definition anyway.
On Thu, Apr 19, 2018, 00:10 Willy Svenningsson notifications@github.com wrote:
Hej Niko
The project I’m referencing is my project https://github.com/rust-lang/rust/issues/30942#issuecomment-359103426
My focus is to keep the possibility to generate railroad view from grammars of kind EBNF. And have the verification function there (Unused, missing rules etc)
At the moment I’m investing some RUST grammar approaches
- The old yacc from RUST source
https://github.com/rust-lang/rust/issues/30942#issuecomment-359103426
- From some md files in the Rust nursery
https://github.com/rust-lang-nursery/reference/tree/master/src
- From the work by Jason Orendorff. I think he said who wrote the grammar in order to understand RUST when writing the book Programming Rust. What a grammar and what a book!!
https://github.com/jorendorff/rust-grammar/blob/master/Rust.g4
- and from nikos older? rusty-pop
https://github.com/nikomatsakis/rustypop/blob/master/src/rusty.lalrpop
- but also LALRPOP
https://github.com/lalrpop/lalrpop
A. In most sources I struggled with character set descriptions and all the regeexpr. So I introduced, in my EBNF, 'Character Set Expressions' with set, union, difference and range. It will show up soon.
B. I think Mark Down ’marking’ is insufficient for a BNF. Perhaps generating plain HTML snippets from rules. They can be styled looking like program source. Color, font etc
C. EBNF decorated with types is a must. I’m not there yet
D. Actions in the rules like {} in yacc and => in LARPOP could perhaps be specified in the EBNF as a (parametric) reference to other source like pub Expr: Box
= { Expr ExprOp Factor @action(ActionName,Expr), Factor, }; So the @ or similar unused sign could be an element in the syntax. Clean up grammar from actions and relate them using ’@' instead. The intersection of EBNF and LALRPOP could be ...
E. Or perhaps the best. Have a tool exporting from LALRPOP as EBNF (with type) And the continue with generation railroad and markdown files from that source.
I’m really sorry for working mostly off road.
I will probably not update the above bnf2railroad but add a heavy refactored one as an other project
On 2018-04-17, at 12:35, Niko Matsakis notifications@github.com wrote:
@willy610
The status of my product is stable and no bigger issues more than proper UNICOE/UTF processing remains.
Sorry for disappearing! After writing those messages, I got totally overwhelmed (and this week i'm actually on vacation, so I'll probably be slow to reply again.) I'm a bit confused about which project you are referring to -- do you mean the railroad diagrams you referenced here? If so, that seems like a cool visualization technique, but presumably it requires an EBNF grammar to start? At the moment, that last part is what I am most interesting in obtaining; note though that if we had a working LALRPOP grammar, it "desugars down" to a plain CFG internally, so we ought to be able to use your tool to visualize it.
@ehuss
The people working on the reference have been making great progress, and the up-to-date grammar...appears to be getting closer to complete
That's great! Is it presently being tested? And, if so, how?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Willy Svenningsson Johannas Vag 28 S-425 42 Hisings Karra Sweden +46 0768 22 20 26 willy@fager.st
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/30942#issuecomment-382530349, or mute the thread https://github.com/notifications/unsubscribe-auth/AApc0klOHg4m2tWYN9CZhGDnfrqNIprlks5tp6vSgaJpZM4HF_MM .
That's great! Is it presently being tested? And, if so, how?
I don't know, I don't think there is anything formal set up. I think @brauliobz is doing most of the work, and he once mentioned that he was using Antlr4 to test.
@nikomatsakis I'm sorry for the long overdue response.
@harpocrates looking at your parser, I was curious, how have you tested it? I see a few tests in the repository, but I was wondering if you tested against e.g. the sources of the Rust compiler, or crates.io, or something like that.
I have a script that automatically scrapes large files from repos under the rust-lang
organization. I use rustc -Z ast-json -Z parse-only
as an oracle to tell me if my parser's output is correct. Any difference in AST outputs causes tests to fail. Last time I ran the script, it collected (and ran tests) on upwards of a million lines of Rust code.
The next priority would be a testing harness. It's rather late here so I'll have to write more later but I'd love to collaborate, particularly since my personal time is quite limited.
I should have more free time in the coming months and I'd be happy to help on whatever you guys need the most help on (porting over a concrete grammar, generally improving LALRPOP, etc.)
This is the tracking issue for rust-lang/rfcs#1331, which specifies a procedure for creating a canonical grammar apart from the compiler. This is a multi-phase process. I think the first step, honestly, is just to lay out a firm plan of how to proceed -- what kind of automatic testing to use and so forth. The RFC provides a general plan but it needs to be made more concrete. Once this issue has an owner, I (or they) can update this summary and try to keep it up-to-date.