softdevteam / grmtools

Rust grammar tool libraries and binaries
Other
507 stars 31 forks source link

Analysis builder #353

Closed ratmice closed 2 years ago

ratmice commented 2 years ago

Headed out the door for a few days, and haven't gone over this rebasing of my branch with a fine tooth comb, as such I'm marking it as a draft. But I wanted to try and get this series in some shape which could at least be be looked at.

I've tried to split up the changes of migrating code from build() into the various CTAnalysisBuilder structs, over a series of patches.

There is one place that deserves particular scrutiny I've added a note to an existing comment before the call to std::fs::remove_file. Which details why I believe the changing of timing between returning ConflictsError now being before the call to remove_file call should be ok since the bug requiring it to be returned after should be fixed, there is a link to the bug.

I should probably say something about the Analysis trait, I had tried having something with a closure that returns things. but usage in that manner broke the builder so you couldn't just chain calls, so I ended up with how it is.

ltratt commented 2 years ago

I'm not quite sure what the purpose of https://github.com/softdevteam/grmtools/pull/353/commits/1af9ba262e001a6183f098102437aa6a56ef256a -- it doesn't really seem related to this PR?

ratmice commented 2 years ago

This comment was posted earlier, but somehow in response to the wrong thing, reposting here:

I'm not quite sure what the purpose of 1af9ba2 -- it doesn't really seem related to this PR?

The reason it is here, is so that build_grammar can return a YaccGrammarError or a Spanned error. We have Non-Spanned errors before that, (reading yacc files to build the AST), and non-spanned errors again after build_grammar (write_parser here).

Edit: The one other thing to say about 1af9ba2 is without it, build_for_analysis can't write a grammar. so it makes that accessible to both build() and build_for_analysis()

ltratt commented 2 years ago

I've had a couple of goes at reviewing this PR, and I must admit that I've been defeated both times, so it's probably time to admit that out loud! I think I have two problems:

  1. I'm lacking motivation/context for why we need the Analysis trait. [The commit messages tell me what the commits do, but not why, so I end up guessing... except I mostly confuse myself :)]
  2. I'm not really sure what the CTBuilder changes are doing. Are they important or ... ?

Pointers/suggestions are most welcome!

ratmice commented 2 years ago

Well I totally understand your position, I've had some difficulty trying to split this patch apart into coherent stages. I'll try and give a general overview of the motivation for the various things:

ASTValidation

This is probably the most important part of the patch and the easiest to split out from the rest, selectively giving access to a potentially invalid `GrammarAST` and `Vec`s produced, instead of the current `GrammarAST` or Vec`, while not loosing track of the fact that the specific `GrammarAST` may or may not actually be valid. We could perhaps stop here and make `unused_symbol_warnings` public, and forgo adding the `Analysis` trait, but this doesn't make warnings accessible from the Builder API.

CTBuilder

Trying to make it accessible from the builder API we run into quite a few distinct issues, but they are all somewhat tangled together. The common underlying factor is that CTBuilder::build() hides everything:

  1. It hides types: for instance it produces Box<dyn Error> instead of Spanned error types.
  2. It keeps the source String, and the GrammarAST as local values which never escape build(), on success it writes out the parser sources never revealing the GrammarAST, on failure it produces Box<dyn Error> also never revealing the GrammarAst or Spanned errors.

The builder changes, make these local values like GrammarAST, and Spanned errors available from a parallel builder API using a funny closure which is the Analysis trait. It also takes buffers things like the source text to write into so the caller can access the sources to associate with Spanned errors that are returned.

build() then becomes equivalent to

build_for_analysis()
   .build_ast()?
   .build_grammar()?
   .build_tables()?
   .source_generator()
   .write_parser()?

where Spanned Errors have been converted into Box<dyn Error>, and between each build_ there is an optional analysis that can be run, which reveals to the analysis the local GrammarAST or YaccGrammar, etc value which was previously entirely local to build.

In between each of the build_{ast, grammar, tables} the trait just acts like a callback essentially takes the form of:

build_ast().analyze_ast(&mut wa: WarningAnalysis, |warning_analysis, &grm: GrammarAST| {
   /// Pull some values out of grm, and mutate them into self which the caller owns.
}).analyze_ast(&mut rda: RailroadDiagramAnalysis, |diagram_analysis, &grm:GrammarAST| {
   /// Generate an image of the AST in the form of a railroad diagram.
}).analyze_ast(&mut cra: RuleNamesShouldBeCapitalizedAnalysis, |capitalization_analysis, &grm: GrammarAST| {
   /// Check that each rule starts with a capital letter.
}).build_grammar()

Except traits with their self types allow you to collapse the mutable parameter and closure down into one self parameter.

So one of the key things here also is the ability to run multiple distinct analysis, while only calling the builder one time. While leaving the old simple build() intact merely as the composition of this new verbose one which hides less information.

So that is kind of where this is coming from, wanting to do things which if we did them in grmtools proper would pull in dependencies like image or svg crates, using existing nice error formatting crates, so that we don't have to add these types of things as a dependency/feature of grmtools itself. I.e. instead you could just bring in an analysis crate which is all set up to do that and add it as part of your build.rs script.

I should also stress, that I'm not enamored with how this patch is, but it has managed to fit those criteria while being flexible for future expansion. I hope this helps unravel some of the motivation for this.

ratmice commented 2 years ago

Well, I went ahead and added the railroad diagram example, this is mostly here to provide motivation, because the example was originally intended to show errorrs, the yacc file errors out before actually generating the diagram. But if you swap in a working one, it should generate the diagram on stdout.

The thing to note is that this is pulled out of a project which is using the RTBuilder, but there is currently no way to do this kind of thing when using CTBuilder.

The above is not exactly true, you could read and build the grammar multiple times, once with CTBuilder, and once in the RTBuilder way usingYaccGrammar::new_with_storaget, but that is what this patch is trying to avoid.

ratmice commented 2 years ago

I went ahead and created a simpler PR with dustbins the Analysis and extra builder stuff, we could close this one if you'd rather start from that?

ltratt commented 2 years ago

It does feel like closing this one now makes sense.

ratmice commented 2 years ago

Indeed