softdevteam / grmtools

Rust grammar tool libraries and binaries
Other
494 stars 32 forks source link

Error channels #431

Closed ratmice closed 6 months ago

ratmice commented 6 months ago

Here is my initial exploration, for replacing Vec<YaccGrammarError> with error channels in cfgrammar, as was kind of discussed in #428 I figured at least it was worth a shot, because it could be done matching the current ownership structure and giving a decent facsimile of the current API when owning both a send/receive channel.

A few things tripped me up, particularly parse_token and parse_name need to return YaccGrammarError because sometimes we replace (or drop) the error generated by these functions with a more specific one. particularly in the %expect-unused code path. I need to more carefully inspect the rest of the code in case there could be more cases such as these.

ratmice commented 6 months ago

So couple of things seem worth noting this morning,

  1. YaccGrammarConstructionFailure is kind of long, but seemed to convey the meaning?
  2. The channel only covers errors, we'd need either a wrapper for enum warnings + errors if we want to include warnings into the same channel, or a second channel for those (which seems somewhat unfortunate).
  3. There was some stuff i'd added to the ASTBuilder which probably needs to be removed as I couldn't figure out a way to utilize it there Removed in 9393e7f
  4. Hopefully the lifetime in ASTBuilder doesn't cause any problems anywhere, I don't think it will but since it has not being utilized anywhere yet, it is hard to tell.
ltratt commented 6 months ago

Very interesting! I think this PR has a slightly different aim to some of the stuff in #428, so let me see if I can wrap my pea brain around it!

It seems to me the main idea here is that the consumer of errors are "remote" (at least, potentially in another thread) so we use mpsc to ship existing error structs to those consumer(s)? One advantage of that approach over the "lots of functions" approach in https://github.com/softdevteam/grmtools/pull/428#issuecomment-1862375650 is that we get to make use of the existing enums, and that probably makes extending the API less cumbersome in the future. This feels like a better approach to me than "lots of functions". I also think I like the general ASTBuilder approach.

That said, reading the code made me realise that perhaps baking in mpsc stuff isn't quite what we want to do. Because mpsc::channel is slightly clumsy to use (in the sense of having errors etc.) that does muddle the code a bit. And while mpsc::channel probably works OK for many users (even those not using threads), it feels overkill for some users (e.g. nimbleparse), while for others they'll probably have to translate from channels to some other type (e.g. LSP libraries).

So what I'm now wondering is whether we can break this into perhaps 3 bits:

  1. ASTBuilder and friends feel like they're probably an API improvement full stop, irrespective of error reporting. We can probably get those in quite quickly on their own?
  2. Error reporting might better be a trait? I think this PR shows that it doesn't need to the "lots of functions" trait I originally envisaged, and perhaps it's not much more than (for lrpar):
    trait ParseErrorReporting{
    fn parse_error(YaccGrammarError) -> Result<(), Box<dyn Error>>;
    }
  3. We could then provide a channel implementation and/or "nice pretty printer" implementation of ParseErrorReporting as standard?

Please feel free to tell me that I have missed something important, as you've thought more deeply about this than I have!

ratmice commented 6 months ago

Yeah, this definitely has a different aim, as this would mostly be under the hood of something like #428 being the mechanism by which errors could be reported promptly before parsing has fully finished.

I tend to agree about mpsc::channel being clumsy, overkill and unfortunately producing an error on send. I could kind of live with that last issue due to the existence of the ConstructionFailure, and mpsc errors being able to be wrapped up in that same type, but it definitely isn't ideal.

My biggest gripe about the interface is something like the following,

let x = ASTBuilder::new().grammar_src("not valid .y source"); 
let y = x.error_receiver();
let ast_validity = x.build();
assert_eq!(ast_validity.errors(), vec![])

Because they call to error_receiver took the receiver that errors() is implemented in terms of.

What I would like to do is to try and make build() return something else, ASTValidity, which merely has bool, and an GrammarAST and no fn errors(), or fn warnings() and move all that to a trait that gets passed to the builder. I think we can still implement all the existing API on top of that (including ASTWithValidityInfo), while dropping the need for mpsc.

Anyhow, I would like to play with that approach before we try to get ASTBuilder in tree I think, if that sounds okay.

ratmice commented 6 months ago

mpsc::channel probably works OK for many users (even those not using threads), it feels overkill for some users (e.g. nimbleparse), while for others they'll probably have to translate from channels to some other type (e.g. LSP libraries).

One thing I liked about this approach is that things like nimbleparse can (and actually do currently on the branch) continue using the existing code-path where it returns a Vec<YaccGrammarError>, oblivious that there is an mpsc::receiver under the hood via via error_receiver.iter().collect(). In that sense, dealing with mpsc receiver is entirely optional for those what want promptness.

I still think we can get this same type of behavior with a trait instead of mpsc, by slightly different means though.

  1. We could then provide a channel implementation and/or "nice pretty printer" implementation of ParseErrorReporting as standard?

I currently have one "nice pretty printer", but it depends on an external library that is partially why I have hesitated to include it in any of these PR's so far (I really appriate how few dependencies grmtools has). I assume at they very least it should have a feature which could be at the very least disabled if not disabled by default, any preference here?

Please feel free to tell me that I have missed something important, as you've thought more deeply about this than I have!

I think the only thing that struck me was the above about nimbleparse not really needing to be affected by this kind of change, (or a similar one using a trait). Another bothersome thing about mpsc is to really utilize it from another thread it seems likely that other thread also wants the grammar source, which it wouldn't inherently have through the channel alone... I think it would be nice if we collect all the stuff needed to make nice errors together at the same level of abstraction. The ParseErrorReporting also needs a way to get it. Otherwise, spot on.

ltratt commented 6 months ago

What I would like to do is to try and make build() return something else, ASTValidity, which merely has bool, and an GrammarAST and no fn errors(), or fn warnings() and move all that to a trait that gets passed to the builder. ... I still think we can get this same type of behavior with a trait instead of mpsc, by slightly different means though.

What I'm now wondering is whether we make the trait a type parameter. So, for example, we might have something like:

trait ASTErrorReporter {
  fn error(...);
  fn warning(...);
}

struct ASTBuilder<R: ASTErrorReporter> {
  reporter: ASTErrorReporter,
  any_error_found: bool,
  ...
}

impl<R: ASTErrorReporter> for ASTBuilder<R> {
  fn blah(...) {
    if (...) { self.reporter.error(...); self.any_error_found = true; }
  }

  fn build(self) -> Result<(), ()> {
    if self.any_error_found { Err(()) } else { Ok(()) }
  }
}

and then we can provide, at least, one impl along the lines of:

struct NullASTErrorReporter;

impl ASTErrorReporter for NullASTErrorReport {
  fn error(...) { }
  fn warning(...) { }
}

as well as, perhaps, a simple text error reporter.

A neat thing about this general idea is that people can drop in whatever sort of error reporting they might want (mpsc channels, text, or nothing!) and they pay a performance price proportional to that. For example, if you use NullASTErrorReporter you can be fairly hopeful that the optimiser will (in release mode) remove the costs of error reporting entirely. But, even if you do that, build still returns Err, so we don't allow complete nonsense to appear to succeed.

[BTW: I don't know if we need to split apart error and warning. It's just an example!]

ratmice commented 6 months ago

I'll definitely give that a try, I had a lot of trouble trying to do so in CTBuilder, which is why I resorted to dyn Traits there, but I think that was mostly to do with all the parameters we need for conflicts causing bounds requirements on LRLexTypes::StorageT, which hopefully won't come into play working with just warnings and errors, since none of them involve that.

ltratt commented 6 months ago

I'd be fine with starting simple, and doing PRs for where it's easiest, and then trying the big nasty cases later.

ratmice commented 6 months ago

One wrinkle I hadn't considered yet, when switching the build method to return the new type is how to go from the new type to YaccGrammar. for ASTWithValidityInfo we curently have new_from_ast_with_validity_info Unfortunately, that function returns errors the new type won't have. So we can't just do the minor change of changing that ast_validation parameter to a trait.

I suppose my inclination would be to just leave that API unchanged. With the equivalent for the new type becoming part of the builder chain. Like we could have something like ast_builder.build_ast().build_grammar() where build_grammar is the replacement for new_from_ast_with_validity_info. Nothing particularly difficult, but still I'm curious if you have thoughts on how we want to organize this aspect.

ltratt commented 6 months ago

One wrinkle I hadn't considered yet, when switching the build method to return the new type is how to go from the new type to YaccGrammar

Oops, my example didn't do that bit correctly! So that should have been:

trait ASTErrorReporter {
  fn error(...);
  fn warning(...);
}

struct ASTBuilder<R: ASTErrorReporter> {
  reporter: ASTErrorReporter,
  any_error_found: bool,
  ...
}

impl<R: ASTErrorReporter> for ASTBuilder<R> {
  fn blah(...) {
    if (...) { self.reporter.error(...); self.any_error_found = true; }
  }

  fn build(self) -> Result<AST, ()> {
    if self.any_error_found { Err(()) } else { Ok(...) }
  }
}

On that basis, I'm not quite sure I see what problem there might be with the various build functions we have returning a specific type (AST/YaccGrammar/etc.) but maybe I'm missing out on something!

ratmice commented 6 months ago

On that basis, I'm not quite sure I see what problem there might be with the various build functions we have returning a specific type (AST/YaccGrammar/etc.) but maybe I'm missing out on something!

It really isn't much of a problem, I'm mostly just worring about the ergonomics, due to paradox of choice from an accumulating number of YaccGrammar::new_* functions.

It is unfortunately a little more complex than your example, currently we have something like:

// In AST
struct ASTWithValidityInfo {
   ast: AST, // May contain a partial invalid AST when errors is non-empty.
   errors: Vec<YaccGrammarError>,
   warnings: Vec<YaccGrammarWarning>,
}
// Maintains the invariant that it returns Err(errors) if errs.len() > 0
fn new_from_ast_with_validity_info(ast: &ASTWithValidityInfo) -> Result<YaccGrammar, YaccGrammarError>

When we sever ownership of errors/warnings of the AST I was imagining we'd have a different structure like

struct ASTValidity{
  ast: AST,  // May contain a partial invalid AST when errors are found.
  any_errors_found: bool,
}
// In YaccGrammar 
// Maintains the invariant that it returns Err(()) if any_errors_found == true
fn new_from_valid_ast(ast: &ASTValidity) -> Result<YaccGrammar, ()>

The main thing i'm worried about here is the ergonomics since we are accumulating quite a number of variations of YaccGrammar::new_* functions. I don't think anything currently uses all of the flexibility that this affords, e.g. nimbleparse_lsp uses the AST from this, for formatting it's conflict errors, (but we only run checks for conflicts on the grammar after a valid AST).

So the main thought was that if we had ASTBuilder::build_ast() -> ASTValidity, and ASTValidity::build_grammar() -> Result<YaccGrammar, ()>, it seemed like a way to avoid an additional pub YaccGrammar::new_* function, and it and allows having string -> AST -> Grammar all within one call chain.

Sorry if this was a bit long.

Edit: I wonder if perhaps renaming what i've been calling ASTValidity to something like GrammarBuilder, as in this case it would be kind of both an AST and a builder for grammars :shrug: so you could have something like ASTBuilder::new().grammar_src(...).grammar_builder().build() I'm not sure I like it as it loses the idea that it's just a wrapper around an AST, but it seemed worth considering

ltratt commented 6 months ago

So the main thought was that if we had ASTBuilder::build_ast() -> ASTValidity, and ASTValidity::buildgrammar() -> Result<YaccGrammar, ()>, it seemed like a way to avoid an additional pub YaccGrammar::new* function, and it and allows having string -> AST -> Grammar all within one call chain.

I agree that this general idea (and GrammarBuilder is probably a good name!) is the way forward.

ratmice commented 6 months ago

Alright, so i'm going to do one more of these big branches, just to make sure everything works end-to-end. Then i'll try and cherry-pick PRs from that to hopefully more easily reviewable bits. I probably will not make a PR for the complete branch, but will link to it from the individual PRs so I can rebase it on master as things land. That will give me some confidence we aren't running into any unforeseen issues.

Anyhow it seems we can probably close this one for now, and I'll try and get something to look at in the next couple of days.