softdevteam / grmtools

Rust grammar tool libraries and binaries
Other
518 stars 31 forks source link

Relax `%parse-param` bounds from `Copy` to `Clone` #471

Closed ratmice closed 1 month ago

ratmice commented 1 month ago

In the recent discussion in #404 the @ltratt brought up the idea of relaxing the Copy bounds to Clone. Here is a quick attempt at an implementation of that, it seems a lot simpler than I expected.

Marking as draft for the following reasons:

ltratt commented 1 month ago

Huh, that is easier than I expected! That now makes me wonder if we can/should pass ParamT (without any bound) and then pass it through as & or &mut. Any thoughts?

ratmice commented 1 month ago

Passing it in as &mut is possible, but it requires us to change the ActionFn signature so that ParamT cannot be captured by actions, this is the current reason that clone/copy is required, that is each action gets a different cloned/copied instance of the param because each action can capture it.

This requires a couple of tricks higher ranked trait bounds (HRTB's) for <'a> Fn(X) -> Y and hrtb-implicit-bounds.

The major concept lifetime-wise is that the lifetime of the parameter is less than the lifetime of the function the parameter gets passed to, and thus the functions closure cannot capture it, once the compiler knows that it is implied that once the function returns the reference that was passed is no longer referenced, and thus unique ownership is retained.

On top of this, there is a lot of "pattern matching/codegen stuff" which is made hard by any needs for explicit lifetimes, and separation of the types from the references, the syntactical side here can actually be a bit of a pain too because of the differences between pattern matching in rust expressions and rust parameter lists.

It is definitely possible, I've gotten it working, but IIRC it was fairly syntactically wonky/mediocre

ltratt commented 1 month ago

Ah, so what you're saying is that &mut works, but is so messy, it's probably better to just do Clone?

ratmice commented 1 month ago

I think i'm saying slightly less than that, it is tricky, but I don't think it is that messy, but it is a break in compatibility maybe & and &mut will both work IIRC.

But it is possible that syntactically the arguments to %parse-param end up kind of at the mercy of what the rust compiler accepts without very quickly needing to integrate a rust parser into grmtools (which I don't think we want to do). It is the syntactic issues here which has caused the biggest difficulty really when I've tried this in the past. I think the syntax of %parse-param has changed though since the last time I tried it. That may well have been because we were still trying to make it work with tuples and the like though.

ltratt commented 1 month ago

Right. I don't remember what we changed here, either. Clone feels more flexible than Copy, though, so I think we should just do it.

ratmice commented 1 month ago

Sounds good to me, I'll try and work on examples/docs/tests soon.

ratmice commented 1 month ago

So, curiously I haven't been able to make a backwards incompatible test like I thought, code such as the following modification of lrpar/cttests/parseparam.test appears to compile fine.

The theory that I am working under is that in the generated code the Clone bounds are not present (user actions are compiled using the type given in the %parse-param directive without any weakening or generic types and bounds),

So within the generated code we know that u64 is Copy and it is only within library code that we are dealing with generic types with clone bounds ParamT: Clone.

So unless I am mistaken and there is something weirder going on, it seems like you can actually rely on types being copy within the actions even though the library/parse algorithm can only rely on the weaker Clone bounds with explicit cloning. This is absolutely not what I was expecting would happen though.

name: Test %parse-param clone
yacckind: Grmtools
grammar: |
    %start S
    %parse-param p: u64
    %%
    S -> u64:
        'INT' { (move |_| {})(p); check_copy(p); p + $lexer.span_str($1.unwrap().span()).parse::<u64>().unwrap() }
    ;
    %%
    fn check_copy<T: Copy>(_: T){}
lexer: |
    %%
    [0-9]+ 'INT'
ltratt commented 1 month ago

The theory that I am working under is that in the generated code the Clone bounds are not present

That is a good point -- I hadn't even thought about that!

So, dumb question: does that mean we need any bounds at all? Can we just pass through an arbitrary type T? I feel like this surely can't work when you have a rule that references multiple rules, but... who knows!

ratmice commented 1 month ago

The theory that I am working under is that in the generated code the Clone bounds are not present

That is a good point -- I hadn't even thought about that!

So, dumb question: does that mean we need any bounds at all? Can we just pass through an arbitrary type T? I feel like this surely can't work when you have a rule that references multiple rules, but... who knows!

I'm certain that this is not the case (exactly for the references multiple rules reason given). This is a particularly interesting case, in the generated code we call parse_actions which simplifying its signature down to the relevant bits looks like parse_actions<ParamT: Clone>(actions: Vec<&dyn Fn(..., ParamT)>, param: ParamT), The generated implementation of each action though has the concrete type (in this case u64), note that &dyn Fn(...) being a trait object as well as the ParamT: Clone bounds.

So only through that mechanism can we call each individual action from parse_actions only knowing the generic bounds of Clone, but from a function called within it know the concrete type u64. So it is actually quite a few layers of trait inheritance that appears to allow this behavior.

ratmice commented 1 month ago

FWIW, here is kind of minimization of what is going on here, in the following g represents e.g. parse_actions, f represents a user action. The following code compiles as long as Foo derives Copy, but fails when you remove the Copy derive, despite the fact that g which actually calls f which relies upon the Copy trait, treats ParamT generically only having Clone bounds. All that is to say, I do think that relaxing from Copy bounds Clone is even backwards compatible when it comes to user actions which rely on the value implementing the Copy trait.

#[derive(Copy, Clone)]
struct Foo;

fn f(x:Foo) {
    let bar = *&x;
    (move |_|{})(bar);
}

fn g<ParamT:Clone>(fns: Vec<&dyn Fn(ParamT)>, param: ParamT){
  for f in fns {
    f(param.clone())
  }
}

fn main() {
   let x = vec![&f as &dyn Fn(Foo), &f as &dyn Fn(Foo)];
   g(x, Foo)
}
ltratt commented 1 month ago

So with the Clone bounds, when we pass the object to generated code, do we end up with a problem then if there are multiple references to other rules? I'm wondering if %parse-param Rc<RefCell<...>> works, as an example of something that must be cloned and can't be moved multiple times.

ratmice commented 1 month ago

(With the caveat that I haven't fully tested it), the answer should definitely be no we shouldn't have any problem. The problem with multiple rules, rears its head when we try to remove both the Copy and the Clone bounds. Such as the following modification of the minimized example:

#[derive(Copy, Clone)]
struct Foo;

fn f(x:Foo) {
    let bar = *&x;
    (move |_|{})(bar);
}

// We removed the ParamT: Clone bounds here.
fn g<ParamT>(fns: Vec<&dyn Fn(ParamT)>, param: ParamT){
  for f in fns {
   // Subsequently get an error here.
    f(param)
  }
}

fn main() {
   let x = vec![&f as &dyn Fn(Foo), &f as &dyn Fn(Foo)];
   g(x, Foo)
}
11 f(param) ^^^^^ value moved here, in previous iteration of loop

help: if ParamT implemented Clone, you could clone the value

To actually get rid of the Clone orCopybounds, we would need to modify the signature of the action functions to remove their ability to capture or move theParamT` value.

struct Foo;

fn f(x:&mut Foo) {
    // Action can no longer move/capture x in the action.
}

fn g<ParamT>(fns: Vec<&dyn for <'a> Fn(&mut ParamT)>, mut param: ParamT){
  for f in fns {
    // Because of the funky for<'a> signature the param could not have been moved in a previous iteration of the loop
    // so now this is ok.
    f(&mut param)
  }
}

fn main() {
   let x = vec![&f as &dyn Fn(&mut Foo), &f as &dyn Fn(&mut Foo)];
   g(x, Foo)
}

Does that help answer your questions? Anyhow I'll work on an example using Rc<RefCell<...>> now that I have a better idea of what the backwards compatibility scenario for changing the bounds to Clone looks like.

ltratt commented 1 month ago

Thanks for the pointers!

Anyhow I'll work on an example using Rc<RefCell<...>> now that I have a better idea of what the backwards compatibility scenario for changing the bounds to Clone looks like.

Sounds good to me :)

ratmice commented 1 month ago

Sorry it took me a while to get the rest done, Some notes:

ltratt commented 1 month ago

I think this is ready for squashing?

ratmice commented 1 month ago

Squashed,

I had considered keeping separate commits for the example, but it all seemed short enough that just doing one commit didn't seem unjustifiable.

ratmice commented 1 month ago

Apologies, didn't realize it was still marked as draft.

ltratt commented 1 month ago

Please squash.

ratmice commented 1 month ago

Squashed.