osa1 / lexgen

A fully-featured lexer generator, implemented as a proc macro
MIT License
63 stars 7 forks source link

The state type is required to implement Default even if client code never calls `new` or `new_from_iter` #54

Closed alan-j-hu closed 2 years ago

alan-j-hu commented 2 years ago

The README claims the following: https://github.com/osa1/lexgen/blob/6be10c9ab5c56a5421a72891ba1069acec8e9b15/README.md?plain=1#L302-L308

However, as the library currently stands, the state type is required to implement Default even if the new and new_from_iter functions are never used. The reason is that the new and new_from_iter do not list $StateType: Default as a constraint, yet assume an implementation of Default in their bodies, and therefore if $StateType does not implement Default, the compilation of the entire program fails, even if the client never uses those functions.

A partial solution, which I implemented in the process of making https://github.com/osa1/lexgen/pull/53, is to add $StateType: Default as an explicit constraint. However, this only bypasses the problem if$StateType contains generic parameters. So, if $StateType does not implement Default, $StateType = Foo<'input> and $StateType = Foo<'a> are fine, but $StateType = Foo<'static> and $StateType = Foo are not. The fact that Rust rejects Foo<'static>: Default means that I had to do 'input: static and Foo<'input>, which solves the issue but creates warnings that the generic parameter 'input should be replaced with 'static. This is also only a partial solution because it does not allow a state type that does not implement Default if the state has no generic parameters.

(In my opinion, the rejection of a function with constraint Foo: Default, where Foo does not implement Default, is an illogical design choice on the part of Rust. If the constraint Foo: Trait is not satisfied, that is no different in principle than if Foo<T>: Trait is not satisfied for some choice of T. If Foo<T>: Trait is not satisfied for some choice of T, that simply means that the function is not usable for such T. If Foo: Trait is not satisfied where Foo has no generic parameters, this should simply mean that the function is not usable under any circumstances, not that the compiler should reject the function definition. This illogical design choice unfortunately makes my PR messy.)

alan-j-hu commented 2 years ago

To be honest, I think the simplest solution would be to make a breaking change and remove the new and new_from_iter functions. It would be slightly inconvenient for people whose types implement Default, who would have to explicitly pass default() to the *_with_state functions. However, the current situation is much more inconvenient for the people whose types do not implement Default (e.g. because all their constructors take parameters). If the state type is required to implement Default, that restricts what programs may be written. On the other hand, removing the two convenience functions would mean that users whose types do implement Default would have to type slightly more than they otherwise would, but would not fundamentally change what programs are possible.

osa1 commented 2 years ago

Thanks for reporting this.

As can be seen from my wording in the README ("Used when the lexer has user state that does not implement Default"), my intention was to support user states that do not implement Default, but apparently I didn't do it right.

IIRC, originally we only supported state that implements Default. The constructors ..._with_state were added later.

Just to be concrete, here's the relevant parts of the generated code for a lexer Lexer(LexerState) -> ():

struct Lexer<'input, I: Iterator<Item = char> + Clone>(
    ::lexgen_util::Lexer<'input, I, (), LexerState, ::std::convert::Infallible, Lexer<'input, I>>,
);

impl<'input> Lexer<'input, ::std::str::Chars<'input>> {
    fn new(input: &'input str) -> Self {
        Lexer(::lexgen_util::Lexer::new(input)) // error: `LexerState: Default` is not satisfied
    }

    fn new_with_state(input: &'input str, user_state: LexerState) -> Self {
        Lexer(::lexgen_util::Lexer::new_with_state(input, user_state))
    }
}

impl<I: Iterator<Item = char> + Clone> Lexer<'static, I> {
    fn new_from_iter(iter: I) -> Self {
        Lexer(::lexgen_util::Lexer::new_from_iter(iter)) // error: `LexerState: Default` is not satisfied
    }

    fn new_from_iter_with_state(iter: I, user_state: LexerState) -> Self {
        Lexer(::lexgen_util::Lexer::new_from_iter_with_state(
            iter, user_state,
        ))
    }
}

lexgen_util::Lexer::new_from_iter and lexgen_util::Lexer::new require the state to implement Default.

I'm not sure if we can fix this without making the generated lexer structs (struct Lexer above) parametric on user state type, which would allow us to do something like:

struct Lexer<'input, I: Iterator<Item = char> + Clone, S>(
    ::lexgen_util::Lexer<'input, I, (), S, ::std::convert::Infallible, Lexer<'input, I>>,
);

impl<'input, S: Default> Lexer<'input, ::std::str::Chars<'input>, S> {
    fn new(input: &'input str) -> Self {
        Lexer(::lexgen_util::Lexer::new(input))
    }
}

Now Lexer::new would only type check if S: Default is satisfied.

To avoid typing the more complicated type MyLexer<MyLexerState> instead of MyLexer, we could generate a new type and a type synonym:

struct Lexer_<'input, I: Iterator<Item = char> + Clone, S>(
    ::lexgen_util::Lexer<'input, I, (), S, ::std::convert::Infallible, Lexer_<'input, I, S>>,
);

type Lexer<'input, I> = Lexer_<'input, I, LexerState>;

impl<'input, S: Default> Lexer_<'input, ::std::str::Chars<'input>, S> {
    fn new(input: &'input str) -> Self {
        Lexer(::lexgen_util::Lexer::new(input))
    }
}

impl<'input, S> Lexer_<'input, ::std::str::Chars<'input>, S> {
    fn new_with_state(input: &'input str, user_state: S) -> Self {
        Lexer(::lexgen_util::Lexer::new_with_state(input, user_state))
    }
}

Now the code should be backwards compatible and it should no longer require UserState: Default unless new or new_with_input are used.

I wanted to demonstrate the problem in detail but on the way came up with a solution..

WDYT about this solution @alan-j-hu? Would using a type synonym (instead of a struct) for the generated lexer type be a problem?

That said, I think it should be OK to just make new and new_from_iter take a state argument, just to keep the implementation simpler. I'm not sure if it would worth generating an extra type and making the code generator more complicated just so that some users won't have to pass Default::default() to lexer constructor in some cases. Passing Default::default() to constructors doesn't sound too bad.

alan-j-hu commented 2 years ago

I don't see any issues with creating a type alias. I'm fine with either the type alias or removing the new and new_from_iter functions, whichever you think is better design for your library.

osa1 commented 2 years ago

I wasn't sure how much complexity the solution described above would add to the code generator so wanted to give it a try. It turns out it's not too bad, so I just implemented it.