softdevteam / grmtools

Rust grammar tool libraries and binaries
Other
507 stars 31 forks source link

Add operations to start condition processing. #341

Closed SMartinScottLogic closed 1 year ago

SMartinScottLogic commented 2 years ago

Pending: tests in lrlex/src/lib/lexer.rs

Question: If the input to fn lexer(...) leaves the lexer with a non-empty stack, should it be retained for (optional) resumption with a following call to the lexer?

ratmice commented 2 years ago

Interesting question, I know that if it can be retained that behavior being optional seems necessary. my nimbleparse_lsp project generally should assume files start in the Initial state, and calls lexer multiple times.

But for streaming lexers (which I don't believe exist), some retention between &str's would seem necessary but that would likely not be the lexer() function, but a function on lexer()'s return value, akin to the NewlineCache::feed() function.

One place I could imagine using this is in my crimson project, each file currently starts with an optional header/schema where type declarations go, which specifies the type of the top-level value. Something to retain the stack would allow me to specify a header/schema at the top of the first file then disallow it for subsequent files. To ensure things aren't sending redundant data or changing the schema between files.

So I guess my personally I think that if it isn't troublesome to implement, it seems like it could be an interesting feature.

ltratt commented 2 years ago

Question: If the input to fn lexer(...) leaves the lexer with a non-empty stack, should it be retained for (optional) resumption with a following call to the lexer?

The intention (though the API is definitely not complete for it!) would be that this is a "streaming lexer" (vs. the current NonStreamingLexers).

SMartinScottLogic commented 2 years ago

My understanding from the above, is that it would be worth extending the return value from lexer() to include details of the final state of the start condition stack. I like it, it will help with tests as well!

ratmice commented 2 years ago

Edit: I think I see now what I was missing in my comment below, and what you were getting at in your comment above. The tricky evaluation order at construction time has thrown my head for a loop.

So, I think through it maybe that lexer() might not be the place where this is needed, as that takes the .l file input (this is where I went awry!), rather than The input which is going to be performing start condition operations, perhaps I am misunderstanding something about your question though, but I don't see how the input to lexer().

Once I look at the LRNonStreamingLexer though and it's combination of new() and iter(), I don't see a good way to expose it without, turning that into an something like Iterator<Item=Result<Lexeme ...>, StartState)> , exposing the target state at every iteration.

Perhaps you have something in mind, but now that I think i'm looking in the right place I don't quite see any convenient way to just get just the final state, without doing something that I imagine might add overhead to each iteration. So, I'm no longer sure if this is something we can do right now, or just need to keep in mind when designing the "streaming lexer", like Laurie mentioned.

ltratt commented 2 years ago

I'm willing to be proven wrong on this, but I designed the NonStreaming interface to be as simple as possible, knowing that it would almost certainly be too simple for streaming. I don't know what the streaming interface should look like: it came under "think about it one day in the far future" :)

For example, if the current lexer can return "half way through", I think we would force the user to add an extra check "the lexer stack is empty" as a proxy for "was my file completely lexed"? That seems like something most people (I assume NonStreaming is the more common case!) would not expect to have to deal with themselves.

ratmice commented 2 years ago

I pushed a commit here, based on trying to get this working in comments example if it helps, but most of it is mentioned in my review. https://github.com/ratmice/grmtools/commit/67a57c6137a857ca13a3164590ad65d9c1c8df9d

ltratt commented 2 years ago

Hello @SMartinScottLogic, I wondered how this PR is getting on?

SMartinScottLogic commented 2 years ago

Sorry, I've been on family vacation - I'll aim to get the next commits in in the next couple of days

From: Laurence Tratt @.> Sent: 04 September 2022 10:42 To: softdevteam/grmtools @.> Cc: Simon Martin @.>; Mention @.> Subject: Re: [softdevteam/grmtools] Add operations to start condition processing. (PR #341)

Hello @SMartinScottLogichttps://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSMartinScottLogic&data=05%7C01%7Csmartin%40scottlogic.com%7C61b39fd0b55d4d3688d808da8e59a75a%7C8979621591b24b6db90be8fe588f6336%7C0%7C0%7C637978812968196825%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=8Fg%2FwxV%2FAjD138PBjKQrDMg9J7wPJFZmN%2BYHzonSMxQ%3D&reserved=0, I wondered how this PR is getting on?

- Reply to this email directly, view it on GitHubhttps://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsoftdevteam%2Fgrmtools%2Fpull%2F341%23issuecomment-1236299571&data=05%7C01%7Csmartin%40scottlogic.com%7C61b39fd0b55d4d3688d808da8e59a75a%7C8979621591b24b6db90be8fe588f6336%7C0%7C0%7C637978812968196825%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=y4e66og%2B8U1ntNkzX%2FLBP8Rq7ihix3ap5rYxB7Rr3vw%3D&reserved=0, or unsubscribehttps://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAESH56ECMIVEPNHA5U4ZV33V4RVEZANCNFSM56QFSLNQ&data=05%7C01%7Csmartin%40scottlogic.com%7C61b39fd0b55d4d3688d808da8e59a75a%7C8979621591b24b6db90be8fe588f6336%7C0%7C0%7C637978812968196825%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=unAfF4pHWz2D9GtU2OPvI0VQ11Gv9m1hatPMKpEegkU%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.**@.>>

ratmice commented 2 years ago

I guess, I have one minor question that isn't quite about this patch the example, lrpar/examples/start_states/comment.y Wondering if it is intended, or if it would be better to include epsilon in the rules, so that comments alone/lines with no text parse successfully, something like:

%start Expr
%%
Expr: Expr Text | ;
Text: 'TEXT';
ltratt commented 1 year ago

Hello @SMartinScottLogic, are you still chugging away on this one?

ltratt commented 1 year ago

@ratmice has very patiently and kindly explained what's going in this PR to the point that I've realised my comment in https://github.com/softdevteam/grmtools/pull/341#issuecomment-1214969503 was entirely bone-headed! This PR has nothing to do with streaming / non-streaming and I'm an idiot!

One thing that Matt and I have realised -- although we haven't thrashed out all the details -- relates to the comment in https://github.com/softdevteam/grmtools/pull/341#issuecomment-1214716803. The return type needs to have something like an Option<LexingState> attached to it. My working guess is that we might just be able to do something as simple as extending LexError:

pub struct LexError {
  span: Span,
  lexing_state: Option<LexingState>
}

Basically I think I'd like to get the functionality of this PR into grmtools before we make the next release if possible. I think there are people who will find this useful, and if (as is possible) it requires an API tweak or two, we can bite that bullet alongside the other API changes we have on the master branch.

ltratt commented 1 year ago

@SMartinScottLogic I'd love to see this PR get into grmtools before we do the next release (hopefully soon). Is that plausible, do you think?

SMartinScottLogic commented 1 year ago

Sorry for the delay on this one - I've had a number of distractions come up in my personal and professional life in the last few months. I'll hopefully be able to address the concerns on this PR in the next week or so.

From: Laurence Tratt @.> Sent: 02 November 2022 08:19 To: softdevteam/grmtools @.> Cc: Simon Martin @.>; Mention @.> Subject: Re: [softdevteam/grmtools] Add operations to start condition processing. (PR #341)

@SMartinScottLogichttps://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSMartinScottLogic&data=05%7C01%7Csmartin%40scottlogic.com%7C18941c053fbf44bb6bb308dabcaaf713%7C8979621591b24b6db90be8fe588f6336%7C0%7C0%7C638029739757987760%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=z%2FEpxE0jG%2FmVU6DHcxr3RsaLoihl4pI3VIl%2FCfI51TY%3D&reserved=0 I'd love to see this PR get into grmtools before we do the next release (hopefully soon). Is that plausible, do you think?

- Reply to this email directly, view it on GitHubhttps://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsoftdevteam%2Fgrmtools%2Fpull%2F341%23issuecomment-1299785602&data=05%7C01%7Csmartin%40scottlogic.com%7C18941c053fbf44bb6bb308dabcaaf713%7C8979621591b24b6db90be8fe588f6336%7C0%7C0%7C638029739757987760%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=v5KgShO0%2FY14LUDuTpwd%2B83XN0oAzEV%2FgkpZZraKtT4%3D&reserved=0, or unsubscribehttps://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAESH56GGAE35YNGLBKHVJWLWGIPZDANCNFSM56QFSLNQ&data=05%7C01%7Csmartin%40scottlogic.com%7C18941c053fbf44bb6bb308dabcaaf713%7C8979621591b24b6db90be8fe588f6336%7C0%7C0%7C638029739757987760%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Q7zl2PyYjXAdVkMikAgYvIy1wGdAzvOSXVsff%2FIWNEQ%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.**@.>>

ltratt commented 1 year ago

Excellent :)

SMartinScottLogic commented 1 year ago

I guess, I have one minor question that isn't quite about this patch the example, _lrpar/examples/startstates/comment.y Wondering if it is intended, or if it would be better to include epsilon in the rules, so that comments alone/lines with no text parse successfully, something like:

%start Expr
%%
Expr: Expr Text | ;
Text: 'TEXT';

Changed under 145f8d68eaf5aa0de3338ea56e692211f889f8cc

SMartinScottLogic commented 1 year ago

Could you clarify exactly what you'd expect to be in LexingState under https://github.com/softdevteam/grmtools/pull/341#issuecomment-1277830747 ? I'm having some difficulty tracking which suggestions are current and which have been superseded.

ratmice commented 1 year ago

@SMartinScottLogic I believe that adding the Option<LexingState>, is for the following cases, where it can be none

 // On receiving 'a' clear the implicit INITIAL state from the stack then push the B state, stack looks like [B].
a <B>'a' 
 // on receiving 'b' in the B state, Pop the B state, leaving an empty stack due to push/pop imbalance.
<B>b <-B>'b' 
  // on receiving 'a' when in the INITIAL state Pop the INITIAL state off stack leaving an empty stack.
<INITIAL>a <-INITIAL>'A' 

Initially these caused a panic, I had suggested that these could fall back to the initial state via unwrap_or(initial_state), But I believe the final suggestion was to make these return a LexError, with a lexing_state: None

Edit: Apologies for the lack of clarity on this point.

ltratt commented 1 year ago

I think @ratmice has summed up my thinking too. Apart from that I don't have any major comments.

SMartinScottLogic commented 1 year ago

I understand the failure cases in https://github.com/softdevteam/grmtools/pull/341#issuecomment-1304813657, what I'm unsure of is - in other cases where we produce a LexError (i.e. where lexing_state should be present) what type is within the Option? LexingState is not an existing type - shall I create a new type, and put current start state details in for the current PR?

ltratt commented 1 year ago

Good question, and I don't immediately have a good suggestion. @ratmice might?

ratmice commented 1 year ago

Aha, I was thinking just the id:usize, with the contents being state_stack.last() and the user would then have to call get_start_state_by_id, but now I see that it isn't pub (I'm running a bit behind on a holiday so I don't have a chance to see if there are any difficulties with making it pub though).

There are also some questions if we are making the id public if it should also have an idxnewtype associated with it I suppose? This at least seemed better to me than making the StartState clone, and putting that directly in the LexError?

ltratt commented 1 year ago

There are also some questions if we are making the id public if it should also have an idxnewtype associated with it I suppose?

What do we expect people to do with that state? Is a fully opaque state OK or do they need to peek at the contents to make use of it?

ratmice commented 1 year ago

There are also some questions if we are making the id public if it should also have an idxnewtype associated with it I suppose?

What do we expect people to do with that state? Is a fully opaque state OK or do they need to peek at the contents to make use of it?

I think it is just needed to get a struct StartState via get_start_state_by_id, and so fully opaque ID would seem fine, once they get the struct StartState, primarily the printing of the name, and name_span fields I believe just so you can discern which state the lexer was in when the error occurred.

The usize itself, being just an incrementing counter of states doesn't really provide much useful information itself really.

ltratt commented 1 year ago

Ah, I see, so I think we're saying:

  1. Ideally we want to return Option<StartState>.
  2. But that would require cloneing a StartState which commits us to an ugly API.

I think I would thus suggest:

Does that sound sensible?

ratmice commented 1 year ago

Does that sound sensible?

Yeah, sounds alright to me.

ltratt commented 1 year ago

Hello Simon, I'm keen to get a new release of grmtools out of the door soon, as we've piled up lots of useful changes since the last release 8 months ago (time flies!). Do you think we can get this PR done for that release? It would be great if we can! -Laurie

SMartinScottLogic commented 1 year ago

Sorry, I'm not understanding the 'opaque type' bit above - I haven't encountered this in rust - so I'm not certain of the desired code. I understand and agree with the remainder, and I've largely completed (using usize for StartStateId at the moment) - c.f. 224744d96caae7d33322b25ddd73666d196110ea

ltratt commented 1 year ago

Sorry! By "opaque type" we just mean struct S(SomeHiddenType) where users can't access SomeHiddenType directly but can still e.g. compare two Ss for equality, or hash Ss etc. In this case I think we'd just have struct StartStateId(usize) or similar. The advantage of an opaque type is that we can change the inner type later without imposing an API change on end users.

ltratt commented 1 year ago

@SMartinScottLogic Please squash (you'll need to rebase against master and resolve whatever the conflict(s) is/are).

SMartinScottLogic commented 1 year ago

Squashed, as requested

ltratt commented 1 year ago

bors r+

bors[bot] commented 1 year ago

Build failed:

ltratt commented 1 year ago

I think you'll need to add a new commit for cargo fmt.

ratmice commented 1 year ago

bors r+

ratmice commented 1 year ago

I thought I was assigned, but perhaps I'm misremembering. Anyway I don't think bors will listen to me because of that.

Edit: perhaps bors was just being slow!

bors[bot] commented 1 year ago

Build succeeded: