skvadrik / re2c

Lexer generator for C, C++, Go and Rust.
https://re2c.org

Default implementations for YYFILL and lexer context #390

Closed krishna116 closed 1 year ago

krishna116 commented 2 years ago

Currently the user needs to provide re2c:define:YYFILL = "return LexerNeedMoreInput;"; to handle the buffer-fill event (interrupt).
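For context, here is a minimal hand-written C sketch of the "return to the caller" style that this YYFILL definition gives you. It is not code generated by re2c. It assumes a toy lexer that only recognizes decimal numbers, and the names `struct lexer`, `lexer_feed`, `lexer_next` and `LEX_NEED_MORE_INPUT` are hypothetical (the last one stands in for `LexerNeedMoreInput`). Because the lexer state lives in a struct owned by the caller, a number split across two chunks is still recognized once the caller feeds more input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; LEX_NEED_MORE_INPUT plays the role of
 * LexerNeedMoreInput in the YYFILL definition above. */
enum lex_status { LEX_TOKEN, LEX_NEED_MORE_INPUT, LEX_END };

/* Lexer context owned by the caller. It survives between calls, so lexing
 * can resume after the caller supplies the next chunk of input. */
struct lexer {
    const char *cursor; /* next unread byte */
    const char *limit;  /* one past the last byte of the current chunk */
    int in_number;      /* 1 while a number token is still open */
    long value;         /* value of the open number so far */
    int eof;            /* set by the caller when no more input will come */
};

/* Hand the lexer the next chunk; the previous one has been fully consumed. */
static void lexer_feed(struct lexer *lx, const char *buf, size_t len, int eof) {
    lx->cursor = buf;
    lx->limit = buf + len;
    lx->eof = eof;
}

/* Return the next decimal number in *out; other bytes separate numbers. */
static enum lex_status lexer_next(struct lexer *lx, long *out) {
    for (;;) {
        assert(lx->cursor <= lx->limit);
        if (lx->cursor == lx->limit) {
            if (!lx->eof)
                return LEX_NEED_MORE_INPUT; /* like YYFILL returning to the caller */
            if (lx->in_number) {            /* flush a number ended by EOF */
                lx->in_number = 0;
                *out = lx->value;
                return LEX_TOKEN;
            }
            return LEX_END;
        }
        char c = *lx->cursor++;
        if (c >= '0' && c <= '9') {
            if (!lx->in_number) {
                lx->in_number = 1;
                lx->value = 0;
            }
            lx->value = lx->value * 10 + (c - '0');
        } else if (lx->in_number) {
            lx->in_number = 0;
            *out = lx->value;
            return LEX_TOKEN;
        }
    }
}
```

A real re2c push-model lexer instead keeps the unfinished lexeme in its buffer and saves its DFA state (roughly what YYGETSTATE/YYSETSTATE are for); the sketch simplifies that to an accumulated value.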

Suppose instead that re2c provided a rule like this:

YYFILL {
     // Handle the fill event here.
     // static char *cursor = ...; static char *marker = ...; static char *limit = ...; static int state = ...;
     // Then continue lexing or return to the caller.
 }

It has several advantages: 1. All events share uniform handling logic. 2. The user's data structure (outside the lexer function) does not need to provide these lexer fields, because they can be handled in the YYFILL rule:

bufInfo->marker;
bufInfo->limit;
bufInfo->state;
//...

3. The user's data structure can therefore be separated from the lexer's data structure, which may make user code clearer. 4. It gives one-shot buffers and multi-shot buffers more uniform handling logic. 5. Other interrupts (not only the YYFILL interrupt) could be handled the same way, so the approach is extensible and uniform.
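Points 2 and 3 can be illustrated with a rough C sketch of the "refill and continue" case, assuming the lexer refills itself through a user-supplied read callback. All names here (`read_fn`, `read_string`, `lexer_fill`, `lexer_getc`, `count_words`) are hypothetical. The user's `struct string_source` holds only application data, while the buffer position and length stay private to `struct lexer`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-supplied callback: copies up to cap bytes into dst and
 * returns how many were copied (0 means end of input). */
typedef size_t (*read_fn)(void *src, char *dst, size_t cap);

/* Lexer-private context: buffer bookkeeping lives here, not in user data. */
struct lexer {
    char buf[8];     /* deliberately tiny to force several refills */
    size_t pos, len; /* read position and number of valid bytes in buf */
    read_fn read_cb;
    void *src;
};

/* User data: knows nothing about the lexer's buffering. */
struct string_source {
    const char *text;
    size_t off;
};

static size_t read_string(void *src, char *dst, size_t cap) {
    struct string_source *s = src;
    size_t n = strlen(s->text + s->off);
    if (n > cap)
        n = cap;
    memcpy(dst, s->text + s->off, n);
    s->off += n;
    return n;
}

/* Refill inside the lexer; returns 0 when the source is exhausted. */
static int lexer_fill(struct lexer *lx) {
    lx->len = lx->read_cb(lx->src, lx->buf, sizeof lx->buf);
    lx->pos = 0;
    return lx->len > 0;
}

/* Next byte, refilling transparently, or -1 at the real end of input. */
static int lexer_getc(struct lexer *lx) {
    if (lx->pos == lx->len && !lexer_fill(lx))
        return -1;
    assert(lx->pos < lx->len);
    return (unsigned char)lx->buf[lx->pos++];
}

/* Count whitespace-separated words across any number of refills. */
static int count_words(struct lexer *lx) {
    int words = 0, in_word = 0, c;
    while ((c = lexer_getc(lx)) != -1) {
        int space = (c == ' ' || c == '\n' || c == '\t');
        if (!space && !in_word)
            words++;
        in_word = !space;
    }
    return words;
}
```

The sketch consumes input byte by byte, so it never has to keep an unfinished lexeme across a refill. A real re2c fill function also has to move the data from the token start (and marker) to the front of the buffer before reading more.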

Thank you.

krishna116 commented 2 years ago

By the way, it seems the YYFILL rule and the EOF rule have the same effect. The difference is that the YYFILL rule means more input is needed when the end of the buffer is reached, while the EOF rule means a real end of input. In the current re2c implementation, even when re2c:eof is used, the lexer still goes through YYFILL, which is confusing. Maybe there could be only one EOF rule, with a variable indicating whether more input is needed, which would be clearer for the user.

This would also make the EOF logic clearer and more uniform:

$ {
       // 1: refill and continue, or return to the caller; the user can always do this, whether or not re2c:fill is used.
       // 2: if (re2c:fill) { refill and continue, or return to the caller; } // a better option for the user to consider.
}

To keep compatibility, the user could write either an EOF rule or a YYFILL rule (just an alias for the EOF rule). This would not confuse users either, because the handling logic in the code is either option 1 or option 2, and they are always free to choose.
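The single end-of-buffer rule proposed here could be approximated by one handler that first checks a "need more input" flag and then picks one of the two options. This is only an illustrative C sketch, not re2c's generated code: `struct ctx`, `refill`, `on_end` and the `fill_enabled` parameter (standing in for the re2c:fill setting mentioned above) are hypothetical, and the pending chunk in `next` models input the producer has already made available.

```c
#include <assert.h>
#include <stddef.h>

enum end_action { END_OF_INPUT, REFILLED, RETURN_TO_CALLER };

struct ctx {
    const char *cursor, *limit; /* current chunk */
    const char *next;           /* hypothetical pending chunk, NULL if none */
    size_t next_len;
    int need_more;              /* 0 once the producer signals real EOF */
};

/* Switch to the pending chunk if there is one. */
static int refill(struct ctx *c) {
    if (c->next == NULL)
        return 0;
    c->cursor = c->next;
    c->limit = c->next + c->next_len;
    c->next = NULL;
    return 1;
}

/* One handler for every "cursor reached limit" event. */
static enum end_action on_end(struct ctx *c, int fill_enabled) {
    assert(c->cursor == c->limit);
    if (!c->need_more)
        return END_OF_INPUT;     /* a real end of input */
    if (fill_enabled && refill(c))
        return REFILLED;         /* refill and continue lexing */
    return RETURN_TO_CALLER;     /* let the caller supply more input */
}
```

Keeping the decision in one place means a one-shot buffer (need_more is 0 from the start) and a multi-shot buffer go through the same code path.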

Thank you.

krishna116 commented 2 years ago

By the way, I sketched a design pattern that would be more useful if a YYFILL rule (or a single EOF rule) were provided: re2ctx.zip

krishna116 commented 2 years ago

Sorry, this may not be a good question, so I am closing it.

Thank you.

skvadrik commented 2 years ago

Sorry I didn't respond earlier. I try to resolve bugs and issues quickly when people are blocked, but when there is a suggestion or a feature request, I sometimes don't have time to respond within a few days. No need to close the bug.

Your suggestion to provide a default implementation for YYFILL has a problem: YYFILL may be used in different contexts that require different implementations, and it would be impossible to provide one that suits everyone. It may be possible to provide an optional default definition that covers some popular cases, but it requires a lot of thought.

There was such an effort in the past (in the form of a library), but it wasn't very popular, as most re2c users still have a slightly different setting and need to implement their own thing. Your re2ctx looks a bit like that. Maybe we should revive that effort; let's keep the bug open as a reminder.

it seems YYFILL-rule and EOF-rule have same effect

This is not true: it is possible to use the EOF rule $ without YYFILL. This example shows it (note the re2c:yyfill:enable = 0; configuration).
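To illustrate using `$` without YYFILL, here is a hand-written C approximation of a one-shot buffer scanned with a sentinel plus a bounds check; it is not the code re2c actually generates. It assumes the input is followed by a `'\0'` sentinel at `limit`. A `'\0'` before `limit` is ordinary data (what a `[\x00]` rule would match), while the sentinel at `limit` simply means end of input, and no refill is attempted. `next_token` and the `TOK_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum tok { TOK_END, TOK_NUL, TOK_OTHER };

/* One-shot buffer: *limit must be the '\0' sentinel. Reaching the limit
 * means end of input (the "$" case); nothing is ever refilled. */
static enum tok next_token(const char **cursor, const char *limit) {
    assert(*cursor <= limit);
    if (**cursor == '\0') {
        if (*cursor >= limit)
            return TOK_END;  /* sentinel at the limit: real end of input */
        ++*cursor;
        return TOK_NUL;      /* embedded NUL: ordinary data */
    }
    ++*cursor;
    return TOK_OTHER;
}
```

The bounds check only runs when the sentinel byte is seen, which keeps the common path cheap while still distinguishing embedded NUL bytes from the end of input.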

krishna116 commented 2 years ago

I have run more tests to understand how it works, and many ideas appeared, disappeared, or were rethought... I now think that no matter how YYFILL is implemented, it always has its advantages and drawbacks, and it is complicated to adjust.

It may be possible to provide an optional default definition that covers some popular cases, but it requires a lot of thought.

Now I think so too, and I have tried to do such a thing, but it is not ready yet.

This is not true: it is possible to use the EOF rule $ without YYFILL. This example shows it (note the re2c:yyfill:enable = 0; configuration).

I figured it out yesterday: because YYFILL did not return to the caller, I ended up in the [\x00] rule; if YYFILL returns to the caller, execution eventually reaches the $ rule.

Thank you.