Closed krishna116 closed 2 years ago
You should use @@ instead of 0 in YYSETSTATE:
re2c:define:YYSETSTATE = "bufInfo->state = @@;";
The point here is that only re2c knows what the correct state is, but not the user. Each time re2c generates YYSETSTATE, it substitutes @@ with the correct state. The actual state is different for different YYSETSTATE invocations in the lexer.
As for -1, it is always used as the default state in re2c. Maybe we should say this more explicitly in the docs.
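To illustrate the idea, here is a minimal hand-written sketch in plain C. It is not actual re2c output, and the names Lexer, lexer_feed, lexer_finish and the ST_* constants are hypothetical. Each point where the scanner runs out of input stores its own state number, which is the role @@ plays in YYSETSTATE, and -1 means that no lexeme is in progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state numbers: -1 is "not interrupted", the others are
   distinct interrupt points (what re2c would substitute for @@). */
enum { ST_NONE = -1, ST_IN_NUMBER = 0, ST_IN_WORD = 1 };

typedef struct {
    int state;    /* plays the role of bufInfo->state */
    int numbers;  /* completed number lexemes */
    int words;    /* completed word lexemes */
} Lexer;

static int is_digit(char c) { return c >= '0' && c <= '9'; }
static int is_alpha(char c) { return c >= 'a' && c <= 'z'; }

void lexer_init(Lexer *lx)
{
    lx->state = ST_NONE;
    lx->numbers = 0;
    lx->words = 0;
}

/* Consume one chunk of input; may be called many times. */
void lexer_feed(Lexer *lx, const char *p, size_t n)
{
    const char *end = p + n;
    assert(lx != NULL && p != NULL);

    /* Resume where the previous call was interrupted. */
    switch (lx->state) {
    case ST_IN_NUMBER: goto in_number;
    case ST_IN_WORD:   goto in_word;
    default:           break;
    }

start:
    if (p == end) { lx->state = ST_NONE; return; }  /* nothing in progress */
    if (is_digit(*p)) { ++p; goto in_number; }
    if (is_alpha(*p)) { ++p; goto in_word; }
    ++p;                                            /* skip separators */
    goto start;

in_number:
    if (p == end) { lx->state = ST_IN_NUMBER; return; }  /* interrupt point 0 */
    if (is_digit(*p)) { ++p; goto in_number; }
    ++lx->numbers;                                  /* lexeme finished */
    goto start;

in_word:
    if (p == end) { lx->state = ST_IN_WORD; return; }    /* interrupt point 1 */
    if (is_alpha(*p)) { ++p; goto in_word; }
    ++lx->words;
    goto start;
}

/* Signal end of input: complete the lexeme that is still in progress. */
void lexer_finish(Lexer *lx)
{
    if (lx->state == ST_IN_NUMBER) ++lx->numbers;
    if (lx->state == ST_IN_WORD)   ++lx->words;
    lx->state = ST_NONE;
}
```

Feeding "12 ab", "c 3" and "4" in turn and then calling lexer_finish counts two numbers and one word, because the saved state lets a lexeme continue across chunk boundaries. If the state were always saved as 0, the scanner would resume in the wrong place.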
See this example if you haven't already and the description of the re2c:api:sigil configuration.
But if I use this:
re2c:define:YYSETSTATE = "bufInfo->state = @@;";
the output is an infinite loop.
Can you elaborate on where you have an infinite loop? Ideally provide an example that shows a hanging program.
I did test your example with @@ and it finished normally. In fact, there was no difference in the output because in your original example the whole input fits into the buffer, so there is no need for refilling it (so YYSETSTATE didn't matter).
I added a break in the while loop; this is the output:
My original code is indeed using:
re2c:define:YYSETSTATE = "bufInfo->state = @@;";
So I'm confused, since you say there is no problem.
I have attached the re2c-generated code here; maybe you can diff it against yours. src.zip
Oh, I know what the problem is: I was testing with the most recent re2c from the git master branch, which has this commit: https://github.com/skvadrik/re2c/commit/2c0dd72332c2d23270179d8c75a7ce7f5ae02240. If you read the commit description, it explains why it is necessary to generate YYSETSTATE(-1) in final states. Previous re2c versions relied on the user to do this.
If you cannot update re2c, then you can manually add bufInfo->state = -1; in final states before return.
Note that the way you organize the lexer loop is a bit unusual: you return from the lex function to main from every final state, only to reiterate and call the lex function again. It is more convenient to put the lexer loop in the lex function (make it bypass the getstate:re2c block as shown in the example), and let the outer loop in main handle the exceptional situations when the lexer needs more input, or when it encounters an error, or when it terminates successfully.
If you reorganize the lexer loop in lex to bypass getstate:re2c, you won't need YYSETSTATE(-1) in final states and the lexer will be faster, as it will bypass the initial state switch in the main loop. This is precisely the reason to have a separate getstate:re2c block.
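The loop organization described above can be sketched in plain C as follows. This is only an illustration under assumed names (LexStatus, Lexer, lex, count_words are hypothetical) and it is hand-written rather than generated by re2c: the token loop lives inside lex, which returns only when it needs more input, hits an error, or reaches the end, and the outer loop deals with those three cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { LEX_END, LEX_NEED_MORE_INPUT, LEX_ERROR } LexStatus;

typedef struct {
    const char *cur;  /* next unread character of the current chunk */
    const char *lim;  /* one past the last character of the chunk */
    int in_word;      /* nonzero if a word is still in progress */
    int eof;          /* set by the caller when no more chunks will arrive */
    int words;        /* words recognized so far */
} Lexer;

/* Inner loop: keeps lexing and returns only in exceptional situations. */
LexStatus lex(Lexer *lx)
{
    assert(lx != NULL);
    for (;;) {
        if (lx->cur == lx->lim) {
            if (!lx->eof) return LEX_NEED_MORE_INPUT;
            if (lx->in_word) { lx->in_word = 0; ++lx->words; }
            return LEX_END;
        }
        char c = *lx->cur++;
        if (c >= 'a' && c <= 'z') {
            lx->in_word = 1;           /* continue the current word */
        } else if (c == ' ') {
            if (lx->in_word) { lx->in_word = 0; ++lx->words; }
            /* "normal" final state: no return, just lex the next token */
        } else {
            return LEX_ERROR;
        }
    }
}

/* Outer loop (what main would do): supply chunks, handle errors and the end.
   Returns the number of words, or -1 on a lexing error. */
int count_words(const char *const *chunks, size_t nchunks)
{
    Lexer lx;
    size_t next = 0;

    lx.cur = "";
    lx.lim = lx.cur;  /* start with an empty chunk */
    lx.in_word = 0;
    lx.eof = 0;
    lx.words = 0;

    for (;;) {
        switch (lex(&lx)) {
        case LEX_NEED_MORE_INPUT:
            if (next < nchunks) {
                lx.cur = chunks[next];
                lx.lim = lx.cur + strlen(lx.cur);
                ++next;
            } else {
                lx.eof = 1;  /* no more input: let lex finish */
            }
            break;
        case LEX_ERROR:
            return -1;
        case LEX_END:
            return lx.words;
        }
    }
}
```

With the chunks "ab c", "d e" and "f", count_words returns 3 (the words "ab", "cd" and "ef"), and lex is called once per chunk plus once at the end, not once per token.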
I attach your example reworked as I suggested: test4.lex.txt. The changes are:
- bypass getstate:re2c in the lexer loop in lex
- do not return from lex in "normal" final states: return only in exceptional situations (need more input, error, terminate)
- [...] yytext ([...] should be adjusted in YYFILL as well as other pointers)
- [...] main
This example works with older re2c versions as well.
I have read the commit: 2c0dd72. It looks the same to me, except that -1 is used not only as the initial state but also as the beginning state. So "bufInfo->state = @@;" sets either the last interrupt state or the initial state? Maybe it would be better to let users know exactly which one is meant, because the "@@" placeholder has no semantic meaning. It seems better to introduce standard internal states such as @{init/begin} and @{interrupt}.
Thank you for the advice. If I need to parse arbitrary data packets, the code you provided is good. If I just need to send split tokens to a grammar parser, the code becomes more complicated than flex. I always think about which design is better, or which way is the most understandable and concise.
I have downloaded and compiled the latest re2c source code, and the latest re2c.exe works: no infinite loop happens with my code. In fact I'm learning re2c and writing notes/code/tutorials for other people/beginners.
Thank you.
After some testing, I found that the option "--storable-state", the interrupt state "@@" and YYFILL should be used together with all of this:
re2c:define:YYMARKER = "bufInfo->maker";
re2c:define:YYLIMIT = "bufInfo->limit";
re2c:define:YYGETSTATE = "bufInfo->state";
re2c:define:YYSETSTATE = "bufInfo->state = @@;";
re2c:define:YYFILL = "return LexerNeedMoreInput;";
Now I understand: it is used to parse a stream of non-contiguous buffer blocks, where the lexer is re-entered and fed a buffer block many times. If the buffer is a single contiguous block processed in one shot, I don't need all that stuff. I just need to take care of YYCURSOR, that's all. So I attached a new modified example, which is more concise: test4-1.zip.
Right, you don't need YYFILL unless your input is too large and you need buffering. You can read more about how YYFILL works and when you need it in the manual: https://re2c.org/manual/manual_c.html#buffer-refilling. I also recommend reading the section about end-of-input handling: https://re2c.org/manual/manual_c.html#handling-the-end-of-input.
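As a rough picture of what a refill step has to do, here is a small sketch in plain C. All names (Input, input_init, input_fill, BUF_SIZE) are hypothetical, and a string stands in for the real input source such as a file or socket. The point it shows is that when the unread tail of the buffer is shifted to the front, every pointer into the buffer (token start, cursor, marker, limit) must be shifted by the same amount.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8  /* tiny on purpose, so that refilling is easy to observe */

typedef struct {
    char buf[BUF_SIZE + 1];  /* one extra byte for the NUL sentinel */
    char *tok;               /* start of the current lexeme */
    char *cur;               /* next character to read */
    char *mar;               /* backtracking marker */
    char *lim;               /* end of valid data, *lim is always NUL */
    const char *src;         /* unread input (stand-in for a file/socket) */
} Input;

void input_init(Input *in, const char *src)
{
    assert(in != NULL && src != NULL);
    in->tok = in->cur = in->mar = in->lim = in->buf;
    *in->lim = '\0';
    in->src = src;
}

/* Returns 0 on success, 1 if the current lexeme fills the whole buffer,
   2 if the source has no more data. */
int input_fill(Input *in)
{
    size_t shift = (size_t)(in->tok - in->buf);  /* bytes already consumed */
    size_t used = (size_t)(in->lim - in->tok);   /* bytes that must be kept */

    if (shift == 0 && used == BUF_SIZE) return 1;

    /* Move the kept bytes to the front and shift all pointers alike. */
    memmove(in->buf, in->tok, used);
    in->lim -= shift;
    in->cur -= shift;
    in->mar -= shift;
    in->tok -= shift;

    /* Append as much new data as fits. */
    size_t free_space = BUF_SIZE - used;
    size_t avail = strlen(in->src);
    size_t n = avail < free_space ? avail : free_space;
    memcpy(in->lim, in->src, n);
    in->src += n;
    in->lim += n;
    *in->lim = '\0';

    return n == 0 ? 2 : 0;
}
```

Forgetting to shift one of the pointers (for example the token start) leaves it pointing at stale data after the move, which is an easy mistake to make in a YYFILL implementation.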
The --storable-state option (described here: https://re2c.org/manual/manual_c.html#storable-state) is more advanced than YYFILL. It is only needed in cases when the lexer may be interrupted in the middle and resumed later (for example because it has to wait on a socket for more input to appear). The -1 state means that the lexer was not interrupted. That's why state -1 is used both as initial and in the final states, when the lexer has successfully processed a full lexeme and is ready to start processing the next one from the initial state. Other possible values of @@ are various interrupt states.
The simplified lexer test4-1.zip looks good.
In fact I'm learning re2c and writing notes/code/tutorials for other people/beginners.
That's awesome, thank you for the effort!
OK, I will read the other parts of the manual when I have time; maybe asking questions in the IRC channel would be better. Thank you.
maybe asking questions in the IRC channel would be better
You are welcome on IRC (it seems that there is a significant timezone difference, I am in BST timezone).
This is the code I sketched to get a list of tokens from a string using re-entry mode. I am using this code as an example.
The above code's output is:
So my suggestions are:
1. In the main function the initial bufInfo.state = -1 could be:
#define RE2C_STATE_INIT -1
2. In the lex function, I set "bufInfo->state = 0" to restore, because I guess it is always the beginning/start/nothing-matched state. If so, it could be:
#define RE2C_STATE_BEGIN 0
3. The code may not be good; if so, I am always glad to hear advice.
Thank you.