Closed StoneyJackson closed 11 months ago
This is closely related to my student's project a couple of years ago. Rather than arbitrary include directives, we saw grammar files as a whole as incremental: for example, V1 would include (depend on) V0, V2 would include V1, and so on. It is an idea worth pursuing, but it has drawbacks in terms of readability. I'd like to review our findings from that project. I do remember one thing: be careful about token definitions, because order matters.
Jim
Bowing to unrelenting pressure, I have succeeded in implementing an 'include' feature for input files to plcc.py. First, so as not to break any existing code, the use of 'include ...' at the end (normally) of the semantics section stays exactly the same: file names are simply added to the argv array and processed as if they were parameters given on the command line.
My proposed 'include' feature allows for lines of the form
#include filename
just like C/C++. When such a line appears anywhere in the input file (after any command-line switches), input switches to the file with the given filename, and returns to the previous file once the new file's contents have been read. These #include directives can be nested -- that is, an #include file can itself have an #include part, and everything gets stacked up.
But BEWARE: if a file has an include like this:
#include fff
and if the file 'fff' itself has the same include line
#include fff
the include mechanism could blow up with a stack overflow. I have made it so that you can't have nested includes more than 4 levels deep, which avoids this problem. I can't imagine nesting even this much, but I'm open to suggestions.
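To make the stacking and depth-cap behavior concrete, here is a rough sketch of how such an include reader might be written. This is not the actual plcc.py code; the names (`read_lines`, `MAX_DEPTH`) and the recursive-generator approach are illustrative assumptions.

```python
# Hypothetical sketch of a stacked '#include' reader with a depth cap.
# Not taken from plcc.py; names and structure are illustrative only.

MAX_DEPTH = 4

def read_lines(filename, depth=1):
    """Yield lines from filename, expanding '#include' directives.

    Lines from an included file are yielded in place, then reading
    resumes in the including file -- the 'stacking' described above.
    A depth cap guards against a file that (directly or indirectly)
    includes itself.
    """
    if depth > MAX_DEPTH:
        raise RuntimeError(
            f"#include nesting deeper than {MAX_DEPTH} (possible cycle)")
    with open(filename) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith('#include '):
                included = stripped.split(None, 1)[1]
                # Recurse: the included file's lines come next, then we
                # fall back to the current file automatically.
                yield from read_lines(included, depth + 1)
            else:
                yield line.rstrip('\n')
```

With this shape, a self-including file fails with a clear error instead of a Python stack overflow, and raising the cap (to Jim's suggested 20, say) is a one-line change.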
The tricky part about this is that there might possibly be a situation where code in the semantics section between the %%% ... %%% markers has a line starting with '#include' -- a perfectly legitimate line if the target language were C/C++. An unlikely situation, but oddly possible given the insatiable desire of both of you to target any implementation language that is Turing complete. In order to side-step this possibility, I have TURNED OFF the processing of #include directives for code between %%% ... %%% markers. I think this makes sense, and basically treats this code as being entirely language independent (except for lines themselves starting with %%% -- ouch!).
So whaddya think? I haven't propagated this change to the pithon.net repository. I could share just the code for plcc.include.py for you to test. Let me know...
Regards, Tim
On a related issue, do you want me to implement the "lineMode" feature as I proposed earlier? That is, if a token definition looks like this:
token PCT3$$ '^%%%'
then the scanner will enter line mode whenever it sees the PCT3$$ token on input, and will exit line mode when it sees another matching PCT3$$ token. The "$$" at the end of the token name is what I used to toggle line mode. There are other ways we could use token names to trigger line mode -- my choice of "$$" is only a suggestion.
In line mode, the appearance of <> on the RHS matches an entire LINE of input, and it returns a token with the special token name of $LINE (which cannot conflict with user-defined token names) whose lexeme contains the entire line of input. This only works when the scanner is in line mode.
Here's an example of a PLCC language specification file whose implementation can process a PLCC specification file (but with no semantics)
skip WHITESPACE '\s*'
token PCT3$$ '^%%%' # toggles line mode
token ANYTHING_ELSE '\S*'
%
<start> ::= <stuff>
<stuff>:NoLineMode ::= <ANYTHING_ELSE>
<stuff>:LineMode ::= PCT3$$ <lines> PCT3$$
<lines> **= <>
%
# no semantics
The RHS element <> behaves as if it were <$LINE>, where $LINE is the special reserved token name for a line (which cannot be a user-defined token name). The scanner cannot return a $LINE "token" unless it's in line mode, which is toggled as described above.
In the above, once the scanner encounters the line mode toggle token (PCT3$$ in the above), it consumes the rest of the line containing the token and starts line mode processing beginning with the next input line. It then continues reading the input, line-by-line, until it encounters another instance of the same line mode toggle token, whereupon it returns to normal token processing.
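The toggle-and-consume behavior described above can be sketched as a simple scanner loop. This is an illustrative model, not the generated Scan code; the token names come from the example spec, and the tokenization of non-line-mode input is simplified to whitespace splitting.

```python
# Illustrative sketch of line-mode toggling in a scanner.
# Token names (PCT3$$, $LINE, ANYTHING_ELSE) are from the example spec;
# the loop structure is an assumption, not the real generated code.
import re

TOGGLE = re.compile(r'%%%')  # the '^%%%' pattern for PCT3$$

def scan(lines):
    """Yield (token_name, lexeme) pairs, switching to $LINE tokens
    while the scanner is in line mode."""
    line_mode = False
    for line in lines:
        if TOGGLE.match(line.lstrip()):
            # Toggle token seen: emit it, consume the rest of this
            # line, and flip in/out of line mode.
            yield ('PCT3$$', '%%%')
            line_mode = not line_mode
        elif line_mode:
            # In line mode, each whole input line is one $LINE token.
            yield ('$LINE', line)
        else:
            # Normal mode (simplified): whitespace-separated lexemes.
            for lexeme in line.split():
                yield ('ANYTHING_ELSE', lexeme)
```

Note how $LINE tokens can only ever be produced between two PCT3$$ tokens, which is why `<>` on a RHS cannot accidentally match outside a line-mode region.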
Incidentally, the Scan, Parse, and Rep programs don't know anything about 'include' in the files they are reading. So if you wanted to run Scan on, say, the V6 language source files using the above language definition, you would need to do something like this:
(cd ~/PL/Code/V6 ; cat grammar code envVal prim val) |\
java -cp Java Scan
where the Java directory has the PLCC code generated by the above language. The 'cat ...' command will grab all of the code pieces (codpieces?) and present them to the scanner as a single file. Just running Scan on the 'grammar' file will not process the named include files, because the language described above doesn't know how.
--Tim
Stoney,
I'd rather see the recursion depth be far deeper -- perhaps 20. That way, if someone ever decides to try what I envisioned -- building up a grammar using parts from earlier lessons -- they would not run out of space.
Jim
Sorry for my slow responses. Things are busy here.
Is there a reason to include a file more than once? Or should it ideally be disallowed? If disallowed, maybe use a hashtable to prevent a file from being included more than once.
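The include-once idea can be sketched by threading a set of canonical paths through the reader. This is a hypothetical variant, not existing plcc.py code; `read_lines_once` and its error behavior are illustrative assumptions.

```python
# Hypothetical include-once guard: remember each file's canonical path
# in a set and reject any file that is included a second time.
# Illustrative only -- not the actual plcc.py implementation.
import os

def read_lines_once(filename, seen=None):
    """Yield lines from filename, expanding '#include' directives,
    erroring out if any file is included more than once."""
    if seen is None:
        seen = set()
    path = os.path.realpath(filename)  # canonicalize so aliases collide
    if path in seen:
        raise RuntimeError(f"file included twice: {filename}")
    seen.add(path)
    with open(filename) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith('#include '):
                yield from read_lines_once(stripped.split(None, 1)[1], seen)
            else:
                yield line.rstrip('\n')
```

A side benefit: with the seen-set, include cycles are caught no matter how deep they are, so the fixed depth limit (4, 20, or otherwise) stops being a correctness mechanism and could be dropped or kept purely as a sanity check.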
It also occurs to me that this %%% ... %%% problem keeps popping up. I think another email thread solves this more generally. If so, we should get that in place first, and then this would no longer be a problem here, and would not need to be handled as a special case, true? Or maybe that's only true if we rebuilt PLCC using PLCC. I think my brain is turning inside out.
Regards, "Stoney" Herman Lee Jackson II (he, him, his) Professor of CS&IT Western New England University
As I just noted on #49, it looks like the preprocessor (PP) option was added to solve this problem. If it does, maybe we should just document the PP feature and how to use it to add "#include".
AGREED. Allow #include in grammar files.
:tada: This issue has been resolved in version 4.0.0 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket:
Would it make sense to add 'include' statements to the lex and bnf sections of grammar files? I want them so that I can reuse lex and bnf specs with different semantics definitions. For example:
import v6.lex.plcc
%
import v6.bnf.plcc
%
(maximum depth semantics)
import v6.lex.plcc
%
import v6.bnf.plcc
%
(evaluation semantics)
Perhaps nicer might be to import the syntactic specification in a single import:
import v6.syntax.plcc
%
(evaluation semantics)
This alternative means that v6.syntax.plcc contains both the lexical and bnf definitions and the % separator between them. Assuming included files can also contain includes, then v6.syntax.plcc could even include v6.lex.plcc. This could allow for one to experiment with different bnf rules that parse the same language without duplicating the lexical definitions.
Also, I would only want this if it doesn't break existing code.
And if it's too difficult, then forget it. This falls under "nice-to-have", but "not-essential".