ourPLCC / plcc

A Programming Languages Compiler Compiler
GNU General Public License v3.0

Add include to lex and bnf sections? #55

Closed StoneyJackson closed 11 months ago

StoneyJackson commented 1 year ago

Would it make sense to add include statements to the lex and bnf sections of grammar files?

I want them so that I can reuse lex and bnf specs with different semantics definitions. For example

import v6.lex.plcc
%
import v6.bnf.plcc
%
(maximum depth semantics)
import v6.lex.plcc
%
import v6.bnf.plcc
%
(evaluation semantics)

Perhaps nicer would be to import the syntactic specification in a single import:

import v6.syntax.plcc
%
(evaluation semantics)

This alternative means that v6.syntax.plcc contains both the lexical and bnf definitions and the % separator between them. Assuming included files can also contain includes, v6.syntax.plcc could even include v6.lex.plcc. This would allow one to experiment with different bnf rules that parse the same language without duplicating the lexical definitions.


Also, I would only want this if it doesn't break existing code.

And if it's too difficult, then forget it. This falls under "nice-to-have", but "not-essential".

jashelio commented 1 year ago

This is closely related to my student’s project a couple of years ago. Rather than arbitrary include directives, we saw grammar files as a whole as incremental. For example, V1 would include (depend on) V0, V2 would include V1, etc. It is an idea worth pursuing, but it has drawbacks in terms of readability. I’d like to review our findings from that project. I do remember one thing: be careful about token definitions, because order matters.

Jim

fosler commented 1 year ago

Bowing to unrelenting pressure, I have succeeded in implementing an 'include' feature for input files to plcc.py. First, so as not to break any existing code, the use of 'include ...' at the end (normally) of the semantics section stays exactly the same: file names are simply added to the argv array and processed as if they were parameters given on the command line.

My proposed 'include' feature allows for lines of the form

#include filename

just like C/C++. When such a line appears anywhere in the input file (after any command-line switches), input switches to the file with the given filename and returns to the previous file once that file's contents have been read. These #include directives can be nested -- that is, an #include file can itself contain an #include line, and everything gets stacked up.

But BEWARE: if a file has an include like this:

#include fff

and if the file 'fff' itself has the same include line

#include fff

the include mechanism could blow up with a stack overflow. I have made it so that you can't have nested includes more than 4 levels deep, which avoids this problem. I can't imagine nesting even this much, but I'm open to suggestions.
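The stacked, depth-limited reading described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual plcc.py code; the function and constant names are invented for this sketch.

```python
MAX_INCLUDE_DEPTH = 4  # mirrors the 4-level limit described above

def read_lines(filename, depth=0):
    """Yield lines from filename, expanding '#include NAME' lines in place."""
    if depth > MAX_INCLUDE_DEPTH:
        raise RuntimeError(
            f'#include nested more than {MAX_INCLUDE_DEPTH} levels: {filename}')
    with open(filename) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith('#include '):
                # switch to the included file, then resume this one
                yield from read_lines(stripped[len('#include '):].strip(),
                                      depth + 1)
            else:
                yield line
```

Because the depth counter is passed down each recursive call, a self-including file fails fast with an error instead of overflowing the stack.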

The tricky part about this is that there might possibly be a situation where code in the semantics section between the %%% ... %%% markers has include lines. This could happen, for example, if the target language were C/C++ -- an unlikely situation, but oddly possible given the insatiable desire of both of you to target any implementation language that is Turing complete. In order to side-step this possibility, I have TURNED OFF the processing of #include directives for code between %%% ... %%% markers. I think this makes sense, and basically treats this code as being entirely language independent (except for lines themselves starting with %%% -- ouch!).
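The suppression rule could look something like this sketch (hypothetical names, not taken from plcc.py; expansion is one level deep here for brevity):

```python
def expand_includes(lines):
    """Expand '#include NAME' lines, except inside %%% ... %%% code blocks."""
    in_code = False
    out = []
    for line in lines:
        if line.strip().startswith('%%%'):
            in_code = not in_code      # entering or leaving a code block
            out.append(line)
        elif not in_code and line.strip().startswith('#include '):
            name = line.strip()[len('#include '):].strip()
            with open(name) as f:
                out.extend(f.readlines())
        else:
            out.append(line)           # inside %%% ... %%% : left untouched
    return out
```

Any '#include' line between the markers passes through verbatim, so target-language include directives in semantics code are never misinterpreted.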

So whaddya think? I haven't propagated this change to the pithon.net repository. I could share just the code for plcc.include.py for you to test. Let me know...

Regards, Tim

fosler commented 1 year ago

On a related issue, do you want me to implement the "lineMode" feature as I proposed earlier? That is, if a token definition looks like this:

token PCT3$$ '^%%%'

then the scanner will enter line mode whenever it sees the PCT3$$ token on input, and will exit line mode when it sees another matching PCT3$$ token. The "$$" at the end of the token name is what I used to toggle line mode. There are other ways we could use token names to trigger line mode -- my choice of "$$" is only a suggestion.

In line mode, the appearance of <> on the RHS matches an entire LINE of input, and it returns a token with the special token name of $LINE (which cannot conflict with user-defined token names) whose lexeme contains the entire line of input. This only works when the scanner is in line mode.

Here's an example of a PLCC language specification file whose implementation can process a PLCC specification file (but with no semantics)

skip WHITESPACE '\s+'
token PCT3$$ '^%%%'         # toggles line mode
token ANYTHING_ELSE '\S+'
%
<start> ::= <stuff>
<stuff>:NoLineMode  ::= <ANYTHING_ELSE>
<stuff>:LineMode    ::= PCT3$$ <lines> PCT3$$
<lines>             **= <>
%
# no semantics

The RHS element <> behaves as if it were <$LINE>, where $LINE is the special reserved token name for a line (which cannot be a user-defined token name). The scanner cannot return a $LINE "token" unless it's in line mode, which is toggled as described above.

In the above, once the scanner encounters the line mode toggle token (PCT3$$ in the above), it consumes the rest of the line containing the token and starts line mode processing beginning with the next input line. It then continues reading the input, line-by-line, until it encounters another instance of the same line mode toggle token, whereupon it returns to normal token processing.
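A toy version of this toggling behavior, with invented token names following the example above (this is not the actual Std/Scan.java logic, just an illustration of the mode switch):

```python
import re

TOGGLE = re.compile(r'%%%')  # line-mode toggle, per the PCT3$$ example

def scan(lines):
    """Toy scanner: normal mode splits lines into words; a toggle line flips
    line mode, where every whole line becomes a single $LINE token."""
    line_mode = False
    tokens = []
    for line in lines:
        if TOGGLE.match(line):             # toggle token at start of line
            tokens.append(('PCT3$$', line.rstrip('\n')))
            line_mode = not line_mode
        elif line_mode:
            tokens.append(('$LINE', line.rstrip('\n')))
        else:
            tokens.extend(('ANYTHING_ELSE', w) for w in line.split())
    return tokens
```

In line mode, no token patterns are tried at all; each input line is handed back whole as a $LINE token until the toggle reappears.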

Making these changes to the PLCC tool set requires modifications to plcc.py, Std/Scan.java, and Std/Token.java. These changes do not alter the *behavior* of the Java implementation produced by the PLCC tool set using the language examples in the Code repository, but of course the resulting Java files Scan.java and Token.java will not look quite the same. All of the other generated Java files -- namely, those generated from the BNF grammar specification -- are unchanged.

Incidentally, the Scan, Parse, and Rep programs don't know anything about 'include' in the files they are reading. So if you wanted to run Scan on, say, the V6 language source files using the above language definition, you would need to do something like this:

(cd ~/PL/Code/V6 ; cat grammar code envVal prim val) |\
java -cp Java Scan

where the Java directory has the PLCC code generated by the above language. The 'cat ...' command will grab all of the code pieces (codpieces?) and present them to the scanner as a single file. Just running Scan on the 'grammar' file will not process the named include files, because the language described above doesn't know how.

--Tim

jashelio commented 12 months ago

Stoney,

I’d rather see the recursion depth be far deeper — perhaps 20. That way, if someone ever decides to try what I envisioned — building up a grammar using parts from earlier lessons — they would not run out of space.

Jim

jashelio commented 12 months ago

Sorry for my slow responses. Things are busy here.

Is there a reason to include a file more than once? Or should it ideally be disallowed? If disallowed, maybe use a hashtable to prevent a file from being included more than once.
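The set/hashtable guard suggested here might be sketched as follows (illustrative Python only, not proposed plcc.py code; `read_once` and its behavior are assumptions). Tracking the set of already-included paths also makes include cycles impossible without needing any depth limit:

```python
import os

def read_once(filename, seen=None):
    """Yield lines, expanding '#include NAME', but refuse to include any
    file a second time (the 'hashtable' guard, here a Python set)."""
    if seen is None:
        seen = set()
    path = os.path.abspath(filename)
    if path in seen:
        raise RuntimeError(f'file included more than once: {filename}')
    seen.add(path)
    with open(filename) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith('#include '):
                yield from read_once(stripped[len('#include '):].strip(), seen)
            else:
                yield line
```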

It also occurs to me that this %%% ... %%% problem keeps popping up. I think another email thread solves this more generally. If so, we should get that in place first, and then this would no longer be a problem here, and would not need to be handled as a special case, true? Or maybe that's only true if we rebuilt PLCC using PLCC. I think my brain is turning inside out.

Regards, Stoney

StoneyJackson commented 12 months ago

As I just noted on #49, it looks like the preprocessor (PP) option was added to solve this problem. If it does, maybe we should just document the PP feature and how to use it to add "#include".

StoneyJackson commented 11 months ago

#49 doesn't help us here. The preprocessor is only applied to the generated Java code; it does not pre-process the grammar file, so it won't solve this issue.

StoneyJackson commented 11 months ago

AGREED: Allow #include in grammar files.

github-actions[bot] commented 11 months ago

:tada: This issue has been resolved in version 4.0.0 :tada:

The release is available on GitHub.

Your semantic-release bot :package::rocket: