lboasso / oberonc

An Oberon-07 compiler for the JVM
MIT License
147 stars 17 forks source link

Undetected Syntax Error: Record Type's Final Field Declaration Ends with a Semicolon #30

Closed geekstakulus closed 11 months ago

geekstakulus commented 11 months ago

Introduction

In Oberon-07, semicolons serve as separators for both statements and field declarations within records. However, Niklaus Wirth's implementation diverged from the specified syntax. Specifically, the END keyword in record declarations should not be preceded by a semicolon.

Current Behavior

Currently, when fields in a record type are declared, and the last field is terminated with a semicolon, the compiler does not report a syntax error. For example:

MODULE Records;
  TYPE
    rec = RECORD
        x : INTEGER;
        y : CHAR;
    END;
END Records.

This code compiles without any problem, although it deviates from the expected behavior.

Expected Behavior

According to the Oberon-07 specification, the syntax for record declarations should adhere to the following pattern:

RecordType = "RECORD" ["(" BaseType ")"] [FieldListSequence] "END".
FieldListSequence = FieldList {";" FieldList}.
FieldList = IdentList ":" type.
IdentList = ident {"," ident}.

In this syntax, the semicolon functions as a separator, not a terminator. Consequently, the last field declaration should not conclude with a semicolon.

Repro Steps

To observe this behavior:

  1. Compile the code using the Oberonc compiler.
  2. Note that the compiler compiles the code without any error.

Other Information

Niklaus Wirth's implementation in the compiler code treats the semicolon as optional in this context. This behavior is not aligned with the Oberon-07 specification and originates from the original Oberon compiler. As a result, all compilers that rely on his code as a foundation fail to address this issue and accept it as valid. The only compiler that I know to adhere to the correct behavior is the OBNC compiler, implemented in C. The OBC compiler has a condition to handle this in the grammar. However, when you pass the -07 option, it fails to report the syntax error, and it compiles as if it were an Oberon-2 file.

Here's the output of the OBNC compiler:

obnc-compile: Records.Mod:6: syntax error, unexpected END, expecting IDENT
obnc: build process failed

Wirth's implementation in the code can be represented by the following EBNF rule:

RecordType = "RECORD" ["(" BaseType ")"] [FieldListSequence] [";"] "END".

Here, the semicolon is optional, mirroring the behavior implemented in the procedure RecordType.

This issue was identified by my TreeSitter parser, which adheres strictly to the specified grammar. The parser correctly flags this syntax as an error in record declarations. In contrast, the compiler handles it without issue. This mismatch highlights the discrepancy between the actual compiler code and the Oberon-07 specification, as well as the legacy nature of the codebase, that are nothing but a rewrite of his previous compilers. Namely: Algol-W, Pascal, Pascal-S, Modula and Modula-2.

Despite the challenges, working with legacy systems and updating them to modern standards can be a rewarding experience. This case serves as a reminder of how past limitations and workarounds can persist in code, even in systems designed by influential figures like Niklaus Wirth.

lboasso commented 11 months ago

Thanks for reporting this issue. Since oberonc parses a superset of the specification and the behavior is consistent with the optional ; in statement lists, I am inclined to keep the current behavior of the compiler. Like in other occasions where I have found inconsistency in the type system, more often than not Wirth agreed in updating the Oberon-07 report after an email exchange. I would imagine this is something that he would have fixed in the specification more than in the code.

geekstakulus commented 11 months ago

Coming from an inconsistent and lazy person such as Wirth, I don't doubt he would change the report instead. But, anyway, given that the majority of compilers use his code as a foundation, I believe this can be considered an anomaly in the report.

lboasso commented 11 months ago

I agree, closing