Closed tenko closed 8 months ago
Hey, thanks for working on this! The code looks mostly fine to me as a prototype how this could be implemented. I have two remarks and one question.
if (name == "SYSTEM") { ... }
) by actually creating an instance of ModuleNode
and dropping that into the symbol table.STRING
would have affected these procedures. But I could imagine that it has something to do with ARRAY
and RECORD
values being passed by reference now. That means that they need an extra load to get their addresses.TYPE
. This will work without much effort for named types by simply adding another case to Sema::onQualifiedExpression
, i.e., without modification of the parser. My question is whether this is general enough or if anonymous types with a syntax such as the following are also required be supported: s := SIZE(ARRAY 10 OF RECORD x, y: INTEGER END);
? This syntax would require (significant) changes to the parser and potentially introduce unwanted ambiguity.I had a quick look at prototyping what I proposed in point no. 3 of my previous comment today. Unfortunately, even this solution requires a change to the interplay of Parser and Sema: There is an unfortunate ambiguity in the Oberon grammar due to (a) optional parentheses after procedure calls and (b) both actual parameters and type guards being placed in parentheses after an ident.
Currently, the parser does not have enough context to properly distinguish these different cases. This lack of context stems from the fact that, currently, all selectors of a designator are parsed without intermittent call-backs to Sema. As a result, the parser flies mostly blind and will parse an input such as SIZE(INTEGER)
as a type guard instead of an actual parameter. I will need to couple Parser and Sema much more closely and do a call-back after each selector to have the Parser correctly maneuver through these inputs. While this is going to take a bit of work, I think it'll make the design more consistent and, hopefully, the parsing more robust.
ADR
, GET
, PUT
and COPY
.
The wording in the Oberon-07 report for COPY
is "copy n consecutive words from src to dst".
I implemented this as a BYTE
copy as this makes most sense. I will check other implementations.
As of now this fails due to some changes in the CHAR
merge.
I traced this to LLVMIRBuilder::qualifiedName
and node->getIdentifier()
.
Probably related to the ProcedureNode::isImported()
change.
I will try further to find the error.SYSTEM
module will need to be able to change depending on whether the compiler is in Oberon, Oberon-2, or Oberon-07 mode. As you will have noticed, I already refactored all the built-in procedures into their own methods. This refactoring is intended as a first step towards moving this code out of LLVMIRBuilder
. The same will need to happen for code related to the SYSTEM
module.COPY
that fails? If so, have a look at void LLVMIRBuilder::visit(AssignmentNode &)
, where I handle assignment of records and arrays using a blanket LLVM memcpy
. Note that this code is still super sketchy. One issue is the naive size calculation, another is that, in case of array assignment, both the length and the content of the target array are overwritten, which can lead to weird behavior. My suggestion would be to write one dependable implementation for COPY
and then call that code for the handling of assignments, too.A question after looking at the code of your unit tests: does the compiler need to guarantee that the following assertions holds?
VAR a: ARRAY 10 OF INTEGER;
ASSERT(SYSTEM.ADR(a) = SYSTEM.ADR(a[0]));
This is true in C/C++, but without additional effort it will not be true in the current implementation of arrays in CodeGen: SYSTEM.ADR(a)
is the address of the length of the array, whereas the address of a[0]
is SYSTEM.ADR(a) + 8
. Recall that internally an array is a struct { i64, [ len x memTy ] }
.
Points no. 1 and 2 of my original list are actually linked. It is also related to your observation that aliasing the SYSTEM
module does not work as expected.
Your code uses SymbolTable::import()
to import a procedure from the SYSTEM
module. This method accepts the name (or alias) of a module, but works under the assumption that an instance of ModuleNode
(with that name) exists and has been set on the declaration. The following code from SymbolImporter
shows the intended use of the import()
method and also documents how aliasing is currently implemented.
auto decl = make_unique<ProcedureNode>(std::move(ident), dynamic_cast<ProcedureTypeNode *>(type));
symbols_->import(module->getAlias(), name, decl.get());
decl->setModule(module);
module->procedures().push_back(std::move(decl));
Using the ctor for ProcedureNode
shown above, the procedure will be marked automatically as imported as it is the constructor designed to be used for SymbolImporter
. This property then triggers a chain reaction leading to the problem you pointed out. Luckily, there's the quick fix to stop Sema from adding predefined procedures as external procedures, which seems to fix the two unit tests that you provided.
Making the aliasing work will, however, require a bit more work into the direction that I outlined in points no. 1 and 2. Nevertheless, I will merge your changes and my fixes into the main branch. While this functionality is not really production ready, it also doesn't interfere with the rest of the compiler.
If fixed the unit tests and adapted Parser and Sema to support type names as arguments to procedures (as qualified expressions). I also implemented SIZE
, which is available as built-in procedure (Oberon) and through the SYSTEM
module (Oberon 07). I uncommented the following line from the second unit test as it now compiles and works as expected.
SYSTEM.COPY(xadr, yadr, LEN(x) * SYSTEM.SIZE(INTEGER));
Hope this helps to add more functionality to the SYSTEM
module!
Excellent,
It now works as expected.
I will do some small adjustments after checking the Project Oberon source at link:
SYSTEM.ADR(a) = SYSTEM.ADR(a[0])
is supported.Note that my implementation of the SIZE
built-in procedure is rather naive as it simply returns the value calculated by Sema during checking. This is a logical size and does not take into consideration physical aspects such as alignment and padding. Probably this is better implemented by returning values obtained via llvm::DataLayout
, llvm::StructLayout
, etc.
Regarding point no. 2 in your comment: I'm thinking of slightly changing the handling of ARRAY
s in the backend (oh, no, not again!). Currently, I model them as a struct a: { int len, type vec[] }
and pass a pointer to the beginning of the struct, i.e., a
, around. The change I'm planning to do is to use a pointer to a.vec
as the handle for the array and access len
with a negative offset from that pointer. Recently, I finally figured out how to correctly compute that offset and the relative addresses of struct members. With this change, I believe I can get rid of a lot of special treatment required to skip the len
field and still have it there for the LEN
procedure. This is especially relevant as I'm planning to use the same approach to extend records with an additional (hidden) field to store the type tag (as record extension is the next thing on my list to add to the compiler).
Minimal implementation of the SYSTEM module described in the Oberon-07 language report.
(NOTE : This used to work OK, until the last changes with removal of the STRING type.)
SYSTEM.ADR, SYSTEM.GET, SYSTEM.PUT and SYSTEM.COPY currently implemented. SYSTEM.VAL I am not sure is possible to implement currently due to the TYPE argument expected. Same for the pervasive procedures MIN & MAX.
This is marked [DRAFT] as I suspect this could be integrated in a better way.