Open tenko opened 5 months ago
The problem is in the title of the issue. O07.10.1 specifies that "[t]he result type of a procedure can be neither a record nor an array". Since a string literal is a value that is implicitly mapped to ARRAY OF CHAR
, this is and should not be supported by the backend. Since the procedure is created via OberonSystem.cpp
, the Sema checks that would stop this procedure from being defined are bypassed. Type STRING
should never be (explicitly) used in any declaration regardless of whether it is in source code (where it cannot be used) or internally in pseudo-modules. It only exists for Sema to type check string literals.
The best way forward in my opinion would be to model COMPILER.Env
in a way that passes and returns the string via a VAR
-parameter.
You are right with Oberon-07.
I see XDS the compiler allow return POINTER TO
construct.
Here it would be POINTER TO ARRAY OF CHAR
.
This is I believe allowed according to Oberon-2 report?
Edit: I see the language report MODULE Trees
example show return of POINTER TO
construct.
I will modify here this to take a VAR argument as suggested to be in line with Oberon-07.
Once there is a complete, tested, and stable implementation of Oberon-07, I think such extensions to the language definitely make sense. The fact that the base type of a pointer type can only be a record is currently only enforced in Sema. I don't think lifting that restriction will lead to problems or even changes in CodeGen as I added that Sema check quite late and already had code that made use of types like POINTER TO INTEGER
or POINTER TO ARRAY
.
I agree that modeling sting literals as POINTER TO ARRAY OF CHAR
(rather than the internal type STRING
) is a good solution as it is essentially how LLVM-IR represents these literals. But in order to add this extension, there first needs to be a specification what implicit conversions will happen and when a string is passed by value rather than reference. Consider the following four possible ways to define a variable that can be assigned a string literal.
VAR str1: ARRAY 10 OF CHAR;
str2: ARRAY OF CHAR;
str3: POINTER TO ARRAY OF CHAR;
str4: POINTER TO ARRAY 10 OF CHAR;
str2
and str3
or wether the syntax used for str2
is just syntactic sugar for the same effect as in the case of str3
? Open array variables will essentially boil down to pointers as the compiler cannot know how much space to allocate. str3
and str4
, or whether they are already subsumed by the second case? If they are not necessary, it probably makes sense to keep Wirth's restriction of limiting pointers to record base types. str1
but reference-based assignment (aliasing) for str2
, str3
, and str4
? str1 := str2
be legal? How about str1 := str3
or str1 := str4
? These could be implemented as syntactic sugar for str1 := ^str3
and str1 := ^str4
. Does going the other way, i.e.,str2 := str1
, make sense? Is it necessary or desirable to have this? Should the compiler rewrite this to str2 := SYSTEM.ADR(str1)
?I will re-open this issue and mark it as an enhancement for later down the line. I want to keep this on the radar as I believe elegant and efficient string handling to be a key factor to the ease-of-use and utility of a programming language.
Excellent.
I believe it makes sense to check how the XDS compiler behaves.
Oberon-2 does not have the assignment copy of arrays and rely on the COPY
procedure, so some problematics is avoided here.
I have implemented a dynamic string in String for the XDS compiler.
Here the string type is POINTER TO ARRAY OF CHAR
. Example of use in OSPath.
When using the standard string procedures taking and open array as argument the string pointer type is
dereferenced in the argument list. The compiler will give you an error message hinting if you forget this dereference.
I think this works out quite well in practice.
Also another open question is regarding strings is that many "modern" languages now implement the short string optimization trick, where the pointer and length data is reused for a static string. This needs to be on the compiler side as this is in no way type safe programming, but maybe this only makes sense for languages used for Web style programming where strings are everywhere?
This is not so easy to reproduce. I have a custom procedure defined which return a string literal.
From
OberonSystem.cpp
:And in
LLVMIRBuilder.cpp
:The following works until the code crashes on assignment in global scope.
Assignment in the
test
procedure works.