Open sgreenhill opened 4 years ago
Hello Stewart,
Have you heard anything about Ofront+, the extended translator of the Oberon languages into C?
• https://github.com/Oleg-N-Cher/OfrontPlus
Some of the suggestions you described here have already been implemented in Ofront+.
In my development strategy, I also faced the need to improve the interaction with C libraries. And as a basis, I took mainly those solutions that are implemented in BlackBox. Maybe my steps should be severely criticized, but I would be interested in your opinion about what is missing. And, of course, the VOC team can always look into my commits and sources.
So, I will list the most important differences from VOC.
New call models [stdcall] and [fastcall] have been added.
We can write a C library procedure without a body, and Ofront+ will generate a prototype. However, we do not need to include C headers on this library (i.e. #include
We can specify a custom name for a C function, as:
PROCEDURE [fastcall] PalAll* ["pal_all"] (data: ARRAY [notag] OF CHAR);
Implemented the [untagged] tag (with [notag] as a synonym) for structures and arrays that should not be tracked by the GC. Here I think I did a better job than before.
Implemented VAR [nil] parameter (as in BlackBox) for passing one of: a variable or NIL.
Now there are even two ways to describe the prototype of a procedure:
PROCEDURE- At (x, y: SHORTINT);
--> import void Out6x8_At (SHORTINT x, SHORTINT y);
--> useful for using wrappers
PROCEDURE At (x, y: SHORTINT);
--> __EXTERN void At (SHORTINT x, SHORTINT y);
--> useful for direct work with DLL
Implemented MODULE tags:
[foreign] --> don't generate C body, only header
[noinit] --> don't generate a module initializer function
[main] --> analogue of option -m
This way, even your binding generator can be used to produce interfaces that work with Ofront+ after slight modification.
I have plans for further improvements. For example, I would very much like to implement the [union] tag.
I didn't consider the option of a checkbox for linking libraries, but only because Ofront+ is not a compiler, it is a pure translator in C, and it doesn't directly call the C compiler and has nothing to do with its command-line options.
P.S. I know Norayr as a very conservative person who never likes to make new features, so perhaps you will have to code everything yourself.
Hello, everyone.
Thank you for the feedback.
Let me express my first reaction, which is probably predictable for @Oleg-N-Cher .
I do not like pragmas of any kind for Oberon. I believe if we start adding pragmas, we can go and go with it till the point we have more pragmas than language keywords. I even know fork of ooc, here, on github, which introduced, (surprise!) a new pragma. Then we'll have a problem why we discriminate this pragma, if we didn't discriminate others. (:
I believe, we should not complicate the parser with parsing OS dependent, compiler dependent expressions, and invent these expressions.
I think, currently voc, having Ofront, and OP2 heritage, evolved with using code procedures, that are used in Oberon operating systems, as, well, listings of machine code instructions, and I think that's a brilliant idea, to use already existing feature to enable linking to the foreign libraries.
@sgreenhill, you mentioned this notation ooc has:
MODULE X11 [ LINK LIB "X11" ];
I did not consider that a good design decision, though I understand, that might 'simplify' the workflow of developer.
FPC and Delphi Pascal have "unit", "program" and "library" keywords to describe how the unit should be compiled. In the case of "library", it gets linked as a DLL on Windows. It seemingly simplifies the workflow but, I believe, unnecessarily complicates the language and makes it dependent on operating systems that have "executables" and "libraries".
I think voc's parser has been improved a lot, mostly thanks to @dcwbrown's work, and I believe it can be used as a foundation of a 64bit native code compiler for Oberon operating system. But in that case, what to do with all that Linux or Windows specific code in parser, that we don't need (but may introduce by similar requests)? We, of course have now backend specific code in some modules, but those are mostly backend modules, that's OPM (M stands for machine) and OPC (C stands for code).
I think using makefiles to control the compilation is a good idea.
Also, one may prefer to use classic make, another might prefer cmake or something more fancy or extraordinary.
To each task, its tool. The basic idea behind Unix, which is actually close to the Oberon OS design ideas as well: what they have in common is that designers of both systems decided to use modules, that are specialized on exact tasks, and combine the modules to achieve the solution of problem.
In Unix case some kind of IPC got used, in Oberon case, it's dynamic module loading, and function calls.
But it's not like the ideas are radically different - we know a way to solve big problems by dividing those in to smaller pieces and solving small pieces individually.
So in the case of pragmas like MODULE X11 [ LINK LIB "X11" ]; I believe there is no need to complicate the parser while this problem can be solved outside of the compiler, by using your favourite build tool and compiler flags.
One day we may have compiler feature requests, to introduce new features. I believe today we have a luxury of situation, when Oberon, or voc is not actively used in industry, so there is no need to solve some problem, that developers have, by a solution, which might not be the best. We have a luxury to discuss the ideas, the features, without putting them to code, and having to introduce new stones in the building, that we later might realize we built not the best way, when it might be too late to revert the changes, or too painful to do so.
That was my first reaction on the thought of complicating the parser. Still, I need to think, and any ideas are welcome, how can we improve the user experience, some other way, with a new tool, or may be with compiler flags, though my first reaction, again, would be, that even compiler flags that we have now are more than necessary. But new, separate tool, might be a better idea. I need to reread this issue again, and think. (:
Thank you again. I am glad you are keeping Oberon alive.
Thanks all for your comments. There are a few different issues here.
First issue: foreign code interface. John Donne says "No man is an island entire of itself", and clearly in a world where there are now so much great free software available it is important to provide a seamless interface to foreign code. This vastly increases what a developer can achieve with limited time resources. The designers of Component Pascal (who are also some of the Oberon-2 designers) realised this, and almost every point that @Oleg-N-Cher mentions is implemented in that system.
The facilities that are currently in VOC, inherited from the original Ofront are basic, but incomplete. The RECORD and ARRAY flags "[1]" protect the GC, but allow many other dangerous operations that could easily crash or corrupt a program. For example, applying LEN to a foreign open array. As it is, it can be hard to avoid doing many "unsafe" operations (eg. type casts via SYSTEM.VAL) in order to use C-implemented objects.
@Oleg-N-Cher mentions in point (2) the potential for compatibility problems during compilation. This arises because the "code procedure" idea requires a module compilation to include every include file required by the foreign code on which you depend. So on Mac OS, I am frequently getting this sort of warning:
In file included from /usr/local/include/SDL2/SDL_main.h:25:
/usr/local/include/SDL2/SDL_stdinc.h:63:11: warning: non-portable path to file
'<Strings.h>'; specified path differs in case from file name on disk
[-Wnonportable-include-path]
# include <strings.h>
^~~~~~~~~~~
<Strings.h>
/usr/local/include/SDL2/SDL_stdinc.h:84:11: warning: non-portable path to file
'<Math.h>'; specified path differs in case from file name on disk
[-Wnonportable-include-path]
# include <math.h>
^~~~~~~~
<Math.h>
This is because of the name conflict between Ofront-generated headers (eg. "Math.h") and the headers required by the code procedures (eg. "math.h"). On both Mac OS and Windows it is possible for file-systems to be case-insensitive, so this immediately gives you a portability issue. I have been lucky so far, but only because Oberon tends to capitalise the first letter of the module name, and C uses almost entirely lower case. So the "C library procedure without a body" approach described by @Oleg-N-Cher can sometimes be significantly more robust. It simplifies the declarations, but importantly it saves developers from potential name collisions between Oberon modules and unrelated C headers. In such situations one is forced to either rename the Oberon module, or delve into the compiler and modify the naming scheme for intermediate header files. This should probably be done anyway as it is likely to cause a problem somewhere in the future. Renaming Oberon modules (eg. system modules) could require a cascade of edits to user and library code.
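The collision described above can be detected mechanically. The following C sketch is only an illustration: the function names `same_name_ignoring_case` and `find_collision` are hypothetical and not part of VOC or Ofront+. It compares generated header names against C header names the way a case-insensitive file system would.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if a and b are equal when ASCII case is ignored, which is how a
   case-insensitive file system would treat two file names. */
int same_name_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Returns the index of the first generated header that collides with one of
   the C headers, or -1 if there is no collision. */
int find_collision(const char *const generated[], size_t n_generated,
                   const char *const c_headers[], size_t n_c_headers)
{
    for (size_t i = 0; i < n_generated; i++)
        for (size_t j = 0; j < n_c_headers; j++)
            if (same_name_ignoring_case(generated[i], c_headers[j]))
                return (int)i;
    return -1;
}
```

A check like this could run in a build script before compilation, so that a module named Math or Strings is flagged before the C compiler picks up the wrong header.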
So doing the foreign interface properly means:
Second issue: @norayr, you mention the desire to keep the compiler simple, which I accept. Most Oberon users are probably here because they value simplicity. But many software projects involve complexity, and keeping the compiler too simple can push much complexity into the user's code. This has the effect of increasing overall complexity, because the problems are now duplicated countless times, and the individual solutions may be incompatible. For the developer, time is the critical resource, and most users won't accept a solution that requires them to jump through too many hoops.
The "system flag" approach is fairly simple, and does not have much impact on the language. In the OP2/ofront implementation, its pretty hard to decode what the different flags are meant to do, and in some cases they cram different values together (eg. the trailing gap for record alignment is also encoded in the sysflags).
All languages must adapt over time to new conditions, or risk dying out. For example, when multi-threading became common "C" introduced "volatile", and recently we have "atomic" in response to the development of SMP CPUs. "volatile" is also important for memory-mapped I/O which is now common on many devices. These things are all necessary to safely exploit modern hardware and operating systems. The fact that a feature is not in the original language does not mean that it should not be added. One of the advantages of the Ofront approach is that pretty much every required concept is already implemented in the C compiler, so something like:
volatile int seq;
can easily be declared without introducing keywords:
VAR seq [ volatile ] : LONGINT;
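For reference, here is a minimal C sketch of what the translator output for such a flag might rely on. The `[ volatile ]` flag is the proposal above, not existing VOC syntax, and the names `shared_counter`, `bump_seq` and `bump_counter` are illustrative assumptions. The sketch also shows that `volatile` alone does not make an increment safe between threads, which is what C11 atomics address.

```c
#include <stdatomic.h>

/* What a hypothetical "VAR seq [ volatile ] : LONGINT" could translate to:
   every read and write of seq must really touch memory. */
volatile int seq;

/* volatile does not make read-modify-write safe between threads;
   a C11 atomic type covers that case. */
atomic_long shared_counter;

/* Increments seq with one volatile read and one volatile write,
   then returns the new value (a second volatile read). */
int bump_seq(void)
{
    seq = seq + 1;
    return seq;
}

/* Atomically increments the counter and returns the new value. */
long bump_counter(void)
{
    return atomic_fetch_add(&shared_counter, 1) + 1;
}
```

Because both concepts already exist in the C compiler, the translator would only need to pass the qualifier through to the generated declaration.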
Third issue: makefiles. These might be acceptable for small projects, but when you have a few dozen modules, and are constantly updating makefiles to express module dependencies that are already expressed in the source code, you begin to ask why the compiler is not handling this task. Basic design principle: DRY ("don't repeat yourself"). Adding unnecessary code to a code-base increases the work required to maintain and extend the code. It introduces a potential failure point, since dependencies in the Makefile may become out-of-sync with the Oberon modules. The developer needs the confidence that the build process is correct, even when using modules that may have been written by other developers. So, for example:
oo2c --make Main
always compiles and links modules in the correct sequence, regardless of what you changed, just like:
javac Main.java
...or any other modern compiler. After all, the compiler already has all the knowledge required to correctly build the project, so why not use it?
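The ordering that such a "make" mode needs can be derived from the IMPORT lists alone. The C sketch below is an assumption about how this could be done, not how oo2c or voc implement it: `ModuleGraph` and `build_order` are hypothetical names, and modules are represented by small integer indices. A depth-first walk emits every imported module before the module that imports it.

```c
#include <stddef.h>

#define MAX_MODULES 16

/* A hypothetical module table: imports[m][k] lists the indices of the
   modules that module m imports, n_imports[m] says how many there are. */
typedef struct {
    size_t n_imports[MAX_MODULES];
    size_t imports[MAX_MODULES][MAX_MODULES];
} ModuleGraph;

/* Depth-first visit: emit every import of m before m itself. */
static void visit(const ModuleGraph *g, size_t m, int visited[],
                  size_t order[], size_t *n_order)
{
    if (visited[m])
        return;
    visited[m] = 1;
    for (size_t k = 0; k < g->n_imports[m]; k++)
        visit(g, g->imports[m][k], visited, order, n_order);
    order[(*n_order)++] = m;
}

/* Fills order[] with the modules reachable from root, imported modules
   first, and returns how many entries were written. */
size_t build_order(const ModuleGraph *g, size_t root, size_t order[])
{
    int visited[MAX_MODULES] = {0};
    size_t n_order = 0;
    visit(g, root, visited, order, &n_order);
    return n_order;
}
```

With Main importing Files and Strings, and Files importing Strings, the resulting order is Strings, Files, Main. Modules that Main does not reach are left out, so nothing unrelated gets rebuilt.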
Likewise, relying on Makefiles to maintain dependencies between Oberon and external libraries is IMO an unnecessary task that could easily be supported by the compiler. Otherwise, you have to understand every library dependency of every module that you are using, either directly or indirectly, and encode this in the Makefile. This breaks the concept of "blackbox reuse", which is important in any software ecosystem, because modules are not able to fully express their own dependencies.
Sorry, I hope this doesn't sound like a rant. Many thanks to you all for your great work.
It doesn't sound like a rant, it sounds very reasonable. FFI is something we might need to improve indeed, and, may be it can be used in the Oberon system, in case it runs on top of other OS.
Just a short note on C's atomic and volatile: I understand that there are emerging problems that C tries to address. One of those is how to use SMP efficiently. Another example of this is "memory fence" functionality added to the C++ language. Corresponding instructions have been added to Intel and several other CPUs, so C++ keeps up with this CPU design.
I want just to share here, that I have a completely different idea on multitasking, and that is basically what Erlang vm does, but implemented in native code, and without having a shared memory model, but with messaging between processes. The messages can be delivered locally or over the network. So the scaling to many machines is much easier, than with shared memory model. And the efficiency, in case of high load is higher. Joe Armstrong has an amazing video about that, called "How do we program multicores", I recommend it a lot. I think Oberon is well suited for that approach. I think we need to avoid shared memory for threads by any means. (:
Still, I will reread everything, and write more. You have a point, may be we cannot leave FFI like this. Still, I think if it is possible to use a separate tool for that. I agree that being an island is not a solution. And if we communicate with foreign libraries, we better do it wisely.
Hi Stewart, welcome back :-), just a side note about your h2o: there is some documentation available in the wiki of the Blackbox Framework Center
Bernhard
yes, I have a feeling that H2O and voc can work nicely together to solve these ffi issues, but i need time to concentrate and think about it.
In my experience building and using H2O, the main issue is translating the "style" of the foreign API so that it maps well to Oberon. There are many possible mappings so each API will need some human intervention to define the mapping rules. Within these rules, the actual translation can usually be done mostly automatically. The rules may need to be periodically checked as APIs are updated, so it is useful to have users invested in this process. The more users, the more APIs that can be maintained.
Keeping this in mind, I think a sensible approach would be to follow @Oleg-N-Cher's implementation of the Component Pascal standard. That would make any API translation usable on at least three platforms: VOC, OFrontPlus, and Blackbox. Apart from minor syntactic differences, this also conforms to INTERFACE modules in OOC. There would of course still be the existing "code procedure" method, but this is essentially only usable on ofront-derived systems that translate to C.
This of course only applies to C-style libraries. Component Pascal actually went further, and implemented an object representation that was binary compatible with Microsoft's COM (Component Object Model). Basically, these are interface objects that use VTABLE dispatch (like in C++). COM objects support a form of introspection via type libraries, so it is possible to automatically handle remote method calls between processes (including over networks), and even dynamically build language bindings on the fly in some scripting environments such as Visual Basic.
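The VTABLE dispatch mentioned here can be sketched in plain C. This is a generic illustration of the mechanism only, not real COM: actual COM interfaces derive from IUnknown with QueryInterface, AddRef and Release, and all names below (`Shape`, `ShapeVtbl`, `make_rect`, `shape_area`) are made up for the example.

```c
/* A COM-style interface object: it starts with a pointer to a table of
   function pointers, and every method takes the object as first argument. */
typedef struct Shape Shape;

typedef struct {
    int (*area)(const Shape *self);
    const char *(*name)(const Shape *self);
} ShapeVtbl;

struct Shape {
    const ShapeVtbl *vtbl;  /* first field, so any client can find the table */
    int width, height;
};

/* One concrete implementation of the interface. */
static int rect_area(const Shape *self) { return self->width * self->height; }
static const char *rect_name(const Shape *self) { (void)self; return "rect"; }

static const ShapeVtbl rect_vtbl = { rect_area, rect_name };

/* Creates a rectangle whose methods are reached through rect_vtbl. */
Shape make_rect(int w, int h)
{
    Shape s = { &rect_vtbl, w, h };
    return s;
}

/* A client only needs the table layout, not the implementation. */
int shape_area(const Shape *s)
{
    return s->vtbl->area(s);
}
```

A language binding is binary compatible with such objects as long as it lays out the table pointer and the table entries in the same order.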
C++ APIs are a more difficult problem. I had a few ideas about this, but not enough motivation to do much about it.
@btreut, Hi and thanks for the link. Good to see that software still exists in some useful form.
May I throw in a few general considerations?
@sgreenhill
POINTER TO ARRAY MAX(LONGINT) OF CHAR
should do the trick.
I think code procedures are brilliant although I don't know how to use quotation marks within them since a quotation mark denotes the end. To get the prototypes a tool like H2O might be helpful. I would refrain from changing the language so that unsafe or missing C types like pointers without size or unions can be used without glue code. This seems to introduce unsafety and to be the first step towards transforming Oberon into C with a different syntax. If Oberon offers something special it is simplicity and safety, both should not be sacrificed.
MODULE X11 [ LINK LIB "X11" ];
why not MODULE X11 (* LINK LIB "X11" *); ?
if we implement program and library then we lose the flexibility we have today, when we can link several modules into one library.
i actually tinker with some code of dynamic module loading commandline shell, which does exactly that - the user types a command to compile a module, it executes standard voc, without any changes, produces a .o file, then links it as a .so shared library.
so this can be done with an external tool, without placing any extra functionality on compiler.
but still, what if I want to link several modules and distribute those as a library?
i think this discussion also is about how we define a programmer.
is a programmer a person, who can only use the compiler (and IDE) and doesn't care about anything outside the IDE interface? Indeed many programmers, especially those who are used to tools such as Delphi or Visual Studio, conform to that definition.
Also, I have a feeling that universities don't often include the courses about working environments. Thus, students may get knowledge of algorithms, but may not know anything about building tools, don't often know the build process.
The interesting example is - many C programmers I talked with don't know what the cpp utility is. It is the preprocessor, of course.
Usual Unix developers know at least one make-like utility, usually more: make itself, and something which the person favours. They know about the -c flag which gets them .o files, and then they understand that those files have to be linked.
So my understanding is that developer should not be the person which doesn't want to see beyond its IDE window, but only implement beautiful abstractions.
And let me stress again, each tool to its need: Today I would not even mind, if voc was stripped down to only produce an object file, leaving programmer the task to link it. (though I was the one who first introduced the automation code in voc)
Or if we had a separate tool, which calls the compiler, gets the object file, and links those together. May be that tool could also build a dependency tree for the modules used, and be a replacement of the make process.
In case of fpc that is done by the compiler itself.
I tried to put such kind of code in voc, but ended up believing this code does not belong to the compiler.
These are thoughts that are not directly related to all parts of this thread, and it does not relate to FFI part probably. But it expresses my feelings today, about the design of tools.
hello, people. i am sorry to not be very participative, i have hard time now: i have work, plus i have to move, and think of encapsulating devices and books in to boxes, plus when i get free i need to hear news from the department of defence briefing about the military situation, which is not very peaceful right now.
today I was able to re read everything, and understand better all the points.
I think, in general, the social processes are going in alignment with the demands of societies. That is why it is important to participate in discussions, to influence the formation of those demands. At the end of the day, Oberon community will get the compiler/framework they desire. Be that this project or the other. Therefore it is important to "be careful what you wish for".
So, I reread the texts, and by the way, POINTER TO ARRAY OF CHAR, which is translated to a struct, came to my attention as well, and I was thinking of documenting that somewhere.
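To make the layout issue concrete, here is a C sketch of such a struct. It assumes the layout described in the original post, where the pointer refers to a length field that precedes the data. The type and function names are illustrative and not the ones the translator emits, and the real generated struct may differ in detail.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout for POINTER TO ARRAY OF CHAR: a length field followed by
   the characters. */
typedef struct {
    size_t len;
    char   data[];      /* C99 flexible array member */
} OpenCharArray;

/* Allocates an open array holding a copy of the C string s, including its
   terminating zero byte. Returns NULL if allocation fails. */
OpenCharArray *open_array_from_cstring(const char *s)
{
    size_t len = strlen(s) + 1;
    OpenCharArray *a = malloc(sizeof *a + len);
    if (a == NULL)
        return NULL;
    a->len = len;
    memcpy(a->data, s, len);
    return a;
}

/* Going from Oberon to C is easy: hand over the data part. */
const char *as_cstring(const OpenCharArray *a)
{
    return a->data;
}

/* Distance between the pointer Oberon holds and the first character.
   A plain char* coming from C has no such header, so casting it to the
   open array type would read the first bytes of the text as a length. */
size_t header_size(void)
{
    return offsetof(OpenCharArray, data);
}
```

Under this assumption a C string has to be copied into an open array rather than cast, unless the array type is declared without length information.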
I have mixed thoughts. One of those is, C interface is by definition unsafe. We may represent a mapping to a C struct, but that C struct would be different on other platform, or on the same platform after the upgrade. However hard we try, the C binding will be unreliable. Well, that's may be okay for some desktop platform we tested code on, and tell users to run it on, but I would not put that kind of code in to something serious. My understanding is that Ulm Oberon system did not have C interface on purpose, to not "contaminate" the programs with the C code. (:
But we do wrappers to external libraries, so what we can do is we can try to make those wrappers safe.
Because mappings to struct types were mentioned, one idea I would like to share with you was invented (invented, is that the right word?) by @dcwbrown when he was making his improvements. So Ofront has a "struct stat" mapping in Unix.Mod
Status* = RECORD (* struct stat *)
dev*, devX*: LONGINT; (* 64 bit in Linux 2.2 *)
pad1: INTEGER;
ino*, mode*, nlink*, uid*, gid*: LONGINT;
rdev*, rdevX*: LONGINT; (* 64 bit in Linux 2.2 *)
pad2: INTEGER;
size*, blksize*, blocks*, atime*, unused1*, mtime*, unused2*, ctime*,
unused3*, unused4*, unused5*: LONGINT;
END ;
Instead of mapping each field to get an equivalent RECORD, @dcwbrown encapsulated the data like this
PROCEDURE -fstat(fd: LONGINT): INTEGER "fstat(fd, &s)";
PROCEDURE -stat(n: ARRAY OF CHAR): INTEGER "stat((char*)n, &s)";
PROCEDURE -structstats "struct stat s";
PROCEDURE -statdev(): LONGINT "(LONGINT)s.st_dev";
PROCEDURE -statino(): LONGINT "(LONGINT)s.st_ino";
PROCEDURE -statmtime(): LONGINT "(LONGINT)s.st_mtime";
PROCEDURE -statsize(): LONGINT "(ADDRESS)s.st_size";
this way we don't have to worry about paddings, alignment, and different order of fields on different platforms. As long as the struct has the field, our procedure will return its value.
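Seen from the C side, the code procedures above amount to something like the following sketch. It targets a POSIX system, and the function names (`file_stat`, `stat_size` and so on) are illustrative assumptions rather than the names voc generates. The point is that the C compiler resolves `st_size` and `st_mtime` against the platform's own `struct stat`, so no offsets are hard-coded on the Oberon side.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* One hidden struct stat, like the "struct stat s" code procedure. */
static struct stat s;

/* Wrappers that fill the hidden struct; 0 on success, -1 on error. */
int file_stat(const char *name) { return stat(name, &s); }
int file_fstat(int fd)          { return fstat(fd, &s); }

/* Field accessors: layout, padding and field order are taken from the
   platform header at compile time. */
long long stat_size(void)  { return (long long)s.st_size; }
long long stat_mtime(void) { return (long long)s.st_mtime; }
long long stat_ino(void)   { return (long long)s.st_ino; }
long long stat_dev(void)   { return (long long)s.st_dev; }
```

One trade-off of this design is that the hidden struct is shared state: the accessors describe the most recent successful call, so the pattern is not safe for concurrent use without extra care.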
and now I will wish you all the best, and come back later.
The choice to include only one module or several modules for creating a dynamic link library on UNIX is an example of the mismatch between Oberon and UNIX. A Dynamic link library is a mere collection of routines which can be loaded together. Templ Josef has implemented dynamic loading of modules in his latest version of ofront by generating a dynamic link library for each module. That way he can emulate the behaviour of the Oberon System and in fact he produced an Oberon System which behaves like the original version with respect to module loading and unloading.
Probably, the type LIBRARY is helpful on Windows.
Basically, the compiler should only do the minimum.
A note on complexity: Complexity is not size. Complexity is a function of interdependency of parts. To keep the complexity low it is a good strategy to have the parts to interact only by small bandwidth. Therefore it should not increase the complexity of the compiler if it generates a list of modules that have to be linked in order to resolve all symbolic references because that task gets only a simple input and produces only a simple output. Likewise the interpretation of flags within comments. The interaction is restricted only to parsing one comment and delivering a few boolean values.
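As an illustration of how narrow that interface can be, here is a C sketch of a helper that extracts a library name from a comment. The directive form (* LINK LIB "name" *) is the one floated in this thread, and `parse_link_lib` is a hypothetical helper, not part of any existing compiler. Only the first comment in the text is examined.

```c
#include <stddef.h>
#include <string.h>

/* Looks for a directive of the assumed form (* LINK LIB "name" *) in the
   first comment of text and copies name into out. Returns 1 if a name was
   found and fits into out, otherwise 0. */
int parse_link_lib(const char *text, char *out, size_t out_size)
{
    static const char key[] = "LINK LIB \"";
    const char *open = strstr(text, "(*");
    if (open == NULL)
        return 0;
    const char *close = strstr(open, "*)");
    if (close == NULL)
        return 0;
    const char *start = strstr(open, key);
    if (start == NULL || start > close)
        return 0;               /* directive missing or outside the comment */
    start += sizeof key - 1;    /* skip past the opening quote */
    const char *end = strchr(start, '"');
    if (end == NULL || end > close)
        return 0;               /* no closing quote inside the comment */
    size_t len = (size_t)(end - start);
    if (len == 0 || len + 1 > out_size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

The input is one string and the output is one name, so a separate tool could use the same helper without touching the compiler's parser.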
Having cleanly defined output, i.e. syntactically and contentswise, makes it easy to combine tools. Since UNIX needs a linked program, or references to the dynamic link libraries that have to be included, having a separate tool which generates the linking command is fine.
And there is nothing wrong to glue all that together with a script which executes an Oberon make to detect which modules have to be compiled, calls the compiler for each of these modules and finally calls the link step generator and links the program.
To make the user experience as good as possible the script and the tools should be provided.
Hi,
This is my first post here, so hello and thanks to all the developers of VOC. I have been trying VOC with a few C libraries and have a few suggestions for improvements. I may attempt to implement some of these, so any comments or technical suggestions would be welcome.
A few system flags are already implemented in VOC.
Assigning a system flag [1] to an ARRAY causes it to be not copied when passed as a value parameter. This avoids making useless copies of strings which improves the performance of text I/O. This has the same effect as the NO_COPY flag in OOC. As far as I can see this is only used in Files.WriteString, but there are probably many other places that could benefit.
Assigning a system flag [1] to a RECORD or POINTER causes the object to be untraced by the GC. That is, the pointer (or any pointers in the RECORD) are not recorded as heap pointers in the generated code and type descriptors. Clearly this is required for pointers to any objects allocated outside the Oberon heap. The GC assumes that it can use a record's type descriptor for marking the heap object, and for enumerating embedded pointers, so omitting this flag on a C-allocated object could corrupt the C heap and/or crash the garbage collector. This flag is I think equivalent to the UNTRACED flag in Component Pascal.
OOC defines some useful flags for its C interface:
http://ooc.sourceforge.net/OOCref/OOCref_16.html#SEC150
The important flags are:
1) NO_DESCRIPTOR declares that a record type has no type descriptor. This means it cannot be used in type tests and type guards, or the NEW procedure, and cannot be passed as a formal parameter that requires a type tag.
2) NO_LENGTH_INFO declares that an open array type has no length information. This means that LEN cannot be used, and it cannot be passed as a formal parameter that requires a length.
3) UNION declares that a RECORD is to be translated to a C "union" instead of "struct". These don't occur very often but need to be implemented properly, especially if the variable is allocated by the (Oberon) client code.
The current approach to calling external functions is a bit cumbersome, and requires one to hand-code a "macro" that will translate the Oberon call into a C call. For example:
Since this is a macro (or "code procedure"), it is necessary to explicitly cast between Oberon and C types in order to keep the compiler happy. Also, since the Oberon parameter list contains extra information (array length, type descriptors) the C and Oberon declarations don't always match (eg. the hidden title__len parameter in the above). If the types were properly declared (eg. with NO_DESCRIPTOR and NO_LENGTH_INFO) then it should be possible to declare matching parameter lists, and simply call the functions via C externs. The need to explicitly code the type conversions makes it more difficult to automatically generate such interface declarations (eg. using a tool like H2O).
One existing problem is the representation of open arrays. If you declare the following, as in oocC.string:
string = POINTER TO ARRAY OF CHAR;
The corresponding C code is:
With the current representation of the open array, POINTER TO ARRAY points to the length field rather than the data, so you can't simply cast (char *) as (oocC.string) without losing a few bytes of the array. At the moment, the only way to properly access a C array is to declare an array with static length (eg. POINTER TO ARRAY 1 OF CHAR), and then disable run-time bounds checking. The NO_LENGTH_INFO flag would allow this to be done properly. With RECORDs this is less of a problem, as the type information is stored at a negative offset relative to the data.
One last addition that would really help is to declare link libraries for C interface modules. For example, something like this, as can be done in OOC:
MODULE X11 [ LINK LIB "X11" ];
This means that whenever the X11 module is included the correct library dependency (-lX11) will be added to the link command. Currently it looks like the best option for specifying link libraries is via the CFLAGS, but this adds messy dependencies into the makefiles. In some cases one must write a separate link step because gcc does not always allow link libraries to be declared before the object files that depend on them.
I think this should be fairly easy to implement. Currently, each module contains a complete list of all modules that are directly or indirectly imported. The linker uses OPT.Links to enumerate the required object files, and this list is extended via OPT.InLinks for every IMPORT statement, and is saved via OPT.OutLinks when the symbol file is created. Link libraries could be added to this list (maybe with a special flag for the linker), or a separate parallel list could be maintained.
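The ordering constraint can be sketched as follows. This is a hypothetical helper in C, not voc's actual link step: `build_link_command` and the fixed "cc" driver name are assumptions. It assembles the command with all object files first and the collected -l options last, which is the order that works when the linker resolves symbols from left to right.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes "cc -o <program> <objects...> -l<libs...>" into out, placing the
   libraries after the object files that depend on them.
   Returns 1 on success, 0 if out is too small. */
int build_link_command(char *out, size_t out_size, const char *program,
                       const char *const objects[], size_t n_objects,
                       const char *const libs[], size_t n_libs)
{
    int n = snprintf(out, out_size, "cc -o %s", program);
    if (n < 0 || (size_t)n >= out_size)
        return 0;
    size_t used = (size_t)n;

    for (size_t i = 0; i < n_objects; i++) {
        n = snprintf(out + used, out_size - used, " %s", objects[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return 0;
        used += (size_t)n;
    }
    for (size_t i = 0; i < n_libs; i++) {
        n = snprintf(out + used, out_size - used, " -l%s", libs[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return 0;
        used += (size_t)n;
    }
    return 1;
}
```

If the library names travelled with the module list in the symbol file, the two arrays passed here could both be produced by the same enumeration of imported modules.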
Any comments or suggestions?