Closed KTSnowy closed 1 year ago
An how should this work if you have three programs which you compile independently and the first two uses a PIC X(10) and the last a PIC X(20) definition?
Expectation: the first program reaches the resolver which will then allocate 10 bytes, the second one gets a "local" field (or just its "storage") to the same place; and the last program gets an exception during CALL by the resolver.
Yeah, you're right, that won't work with only a reference. We need to check the data item's Length property so that the ExternalResolver would only return a reference to the storage if the Length of _EXT_TEST
is equal to the Length of External._EXT_TEST
, and throws an exception if it's not the same length. This length could be passed as an argument to the ExternalResolver.
ExternalResolver(External._EXT_TEST, 10):
So if the external EXT-TEST's memory has a length of 10 then it returns a reference to External._EXT_TEST
. Otherwise it's an exception.
Note: feel free to create issues in this repo for discussions and ping me there, seems more useful than discussing in a commit ;-)
@GitMensch I opened a new issue to discuss this.
Correct approach "in general"; but something is still looking wrong:
ExternalResolver(External._EXT_TEST, "EXT-TEST"):
That would work if External._EXT_TEST
is a field defined similar to working storage entries (just not with data allocated), so it also contains a size attribute the resolver can use.
The resolver would then check first in a resolver-local list "do I have storage attached to this name" and "if yes, is this the size of the passed field. If it is then it sets the field's storage directly to that; if there's no storage attached to this name it would create a new "external" entry with the name, size and data, then again sets the field's storage.
This would have to go into the start-up code, resolving names and conflicts at runtime could affect performance, and boxing the data items into a List or array would also make lookup slower. I might try to figure out a more "static" approach that can be mostly set up at compile time, and only requiring a conflict check between runtime modules.
An independently compiled COBOL module can't run by itself, it needs a run unit executable. This executable could resolve external names, check for conflicts and set up the necessary namespaces at compile time, since it needs to be compiled anyways to be able to run it should not be possible to dynamically import new modules while it's running.
As far as I understand from the EXTERNAL clause definition in the standard, it doesn't go between run units, only between runtime modules inside of an independent run unit, and it needs to have the exact same definition and fields between the executable and all modules. Those modules would have to be linked to an executable at compile time. So, if I'm understanding how this works correctly, it should be possible to go through the executable and every module's External namespace at compile time, check if they all match both in size and type, and set up a common memory space for all of them through a generated constructor.
In pseudo-code this would look like:
using Module1;
using Module2;
Memory<byte> CommonSpace1 = new(20);
Memory<byte> CommonSpace2 = new(40);
Module1.External = new(CommonSpace1, CommonSpace2);
Module2.External = new(CommonSpace1, CommonSpace2);
this.External = new(CommonSpace1, CommonSpace2);
The External constructor would have to be generated at compile time, and it would take Memory<byte>
spaces as arguments to set up the same space for all external items. The pseudo code example above would set up two common memory spaces for two external data items that are shared between the executable and two modules.
The generated constructor idea has a bunch of issues, but I might be able to figure something out that would still allow to determine all of this at compile time. Maybe each memory space in the constructor could already have the expected names associated to them, so each variable only needs to get their External._EXT_TEST
to "connect" into and use the same memory area.
Maybe the External._EXT_TEST
field would also have to be generated at compile time.
This would have to go into the start-up code, resolving names and conflicts at runtime could affect performance, and boxing the data items into a List or array would also make lookup slower.
This would only happen once per called module: the field is static and only the constructor of the module would execute the "resolve this external field for me, giving me storage" code. As only very few external items are defined the performance should be no issue at all (you may want to use use a hash storage instead of a linked list to resolve the name).
I might try to figure out a more "static" approach that can be mostly set up at compile time, and only requiring a conflict check between runtime modules.
A "common" COBOL application can easily consist of some thousand modules - you normally don't want to combine those as one fat binary, but use a library approach (possibly one library for each COBOL module, which is what cobc defaults to). It is also no problem if there are any conflicting sizes - as long as no module with a separate definition is actually activated in the same run unit.
A "common" COBOL application can easily consist of some thousand modules
Yeah trying to check and initialize thousands of modules would be a pretty bad idea.
ExternalResolver(External._EXT_TEST, "EXT-TEST"):
So this format that you mentioned yesterday should work, right?
If the "EXT-TEST" external is not yet allocated, it allocates and returns a reference to it. If it's already allocated it checks the sizes of the definition and the allocated external storage. If it's the same it returns a reference to it, if it's not it throws an exception.
yes
I'll try implementing this today and see how it goes. If I manage to make it work I'll ping you here again so that you can check the code for it.
@GitMensch Alright, I managed to implement the external resolver, but I'm still having trouble understanding what is allowed with an external clause. For example, if it's just elementary items or whole group items as well.
I figured it out, also the parser now supports it as well. Just have to write the codegen for it and it should work.
As a test you may define a group with only USAGE DISPLAY
numeric items below and in another program using an elementary with plain PIC X
of matching size and one last program with not matching size.
Then call all of these three from one main program, the first would explicit code an INITIALIZE num-group
, the other two would just DISPLAY
it.
@GitMensch I ended up trying to extend the parser's error messages to allow for more custom messages, so that we can give a more detailed explanation of the issue, instead of the usual "Unexpected Token: Expected this, instead of that"
This allows us to guide the developer in the right direction and to understand what happened a little better. Might also work as a learning thing for new COBOL developers, they'll get detailed snippets on how to fix and how it should work in that particular place on the code.
Do you want to share the test code used and the sample code generation and execution for the suggested example?
@GitMensch by test code do you mean the parser code that checks for these issues?
Here, the level-77 data item parsing starts here: https://github.com/otterkit/otterkit/blob/main/src/OtterkitAnalyzer.cs#L145
I don't have the codegen working yet for the EXTERNAL items, I'll get it working later today.
Also, here is the ErrorHandler that displays these errors: https://github.com/otterkit/otterkit/blob/main/src/OtterkitErrorHandling.cs
by test code do you mean the parser code that checks for these issues?
No, I've meant the codegen and execution results with those 4 programs as mentioned above. But "that needs to wait" is fine with me.
I'll post those here when I get it working. Should be able to get it working later today or during the afternoon tomorrow.
The EXTERNAL clause should only work with 01-level data items, right?
The EXTERNAL clause should only work with 01-level data items, right?
Yes. And with files, but that can come later after file handling works well enough.
IDENTIFICATION DIVISION.
PROGRAM-ID. HELLO-WORLD.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-EXT-TEST PIC X(10) EXTERNAL.
PROCEDURE DIVISION.
ACCEPT WS-EXT-TEST FROM DATE YYYYMMDD.
DISPLAY WS-EXT-TEST UPON STANDARD-OUTPUT.
CALL "EXT-TEST-PROGRAM".
CALL "EXT-TEST-PROGRAM-2".
STOP RUN.
END PROGRAM HELLO-WORLD.
IDENTIFICATION DIVISION.
PROGRAM-ID. EXT-TEST-PROGRAM.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-EXT-TEST PIC 9(10) EXTERNAL.
PROCEDURE DIVISION.
DISPLAY WS-EXT-TEST UPON STANDARD-OUTPUT.
END PROGRAM EXT-TEST-PROGRAM.
IDENTIFICATION DIVISION.
PROGRAM-ID. EXT-TEST-PROGRAM-2.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-EXT-TEST PIC X(5) EXTERNAL.
PROCEDURE DIVISION.
DISPLAY WS-EXT-TEST UPON STANDARD-OUTPUT.
The code above is the test code, not sure if it's exactly like you asked, please let me know.
This is the generated C# code: https://github.com/otterkit/otterkit/tree/main/src/.otterkit
And this is the output:
Not sure if EC-EXTERNAL-FORMAT-CONFLICT is the correct exception to throw, but it seemed to be the closest option from the standard.
@GitMensch let me know if something is not right or if it needs changes
The CALL statement implementation right now is extremely basic, needs more work later to implement the rest of the features
I think the default value does not matter at all. My sample suggestion would be for the first:
01 WS-EXT-TEST EXTERNAL.
03 WS-EXT-YYYY PIC 9999.
03 WS-EXT-MM PIC 99.
03 WS-EXT-DD PIC 99.
and for the last use PIC X(15)
[because NOT according to the standard but to some dialects a follow-up use may be smaller, but never bigger - the standard says "same size");
but otherwise - yes, that was what I've suggested. I think this issue can be closed.
The codegen looks at least reasonable.
I think the default value does not matter at all.
The standard mentions that it can have a VALUE
clause:
Within a run unit, if two or more source elements describe the same external data record, each name that is externalized to the operating environment for the record description entries shall be the same; the VALUE clause specification, if any, for each record name of the associated record description entries shall be identical;
So I was planning to use the default value field to check for the value in the VALUE
clauses before returning the external memory space. This is in the 2022 standard, not sure if it was like that in the previous ones.
NOT according to the standard but to some dialects a follow-up use may be smaller, but never bigger - the standard says "same size"
I wanted to strictly follow the 2022 standard only for now, mostly because we don't have enough resources to support multiple dialects, and I didn't want to create another dialect with Otterkit. So it would be a Standard COBOL compiler, rather than it's own dialect. Not sure if it would be a good idea to deviate from the standard right now.
So I was planning to use the default value field to check for the value in the
VALUE
clauses before returning the external memory space. This is in the 2022 standard, not sure if it was like that in the previous ones.
I've missed that before, that's interesting and is also in the 2014 standard; it seems it isn't in COBOL85, but I only have "vendor" docs for that, the 85 standard and 2002 one (if you have a link for a draft of that, please share) is at my old employer.
Sorry, unfortunately I don't have a draft for the 2002 one.
As I was in need to search for it because of program-prototypes:
2014 final committee draft: https://www.open-std.org/jtc1/sc22/open/ISO-IECJTC1-SC22_N4561_ISO_IEC_FCD_1989__Information_technol.pdf early working draft of that: https://www.open-std.org/JTC1/SC22/open/n4315.pdf
... still did not found the 2002 one but as the 2008 working draft does not have any notes on removed ENTRY
statement and no change-bars on the prototypes I've concluded that the 2002 one must have both dropped the first and added the second.
Hey @GitMensch the open-std links gave me an idea, so I opened the https://www.open-std.org/jtc1/sc22/open/ directory to look at all the files they had there.
I really wish they named those in a way that is easier to figure out what they are, but I ended up finding the 2002 standard there: https://www.open-std.org/jtc1/sc22/open/n3390.zip
For some reason they compressed it in that zip file.
Is that the one you were looking for?
Is that the one you were looking for?
Jackpot, this is even the DIS, you can't get anything better without paying for it!
Note: In case you did not know, the DIS 2022 can be downloaded from https://isotc.iso.org/livelink/livelink?func=ll&objId=21916208
... and it shows that 2002 added prototypes, but has not removed the ENTRY statement; so it seems that this was even removed in COBOL85!
Thanks again!
BTW: you don't happen to have a public source for COBOL85, do you?
Note: In case you did not know, the DIS 2022 can be downloaded from
Yep, I'm using the DIS 2022 from that link to work on Otterkit.
BTW: you don't happen to have a public source for COBOL85, do you?
I tried looking in the open-std files for it, but they only go back to 1997. So I don't think it's there. Was the COBOL standard already part of the ISO with COBOL85?
Was the COBOL standard already part of the ISO with COBOL85?
No, that was ANSI (I think to remember ISO took that over unchanged shortly after ANSI publishing it).
Opening a new issue to discuss the implementation of external data items a little better.
Need to figure out how to resolve an external data item's storage.