otterkit / otterkit-cobol

A free and open source Standard COBOL compiler for 64-bit environments
https://otterkit.com
Apache License 2.0
249 stars 15 forks source link

[Core Dev] External data items implementation #3

Closed KTSnowy closed 1 year ago

KTSnowy commented 2 years ago

Opening a new issue to discuss the implementation of external data items a little better.

Need to figure out how to resolve an external data item's storage.

KTSnowy commented 2 years ago

An how should this work if you have three programs which you compile independently and the first two uses a PIC X(10) and the last a PIC X(20) definition?

Expectation: the first program reaches the resolver which will then allocate 10 bytes, the second one gets a "local" field (or just its "storage") to the same place; and the last program gets an exception during CALL by the resolver.

Yeah, you're right, that won't work with only a reference. We need to check the data item's Length property so that the ExternalResolver would only return a reference to the storage if the Length of _EXT_TEST is equal to the Length of External._EXT_TEST, and throws an exception if it's not the same length. This length could be passed as an argument to the ExternalResolver.

ExternalResolver(External._EXT_TEST, 10):

So if the external EXT-TEST's memory has a length of 10 then it returns a reference to External._EXT_TEST. Otherwise it's an exception.

Note: feel free to create issues in this repo for discussions and ping me there, seems more useful than discussing in a commit ;-)

@GitMensch I opened a new issue to discuss this.

GitMensch commented 2 years ago

Correct approach "in general"; but something is still looking wrong:

ExternalResolver(External._EXT_TEST, "EXT-TEST"):

That would work if External._EXT_TEST is a field defined similar to working storage entries (just not with data allocated), so it also contains a size attribute the resolver can use.

The resolver would then check first in a resolver-local list "do I have storage attached to this name" and "if yes, is this the size of the passed field. If it is then it sets the field's storage directly to that; if there's no storage attached to this name it would create a new "external" entry with the name, size and data, then again sets the field's storage.

KTSnowy commented 2 years ago

This would have to go into the start-up code, resolving names and conflicts at runtime could affect performance, and boxing the data items into a List or array would also make lookup slower. I might try to figure out a more "static" approach that can be mostly set up at compile time, and only requiring a conflict check between runtime modules.

An independently compiled COBOL module can't run by itself, it needs a run unit executable. This executable could resolve external names, check for conflicts and set up the necessary namespaces at compile time, since it needs to be compiled anyways to be able to run it should not be possible to dynamically import new modules while it's running.

As far as I understand from the EXTERNAL clause definition in the standard, it doesn't go between run units, only between runtime modules inside of an independent run unit, and it needs to have the exact same definition and fields between the executable and all modules. Those modules would have to be linked to an executable at compile time. So, if I'm understanding how this works correctly, it should be possible to go through the executable and every module's External namespace at compile time, check if they all match both in size and type, and set up a common memory space for all of them through a generated constructor.

In pseudo-code this would look like:

using Module1;
using Module2;

Memory<byte> CommonSpace1 = new(20);
Memory<byte> CommonSpace2 = new(40);

Module1.External = new(CommonSpace1, CommonSpace2);
Module2.External = new(CommonSpace1, CommonSpace2);
this.External = new(CommonSpace1, CommonSpace2);
KTSnowy commented 2 years ago

The External constructor would have to be generated at compile time, and it would take Memory<byte> spaces as arguments to set up the same space for all external items. The pseudo code example above would set up two common memory spaces for two external data items that are shared between the executable and two modules.

KTSnowy commented 2 years ago

The generated constructor idea has a bunch of issues, but I might be able to figure something out that would still allow to determine all of this at compile time. Maybe each memory space in the constructor could already have the expected names associated to them, so each variable only needs to get their External._EXT_TEST to "connect" into and use the same memory area.

Maybe the External._EXT_TEST field would also have to be generated at compile time.

GitMensch commented 2 years ago

This would have to go into the start-up code, resolving names and conflicts at runtime could affect performance, and boxing the data items into a List or array would also make lookup slower.

This would only happen once per called module: the field is static and only the constructor of the module would execute the "resolve this external field for me, giving me storage" code. As only very few external items are defined the performance should be no issue at all (you may want to use use a hash storage instead of a linked list to resolve the name).

I might try to figure out a more "static" approach that can be mostly set up at compile time, and only requiring a conflict check between runtime modules.

A "common" COBOL application can easily consist of some thousand modules - you normally don't want to combine those as one fat binary, but use a library approach (possibly one library for each COBOL module, which is what cobc defaults to). It is also no problem if there are any conflicting sizes - as long as no module with a separate definition is actually activated in the same run unit.

KTSnowy commented 2 years ago

A "common" COBOL application can easily consist of some thousand modules

Yeah trying to check and initialize thousands of modules would be a pretty bad idea.

ExternalResolver(External._EXT_TEST, "EXT-TEST"):

So this format that you mentioned yesterday should work, right?

If the "EXT-TEST" external is not yet allocated, it allocates and returns a reference to it. If it's already allocated it checks the sizes of the definition and the allocated external storage. If it's the same it returns a reference to it, if it's not it throws an exception.

GitMensch commented 2 years ago

yes

KTSnowy commented 2 years ago

I'll try implementing this today and see how it goes. If I manage to make it work I'll ping you here again so that you can check the code for it.

KTSnowy commented 2 years ago

@GitMensch Alright, I managed to implement the external resolver, but I'm still having trouble understanding what is allowed with an external clause. For example, if it's just elementary items or whole group items as well.

KTSnowy commented 2 years ago

I figured it out, also the parser now supports it as well. Just have to write the codegen for it and it should work.

GitMensch commented 2 years ago

As a test you may define a group with only USAGE DISPLAY numeric items below and in another program using an elementary with plain PIC X of matching size and one last program with not matching size.

Then call all of these three from one main program, the first would explicit code an INITIALIZE num-group, the other two would just DISPLAY it.

KTSnowy commented 1 year ago

Screen Shot 2022-12-03 at 16 46 40

@GitMensch I ended up trying to extend the parser's error messages to allow for more custom messages, so that we can give a more detailed explanation of the issue, instead of the usual "Unexpected Token: Expected this, instead of that"

This allows us to guide the developer in the right direction and to understand what happened a little better. Might also work as a learning thing for new COBOL developers, they'll get detailed snippets on how to fix and how it should work in that particular place on the code.

Screen Shot 2022-12-03 at 17 40 25

GitMensch commented 1 year ago

Do you want to share the test code used and the sample code generation and execution for the suggested example?

KTSnowy commented 1 year ago

@GitMensch by test code do you mean the parser code that checks for these issues?

Here, the level-77 data item parsing starts here: https://github.com/otterkit/otterkit/blob/main/src/OtterkitAnalyzer.cs#L145

I don't have the codegen working yet for the EXTERNAL items, I'll get it working later today.

Also, here is the ErrorHandler that displays these errors: https://github.com/otterkit/otterkit/blob/main/src/OtterkitErrorHandling.cs

GitMensch commented 1 year ago

by test code do you mean the parser code that checks for these issues?

No, I've meant the codegen and execution results with those 4 programs as mentioned above. But "that needs to wait" is fine with me.

KTSnowy commented 1 year ago

I'll post those here when I get it working. Should be able to get it working later today or during the afternoon tomorrow.

The EXTERNAL clause should only work with 01-level data items, right?

GitMensch commented 1 year ago

The EXTERNAL clause should only work with 01-level data items, right?

Yes. And with files, but that can come later after file handling works well enough.

KTSnowy commented 1 year ago
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HELLO-WORLD.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-EXT-TEST PIC X(10) EXTERNAL.

       PROCEDURE DIVISION.
           ACCEPT WS-EXT-TEST FROM DATE YYYYMMDD.
           DISPLAY WS-EXT-TEST UPON STANDARD-OUTPUT.
           CALL "EXT-TEST-PROGRAM".
           CALL "EXT-TEST-PROGRAM-2".
           STOP RUN.
       END PROGRAM HELLO-WORLD.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. EXT-TEST-PROGRAM.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-EXT-TEST PIC 9(10) EXTERNAL.

       PROCEDURE DIVISION.
           DISPLAY WS-EXT-TEST UPON STANDARD-OUTPUT.
       END PROGRAM EXT-TEST-PROGRAM.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. EXT-TEST-PROGRAM-2.

       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 WS-EXT-TEST PIC X(5) EXTERNAL.

       PROCEDURE DIVISION.
           DISPLAY WS-EXT-TEST UPON STANDARD-OUTPUT.
KTSnowy commented 1 year ago

The code above is the test code, not sure if it's exactly like you asked, please let me know.

This is the generated C# code: https://github.com/otterkit/otterkit/tree/main/src/.otterkit

And this is the output: Screen Shot 2022-12-04 at 12 07 28

Not sure if EC-EXTERNAL-FORMAT-CONFLICT is the correct exception to throw, but it seemed to be the closest option from the standard.

KTSnowy commented 1 year ago

@GitMensch let me know if something is not right or if it needs changes

KTSnowy commented 1 year ago

The CALL statement implementation right now is extremely basic, needs more work later to implement the rest of the features

GitMensch commented 1 year ago

I think the default value does not matter at all. My sample suggestion would be for the first:

       01 WS-EXT-TEST EXTERNAL.
          03 WS-EXT-YYYY PIC 9999.
          03 WS-EXT-MM  PIC 99.
          03 WS-EXT-DD  PIC 99.

and for the last use PIC X(15) [because NOT according to the standard but to some dialects a follow-up use may be smaller, but never bigger - the standard says "same size");

but otherwise - yes, that was what I've suggested. I think this issue can be closed.

The codegen looks at least reasonable.

KTSnowy commented 1 year ago

I think the default value does not matter at all.

The standard mentions that it can have a VALUE clause:

Within a run unit, if two or more source elements describe the same external data record, each name that is externalized to the operating environment for the record description entries shall be the same; the VALUE clause specification, if any, for each record name of the associated record description entries shall be identical;

So I was planning to use the default value field to check for the value in the VALUE clauses before returning the external memory space. This is in the 2022 standard, not sure if it was like that in the previous ones.

NOT according to the standard but to some dialects a follow-up use may be smaller, but never bigger - the standard says "same size"

I wanted to strictly follow the 2022 standard only for now, mostly because we don't have enough resources to support multiple dialects, and I didn't want to create another dialect with Otterkit. So it would be a Standard COBOL compiler, rather than it's own dialect. Not sure if it would be a good idea to deviate from the standard right now.

GitMensch commented 1 year ago

So I was planning to use the default value field to check for the value in the VALUE clauses before returning the external memory space. This is in the 2022 standard, not sure if it was like that in the previous ones.

I've missed that before, that's interesting and is also in the 2014 standard; it seems it isn't in COBOL85, but I only have "vendor" docs for that, the 85 standard and 2002 one (if you have a link for a draft of that, please share) is at my old employer.

KTSnowy commented 1 year ago

Sorry, unfortunately I don't have a draft for the 2002 one.

GitMensch commented 1 year ago

As I was in need to search for it because of program-prototypes:

2014 final committee draft: https://www.open-std.org/jtc1/sc22/open/ISO-IECJTC1-SC22_N4561_ISO_IEC_FCD_1989__Information_technol.pdf early working draft of that: https://www.open-std.org/JTC1/SC22/open/n4315.pdf

... still did not found the 2002 one but as the 2008 working draft does not have any notes on removed ENTRY statement and no change-bars on the prototypes I've concluded that the 2002 one must have both dropped the first and added the second.

KTSnowy commented 1 year ago

Hey @GitMensch the open-std links gave me an idea, so I opened the https://www.open-std.org/jtc1/sc22/open/ directory to look at all the files they had there.

I really wish they named those in a way that is easier to figure out what they are, but I ended up finding the 2002 standard there: https://www.open-std.org/jtc1/sc22/open/n3390.zip

For some reason they compressed it in that zip file.

KTSnowy commented 1 year ago

Is that the one you were looking for?

GitMensch commented 1 year ago

Is that the one you were looking for?

Jackpot, this is even the DIS, you can't get anything better without paying for it!

Note: In case you did not know, the DIS 2022 can be downloaded from https://isotc.iso.org/livelink/livelink?func=ll&objId=21916208

... and it shows that 2002 added prototypes, but has not removed the ENTRY statement; so it seems that this was even removed in COBOL85!

Thanks again!

BTW: you don't happen to have a public source for COBOL85, do you?

KTSnowy commented 1 year ago

Note: In case you did not know, the DIS 2022 can be downloaded from

Yep, I'm using the DIS 2022 from that link to work on Otterkit.

BTW: you don't happen to have a public source for COBOL85, do you?

I tried looking in the open-std files for it, but they only go back to 1997. So I don't think it's there. Was the COBOL standard already part of the ISO with COBOL85?

GitMensch commented 1 year ago

Was the COBOL standard already part of the ISO with COBOL85?

No, that was ANSI (I think to remember ISO took that over unchanged shortly after ANSI publishing it).