otterkit / otterkit-cobol

A free and open source Standard COBOL compiler for 64-bit environments
https://otterkit.com
Apache License 2.0

[✨]: Include NIST85 tests - adjusted #6

Open GitMensch opened 1 year ago

GitMensch commented 1 year ago

I see that this compiler is mostly targeted at "modern standard COBOL", which I guess would be the 2022 standard. In general, this means that much COBOL85 code would not be accepted, but it is likely reasonable to either "support enough" to compile and test the NIST85 code, or at least do the following:

Either add "enough" COBOL85 support, such as the comment paragraphs (AUTHOR. and friends), or remove them with sed or by hand. Skip other tests, such as the ones using ALTER.
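
Instead of sed, the stripping could also be scripted. Below is a minimal Python sketch, assuming fixed reference format (sequence area in columns 1-6, indicator in column 7, Area A in columns 8-11) and assuming that any continuation of a comment-entry leaves Area A blank. The function name strip_comment_paragraphs is hypothetical, not part of any existing tool:

```python
import re

# COBOL85 comment paragraphs in the IDENTIFICATION DIVISION (obsolete elements).
OBSOLETE_PARAGRAPHS = ("AUTHOR", "INSTALLATION", "DATE-WRITTEN",
                       "DATE-COMPILED", "SECURITY")
PARAGRAPH_RE = re.compile(r"(?:%s)\s*\." % "|".join(OBSOLETE_PARAGRAPHS),
                          re.IGNORECASE)


def strip_comment_paragraphs(lines):
    """Drop obsolete comment paragraphs, including their continuation lines."""
    result = []
    skipping = False
    for line in lines:
        indicator = line[6:7]   # column 7
        area_a = line[7:11]     # columns 8-11
        if indicator in ("*", "/"):
            result.append(line)  # comment lines pass through untouched
            continue
        starts_in_area_a = bool(area_a.strip())
        if starts_in_area_a and PARAGRAPH_RE.match(line[7:72].lstrip()):
            skipping = True      # start of an obsolete paragraph
            continue
        if skipping and not starts_in_area_a:
            continue             # comment-entry continued in Area B
        skipping = False         # any other Area A entry ends the paragraph
        result.append(line)
    return result
```

Anything more unusual in the IDENTIFICATION DIVISION would still need a manual look.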

Aim for the NC (Nucleus) module first: extend your own testsuite with everything that does not compile, and also check the execution results in your testsuite. Compile the programs one by one and add the features each one needs. This will give you many failures on the first modules, but very reasonable ones, and far fewer with each follow-up module.

In the end, this will also cover many of the real-world usages of COBOL, plus some special cases.

KTSnowy commented 1 year ago

Hey Simon, do you know which license the NIST test suite uses? Depending on the license, maybe we could add those tests to the compiler.

GitMensch commented 1 year ago

I suggest checking out https://www.itl.nist.gov/div897/ctg/cobol_form.htm and possibly asking by email.

KTSnowy commented 1 year ago

I managed to open the COBOL source file from the NIST test suite, and it's 300k lines of COBOL code. Is it meant to be that big?

Also, there's an empty repo in this GitHub org called Otterly Testing; it's meant for a future COBOL unit test framework. Maybe we could start implementing that and add compiler tests for each implemented feature, including the tests from the NIST test suite?

Also, I wasn't able to find the license file for the NIST test suite. That makes me worried about copyright issues.

GitMensch commented 1 year ago

Is it meant to be that big?

Yes, but it contains multiple sources. The easiest way to unpack them is to let the GnuCOBOL build system do it once.

That makes me worried about copyright issues.

Asking the contact listed on that page seems most reasonable: explain what you want to do, ask under which license you may directly copy and redistribute it, and then go on from there in a follow-up email.

KTSnowy commented 1 year ago

(Screenshot from 2022-12-04.) Well, apparently I wasn't able to send the email. Does the address not exist anymore? That's weird.

KTSnowy commented 1 year ago

@GitMensch We might have to make a new one for COBOL 2022.

GitMensch commented 1 year ago

Well, apparently I wasn't able to send the email. Does the address not exist anymore? That's weird.

The wrong address is a webmaster issue, so send a note to do-webmaster@nist.gov pointing out the erroneous reference and asking them to replace it with the right contact person for the suite.

We might have to make a new one for COBOL 2022.

That would be really nice. Fair warning: the person that created the NIST85 was also working on all of the newer standards and said something like "that could be done, but would take several weeks of complete working days".

If you go for it, please ensure this suite can be run on a GNU system that runs GnuCOBOL, too. ;-)

You could, of course, take the GnuCOBOL testsuite and copy from it, as long as the license matches (the testsuite is GPL3+) and you leave the copyright notices in. Have fun. The "run" tests don't check for compiler messages (other than expecting none) and only in very rare cases have a GnuCOBOL-specific result. If you filter out those that don't match the COBOL standard, then you'd have a very good base.

KTSnowy commented 1 year ago

That would be really nice. Fair warning: the person that created the NIST85 was also working on all of the newer standards and said something like "that could be done, but would take several weeks of complete working days".

Well, Otterkit will be a long-term thing; I'm not planning to ever abandon it, so we'll have plenty of time to work on a new test suite.

We could integrate that into the Otterly Testing unit test framework. But instead of calling only the Otterkit compiler, it could ask for any compiler and its command-line "build and run" options, and then test the standard output against a set of pre-written COBOL tests.

C# has an API for that: it can start a program and redirect its standard output into a variable (System.Diagnostics.Process with RedirectStandardOutput). That means the Otterly Testing framework could also work with GnuCOBOL, and probably with any compiler that has a "build and run" command. Otterkit already uses this output redirection internally to call the dotnet compiler and display its output together with Otterkit's own output.
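
For illustration, the same "call a compiler and capture its standard output" idea can be sketched in Python with the standard subprocess module (standing in for the C# Process API). The command list is whatever "build and run" invocation a user configures for their compiler; run_test is a hypothetical name, and no real compiler flags are assumed:

```python
import subprocess
import sys  # used in the usage example below


def run_test(command, expected_stdout, timeout=60):
    """Run a configured "build and run" command and compare its standard output.

    `command` is an argument list whose exact form depends on the compiler
    under test (hypothetical here). Returns (passed, captured_stdout).
    """
    result = subprocess.run(command, capture_output=True, text=True,
                            timeout=timeout)
    passed = result.returncode == 0 and result.stdout == expected_stdout
    return passed, result.stdout


if __name__ == "__main__":
    # Stand-in "compiler": the current Python interpreter printing one line.
    print(run_test([sys.executable, "-c", "print('HELLO')"], "HELLO\n"))
```

Treating a non-zero exit code as a failure keeps compile errors from being mistaken for an empty but "correct" output.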

If you go for it, please ensure this suite can be run on a GNU system that runs GnuCOBOL, too. ;-)

We could work on it together if you'd like. Everyone would benefit from a general COBOL testing suite and unit test framework.

KTSnowy commented 1 year ago

@GitMensch let me know what you think, and whether you'd like to work on this as well.

GitMensch commented 1 year ago

I guess one can't run C# on some old Linux distros, which would make the test suite less useful. "autoconf-generated testsuites work everywhere a shell exists".

Going for a full COBOL 2022 testsuite would likely mean checking each rule (first the compilation, later the runtime result). To get an idea of what this means, have a look at https://sourceforge.net/p/gnucobol/code/HEAD/tree//branches/gnucobol-3.x/tests/testsuite.src/syn_redefines.at. To directly support multiple COBOL implementations, one would have to either drop the check of the message or make it a list of valid outputs.
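
One way to realize the "list of valid outputs" option is to check only the reported line number and accept any message from a list of patterns, where an empty list means "drop the message check". A rough Python sketch, assuming a hypothetical diagnostic shape of file:LINE: message (real compilers format diagnostics differently, so the parser would need per-compiler adjustment):

```python
import re

# Assumed diagnostic shape: "<file>:<line>: <message>"
DIAG_RE = re.compile(r"^[^:]+:(\d+):\s*(.*)$")


def diagnostic_matches(compiler_output, line_no, accepted_patterns):
    """True if a diagnostic for line_no matches any accepted pattern.

    An empty accepted_patterns list only checks that some diagnostic
    was reported for that line, ignoring its wording.
    """
    for out_line in compiler_output.splitlines():
        m = DIAG_RE.match(out_line)
        if not m or int(m.group(1)) != line_no:
            continue
        message = m.group(2)
        if not accepted_patterns:
            return True
        if any(re.search(p, message, re.IGNORECASE) for p in accepted_patterns):
            return True
    return False
```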

As noted, you are invited to use the GnuCOBOL testsuite as a basis, and of course also to send PRs to improve it. This would also be useful for later testing of otter ;-)

For now, I'd help on the GnuCOBOL side and also with input for a new suite where it's missing, but I won't have time to create a new testsuite for quite a while.

KTSnowy commented 1 year ago

I guess one can't run C# on some old Linux distros, which would make the test suite less useful.

We might be able to test that. C# also has a Native AOT compilation mode that compiles C# ahead of time to native machine code with LLVM. This means that technically it would be similar to compiling C to native code with Clang. The only way to find out whether it works with older distros would be to test the compiled binary.

As noted, you are invited to use the GnuCOBOL testsuite as a basis, and of course also to send PRs to improve it.

We can't use GnuCOBOL's test suite, though, due to license differences. Compatibility only goes one way: Otterkit uses the Apache 2.0 license, which as far as I know is compatible with GnuCOBOL's GPL if you ever decided to use our test suite, BUT we can't use any of GnuCOBOL's test suite code without also changing our license from Apache 2.0 to GPL, which is something I can't do.

So the only option would be to write a new Apache 2.0-licensed COBOL 2022 test suite. That way it will be compatible not only with Otterkit and GnuCOBOL, but also with any proprietary compiler that decides to use it.

KTSnowy commented 1 year ago

To directly support multiple COBOL implementations, one would have to either drop the check of the message or make it a list of valid outputs.

Adding some test-suite-specific output checks might work. For example, instead of outputting the result directly, the test suite would add an extra "test hash" to the output to make it easier to check.

Instead of directly displaying "+00234.23400" as part of some test code, a test would display: "TEST-NUM543 +00234.23400 END-TEST".

"TEST-NUM543" (meaning "Test Numeric 543") would act as a "test hash": it can be used to check whether the output it wraps is correct, and later to document these tests, with explanations, on a website.

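A sketch of how a framework might extract and check such markers from a program's standard output, in Python. The TEST-<CATEGORY><NUMBER> naming and the END-TEST terminator follow the TEST-NUM543 example; the regex and the function names are assumptions:

```python
import re

# Assumed marker shape: TEST-<CATEGORY><NUMBER> <value> END-TEST
MARKER_RE = re.compile(r"(TEST-[A-Z]+\d+)\s+(.*?)\s+END-TEST")


def parse_markers(stdout):
    """Map each test id found in the output to the value it wraps."""
    return {m.group(1): m.group(2) for m in MARKER_RE.finditer(stdout)}


def check_markers(stdout, expected):
    """Return {test_id: (expected, actual)} for every mismatch; empty means all passed."""
    actual = parse_markers(stdout)
    return {test_id: (want, actual.get(test_id))
            for test_id, want in expected.items()
            if actual.get(test_id) != want}
```

Because only the wrapped values are compared, any extra output a compiler's runtime prints around the markers is ignored.
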
KTSnowy commented 1 year ago

@GitMensch I know that you're busy working on GnuCOBOL and that you'd like me to use the tests there, but if we're going to make a new COBOL 2022 test suite to replace the old NIST85 one, then we can't restrict it to the GPL; we need one that can be used freely no matter which license the compiler uses. The Apache license helps with that.

GitMensch commented 1 year ago

I'm quite sure this view of the license is not correct. Anyone can test anything with a GPL-licensed test suite; it can only be passed on together with the (possibly adjusted) test source.

The test suite is not linked into anything, so if we create a COBOL 2022 testsuite under the GPL and MicroFocus or IBM wants to use it, they are free to do so. They would also be able to extend it; only if they pass the testsuite on do they have to pass on the sources of the testsuite, too.

KTSnowy commented 1 year ago

if we create a COBOL 2022 testsuite under the GPL and MicroFocus or IBM wants to use it, they are free to do so. They would also be able to extend it; only if they pass the testsuite on do they have to pass on the sources of the testsuite, too.

Yes, and that's kind of an issue as well. GnuCOBOL would be able to use it internally to test the compiler, but Otterkit and others wouldn't.

If I understand correctly how the GPL works, adding a GPL-licensed test suite to the Otterkit repo and making a command-line option for it (so the compiler would directly call the test suite code, similar to how Otterkit statically calls Libotterkit internally) would mean we'd have to GPL the compiler.

KTSnowy commented 1 year ago

@GitMensch But the opposite doesn't have the same issue: GnuCOBOL wouldn't have to change to Apache 2.0 if you decided to statically link with the test suite.

GitMensch commented 1 year ago

If we have it as portable scripts, then no linking is necessary at all.

And if we have a test runner binary, there's again no need to link anything, because the compiler and the result will be called via system().

KTSnowy commented 1 year ago

If we have it as portable scripts, then no linking is necessary at all.

Well, I was thinking of something a little more than just COBOL scripts: Otterly Testing was meant to be a COBOL 2022 test suite and unit test framework.

Some COBOL syntax is implementor-defined even when strictly following the standard, for example device names. The scripts (or the tests inside the testing framework) need to be slightly changed depending on which compiler you're testing; it's not possible to correctly test device names with just a portable COBOL script file like the NIST test suite.

I'm sure that if the NIST test suite uses any implementor-defined features, it won't run correctly on compilers that define them differently. Some of this might be inevitable, but things like device names could be set in a config file and substituted by the testing framework before the source is passed to a compiler.
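
A minimal Python sketch of that substitution step. The @IMP:NAME@ placeholder syntax and the per-compiler mapping are hypothetical; in practice the mapping could be loaded from a config file (for example with json.load):

```python
import re

# Hypothetical placeholder syntax for implementor-defined words: @IMP:NAME@
PLACEHOLDER_RE = re.compile(r"@IMP:([A-Z0-9-]+)@")


def substitute(source, config):
    """Replace every placeholder with the value configured for the compiler under test."""
    def replace(match):
        key = match.group(1)
        if key not in config:
            raise KeyError(f"no value configured for {key}")
        return config[key]
    return PLACEHOLDER_RE.sub(replace, source)


# Example per-compiler configuration (values are illustrative only).
EXAMPLE_CONFIG = {"PRINTER-DEVICE": "PRINTER"}
```

Raising on an unknown key makes a missing config entry fail loudly instead of silently producing invalid COBOL. A real tool would also have to keep substituted fixed-format lines within column 72.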

GitMensch commented 1 year ago

Let's put it this way: I'm keen to see something get started and would help with details and COBOL questions, as well as with testing the test suite with GnuCOBOL :-)

KTSnowy commented 1 year ago

Hey @GitMensch, I couldn't find anything about this in the standard. How does COBOL determine the entry point of the program? That is, how does it choose the "main" program or class that should be executed first?

Does it just choose the first program that is defined in the source file?

GitMensch commented 1 year ago

The defined order (which makes some things harder than they otherwise would be) is to fall through from top to bottom, so the first (not prototype) program (functions are only called) is executed first. If you have multiple source files in the compilation group, the "start" entry point would be the very first program you find (but that's only relevant for a main program in any case). Specified extra ENTRY points are an extension since COBOL 2002, because they aren't included there, as we have just seen.
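
The rule described here (the first program that is not a prototype is the entry point; functions are never entered directly) could be approximated as below. This is a naive Python sketch, assuming the sources are given in compilation-group order; it uses a regex over PROGRAM-ID paragraphs and ignores comment lines, literals containing periods, and everything else a real parser would handle:

```python
import re

PROGRAM_ID_RE = re.compile(
    r"\bPROGRAM-ID\s*\.\s*([A-Za-z0-9][A-Za-z0-9-]*)([^.]*)\.", re.IGNORECASE)


def find_entry_program(sources):
    """Return the name of the first non-prototype program, or None.

    `sources` is a list of source texts in compilation-group order.
    FUNCTION-ID paragraphs are never matched, since functions are only called.
    """
    for text in sources:
        for m in PROGRAM_ID_RE.finditer(text):
            clauses = m.group(2).upper().split()
            if "PROTOTYPE" not in clauses:
                return m.group(1).upper()
    return None
```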

KTSnowy commented 1 year ago

so the first (not prototype) program (functions are only called) is executed first

Alright, so it should be safe to let the user specify a start file and treat the first program there as the entry point. Other files would then be included and compiled when the compiler encounters a call to a program from another file (libraries should include everything, though).

Specified extra ENTRY points are an extension since COBOL 2002, because they aren't included there, as we have just seen.

I had an alternative idea to allow users to specify a different entry point: we could provide an entry point "syntax" with the build command, like this: entry.cob#ENTRY-PROGRAM. That way we can avoid adding extra non-standard extensions directly to the language.
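
Parsing such an argument is straightforward. A Python sketch, assuming the last # separates the file from the program name (so paths that themselves contain # still work); parse_entry_spec is a hypothetical name:

```python
def parse_entry_spec(spec):
    """Split 'file.cob#PROGRAM-NAME' into (path, program name or None)."""
    path, sep, program = spec.rpartition("#")
    if not sep:
        return spec, None          # no '#': use the first program in the file
    if not path or not program:
        raise ValueError(f"invalid entry specification: {spec!r}")
    return path, program.upper()   # COBOL names are case-insensitive
```

When no program name is given, the compiler would fall back to the first non-prototype program in the file, as discussed above.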

KTSnowy commented 1 year ago

@GitMensch I'm working on the preprocessor right now; the most recent commit has the code for the >>SOURCE FORMAT directive. Reading this part of the standard made me realize that it should also be possible to add a new directive to specify the entry point.

The standard allows for implementor-defined directives with >>IMP, so an >>IMP ENTRY-POINT directive should conform to the standard just fine.
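
How a preprocessor might pick up such a directive, sketched in Python. The exact >>IMP ENTRY-POINT <program-name> form is only the proposal from this thread, not something the standard defines, and the sketch assumes free-format source lines:

```python
import re

# Hypothetical syntax proposed in this thread: >>IMP ENTRY-POINT <program-name>
ENTRY_DIRECTIVE_RE = re.compile(
    r"^\s*>>\s*IMP\s+ENTRY-POINT\s+([A-Za-z0-9][A-Za-z0-9-]*)\s*$",
    re.IGNORECASE)


def extract_entry_directive(lines):
    """Return (entry program name or None, lines without the directive)."""
    entry = None
    remaining = []
    for line in lines:
        match = ENTRY_DIRECTIVE_RE.match(line)
        if match is None:
            remaining.append(line)
            continue
        if entry is not None:
            raise ValueError("duplicate >>IMP ENTRY-POINT directive")
        entry = match.group(1).upper()
    return entry, remaining
```

Consuming the directive in the preprocessor means the later compiler stages never see it.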

vbcoen commented 1 year ago

The NIST subsystem was written at the behest of the U.S. Navy, along with other tools for testing Cobol compilers. Almost all (if not all) of the other tools were dropped as technology moved onwards. They included a flowchart creator that read a Cobol source file; its output was a right pain to read and could use up a lot of continuous stationery, as in a box's worth or more.

Any subsequent updates to NIST have to be within the original copyright, which was the equivalent of O/S. Therefore it is free to use for all developers of compilers and ancillary tools.

Yes, I was a holder of the complete toolset, which was supplied on tape from the mid/late '60s (I cannot remember the exact date) onwards.

KTSnowy commented 1 year ago

Hey @vbcoen, does this mean that we could update the NIST tests for COBOL 2023 and redistribute them here?

Any subsequent updates to NIST have to be within the original copyright, which was the equivalent of O/S.

I wasn't able to find any license or copyright notice, so I wasn't sure which license the repo for the updated suite should have. I'd love to update it to COBOL 2023 and include more tests for object-oriented features.

Do you know if we would be allowed to redistribute it under the Apache-2.0 license with a NIST copyright notice?

GitMensch commented 1 year ago

The test suite is made publicly available at https://www.itl.nist.gov/div897/ctg/cobol_form.htm. Strictly following what that page says, it is "free for use, but not for distribution".

I once asked NIST about the copyright and about the option to distribute it; the response was:

The COBOL suite is actually owned by the NCC. Please contact them at http://www.ncc.co.uk/

But this website of the National Computing Centre (NCC) [a UK organization] is down, so one should probably ask the NCC Group instead. Could you try to do that (the most important thing would be to ask for the person responsible for the NIST85 testsuite)?

KTSnowy commented 1 year ago

Hey @GitMensch, I just sent them an email. I'll let you know what they said as soon as I receive an email back from them.

GitMensch commented 1 year ago

Any updates concerning the distribution possibility of an adjusted version?

What about using it locally (as is done with GnuCOBOL), at least checking some of the test groups, possibly with an additional preprocessor that handles "obscure" things like line continuation?
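
As an illustration of that kind of preprocessing, here is a Python sketch that joins fixed-format continuation lines (a hyphen in column 7). It follows the classic rules: for a continued nonnumeric literal, the continued line counts as extending through column 72 and the literal resumes after the quotation mark in Area B of the continuation line; otherwise the first nonblank character of the continuation line directly follows the last nonblank character of the continued line. Assumptions: comment lines are simply dropped, blank lines get no special handling, and the output consists of long logical lines, so the consuming compiler would need to accept free-format input:

```python
def open_quote(text):
    """Return the quote character of a literal still open at the end of text, else None."""
    quote = None
    for ch in text:
        if quote is None:
            if ch in "\"'":
                quote = ch
        elif ch == quote:
            # Closes the literal; a doubled quote ("") re-opens it on the next character.
            quote = None
    return quote


def join_continuations(lines):
    """Join fixed-format continuation lines ('-' in column 7) into logical lines."""
    logical = []
    current = None
    for line in lines:
        indicator = line[6:7]              # column 7
        if indicator in ("*", "/"):
            continue                       # comment lines are dropped in this sketch
        area = line[7:72].ljust(65)        # columns 8-72, padded out to column 72
        if indicator == "-" and current is not None:
            body = area.lstrip()
            if open_quote(current) and body[:1] in ("\"", "'"):
                # Literal continuation: trailing spaces of the continued line belong
                # to the literal; skip the quotation mark that starts this line.
                current += body[1:]
            else:
                # Otherwise join directly after the last nonblank character.
                current = current.rstrip() + body
        else:
            if current is not None:
                logical.append(current.rstrip())
            current = area
    if current is not None:
        logical.append(current.rstrip())
    return logical
```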

KTSnowy commented 1 year ago

Any updates concerning the distribution possibility of an adjusted version?

I haven't received any replies from them yet, which makes me a bit worried about modifying and redistributing the test suite.

We don't know who actually owns the copyright right now (and we can't contact them), and the suite doesn't have a license attached to it. So if the copyright holder shows up in the future and decides to enforce the copyright in a proprietary way, we'd probably be in a lot of legal trouble.

Legally speaking we can't really do anything with the test suite (other than run it locally) without a license that explicitly gives us the rights to modify and redistribute.

GitMensch commented 1 year ago

Legally speaking we can't really do anything with the test suite (other than run it locally) without a license that explicitly gives us the rights to modify and redistribute.

Sure, that's the current state. So... what about running it locally (or at least trying to handle the test sources that you get when using make -C tests/cobol85 modules in GnuCOBOL)?