Many tests fail on Windows, apparently due to line ending issues

nmusatti commented 4 years ago

I cloned this project on a PC running Windows 10 and proceded to execute mvn package with

Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: C:\opt\apache-maven-3.6.3\bin\..
Java version: 1.8.0_121, vendor: Oracle Corporation, runtime: C:\Program Files\Java\jdk1.8.0_121\jre
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"

The result was:

[ERROR] Tests run: 823, Failures: 77, Errors: 0, Skipped: 2

Looking into the first failure, i.e. io.proleap.cobol.preprocessor.copy.cobolword.variable.CopyCblWrdTest I noticed that the expected string contains CR-LF sequences as line endings, while the actual string only LF. This happens because, while preprocessing replaces line endings with the LF character, FileUtils.readFileToString() does not and on Windows test resources contain Windows line endings.

A simple way to make tests pass would be to replace calls to readFileToString() with something like:

    final List<String> l = FileUtils.readLines(expectedFile, StandardCharsets.UTF_8);
    String expected = "";
    for ( String s: l ) {
        if ( expected != "" ) {
            expected += NEWLINE;
        }
        expected += s;
    }

However I wonder if this is the correct approach or if it wouldn't be better to make the preprocessor itself more platform independent. I only just started looking into this project and I don't have a clear idea of how things work.

Would you be interested in a pull request tackling Windows support? I ask because this project appears to have been quiescent for the last year and a half.

uwol commented 4 years ago

Hi Nicola,

thanks for the issue! I am developing under Mac OS and Linux, and so that error did not occur on my systems.

However, thanks to your issue in the last days I:

bumped dependencies from Java 8 to Java 11
bumped version of the parser from 3.0.0 to 4.0.0
removed Apache dependencies commons-io and commons-lang, so the code now has minimal dependencies
in the end could replace FileUtils.readFileToString and the problematic LF-based custom code in da139ecc4f12ac929d10e32d166da4a0e61898e8 with Files.readString, which was introduced in Java 11.

The new Files.readString hopefully should solve the problem. I will check this weekend on a Windows system, if the problem is solved. A short note from your side would be helpful, if it works for your now.

Thanks, Ulrich

nmusatti commented 4 years ago

Hi Ulrich, Thank you for addressing my issue so promptly. Unfortunately my problem is still there. However it appears to be easily fixable by modifying the NEWLINE constant in io.proleap.cobol.preprocessor.CobolPreprocessor from

final static String NEWLINE = "\n";

to

final static String NEWLINE = System.lineSeparator();

With this change all tests pass for me on both Windows and Linux.

uwol commented 4 years ago

Hi Nicola,

I tested mvn clean test on Windows 10 and ran into problems due to charset windows-1252. Therefore in 522630f4a438fb7a49cedb647044573fbd6fc878 I changed default charset to UTF-8. After this change on my Win 10 machine with JDK 11 and Maven 3.6.3 the test suite passes a mvn clean test with Tests run: 823, Failures: 0, Errors: 0, Skipped: 2 -> BUILD SUCCESS. The test suite also passes when running it in Eclipse on Windows 10.

I also thought about switching NEWLINE from \n to System.lineSeparator(). However, this would introduce platform-dependent behavior of the preprocessor, when it creates preprocessed COBOL code for the parser.

If the problem persists on your machine, please let me know. The next step would then be to identify the difference between our machines. If you have other questions regarding the parser, also let me know!

Ulrich

nmusatti commented 4 years ago

Hi Ulrich, It's your call obviously, but I'm not convinced that having the intermediate source platform dependent would be that bad. I assume that your main use case is to translate COBOL in something altogether different, however Antlr's TokenRewriteStream makes it very convenient to implement preprocessors. I used Antlr to replace embedded SQL calls with a proprietary implementation. I was actually looking for ways to "modernize" that project when I stumbled upon this one. In similar cases not preserving platform line endings might make it awkward for downstream tools. That said it's not overly important for me as Linux is going to be our main deployment platform. As soon as I have a moment I'll try and check which of my Windows machine settings may affect line endings.

Nicola

uwol / proleap-cobol-parser

Many tests fail on Windows, apparently due to line ending issues #78