Specifying different code pages for COBOL files in CobolPreprocessorImpl

The current implementation of the CobolPreprocessor seems to support only the default character encoding for COBOL files, because `new InputStreamReader(inputStream)` decodes with the platform default charset (typically UTF-8). COBOL source files, however, typically use single-byte character sets (SBCS) such as EBCDIC (IBM-1441), ISO-8859-1/ISO-8859-15 or Windows-1252, and would therefore need a code page conversion before being run through the parser.
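To make the single-byte vs. multi-byte distinction concrete, here is a minimal sketch that resolves a configured code page name to a `java.nio.charset.Charset` and checks whether its encoder ever needs more than one byte per character. The class and method names (`CodePages`, `resolve`, `isSingleByte`) are hypothetical and not part of proleap-cobol-parser; which charset names are available depends on the JDK installation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of proleap-cobol-parser.
public class CodePages {

    // Resolves a code page name such as "ISO-8859-15" or "windows-1252";
    // returns the fallback if this JVM does not support the name.
    static Charset resolve(String name, Charset fallback) {
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }

    // Treats a charset as single-byte if its encoder never emits more than one byte per char.
    static boolean isSingleByte(Charset cs) {
        return cs.canEncode() && cs.newEncoder().maxBytesPerChar() <= 1.0f;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ISO-8859-1", "ISO-8859-15", "windows-1252", "UTF-8"}) {
            Charset cs = resolve(name, StandardCharsets.UTF_8);
            System.out.println(cs.name() + " single-byte: " + isSingleByte(cs));
        }
    }
}
```

A check like this could be used to warn when a configured code page is multi-byte, since such sources behave differently at the byte level.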

I would appreciate an option to parameterize the code page of the COBOL sources, i.e. an additional parameter that is passed on as the Charset to the InputStreamReader constructor:

    public InputStreamReader(InputStream in, Charset cs)

This concerns the following method in CobolPreprocessorImpl:

    public String process(final File cobolFile, final List<File> copyFiles, final CobolSourceFormatEnum format,
            final CobolDialect dialect) throws IOException {
        LOG.info("Preprocessing file {}.", cobolFile.getName());

        final InputStream inputStream = new FileInputStream(cobolFile);
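        // no Charset argument: bytes are decoded with the platform default charset
        final InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
        final BufferedReader bufferedInputStreamReader = new BufferedReader(inputStreamReader);
        final StringBuffer outputBuffer = new StringBuffer();
        // ... (remainder of the method omitted)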
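A minimal sketch of the requested parameterization, assuming the code page is simply passed through to `InputStreamReader`: the file is decoded with a caller-supplied `Charset` instead of the platform default. `CobolSourceReader` and `readSource` are hypothetical names standing in for the real `process` method; the actual preprocessing (copybooks, format, dialect) is omitted.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch: decode a COBOL source file with an explicit code page.
public class CobolSourceReader {

    // Reads the whole file, decoding its bytes with the given charset.
    static String readSource(File cobolFile, Charset charset) throws IOException {
        StringBuilder output = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(cobolFile), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        return output.toString();
    }

    // Overload that keeps today's behaviour (platform default charset).
    static String readSource(File cobolFile) throws IOException {
        return readSource(cobolFile, Charset.defaultCharset());
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sample", ".cbl");
        file.deleteOnExit();
        // Store the line as ISO-8859-1 bytes, the way a legacy SBCS source would be stored.
        String line = "       01 NAME PIC X(30) VALUE \"G\u00fcnter M\u00f6rg\u00e4n\".\n";
        Files.write(file.toPath(), line.getBytes(StandardCharsets.ISO_8859_1));
        System.out.print(readSource(file, StandardCharsets.ISO_8859_1));
    }
}
```

Keeping a no-charset overload would leave existing callers unaffected while allowing SBCS sources to be read correctly.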

Converting only the code page of the COBOL sources, say from ISO-8859-1 to UTF-8, will lead to processing errors sooner or later. Just think of `01 Name PIC X(30) VALUE "Günter Mörgän".` and statements like `IF Name(2:1) = "ü"`. In a single-byte character set every character occupies exactly one byte, so `Name(2:1)` yields "ü". In UTF-8, however, "ü" takes two bytes, so `Name(2:1)` yields only its first byte and the comparison fails. Such code only works with SBCS, not with multi-byte encodings like UTF-8.

On the other hand, not converting the code page and letting the parser interpret the characters as UTF-8 leads to the well-known garbled, misinterpreted characters.
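The byte-level effect described above can be reproduced with a small sketch. `EncodingDemo` and `byteAt` are hypothetical names; `byteAt` mimics the reference modification `Name(pos:1)` by returning the byte at a 1-based position of the encoded value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrates why byte positions in PIC X data shift when an SBCS source is re-encoded as UTF-8.
public class EncodingDemo {

    // Unsigned value of the byte at a 1-based position, like Name(pos:1) on the encoded data.
    static int byteAt(String text, Charset cs, int pos) {
        return text.getBytes(cs)[pos - 1] & 0xFF;
    }

    public static void main(String[] args) {
        String name = "G\u00fcnter M\u00f6rg\u00e4n"; // "Günter Mörgän", 13 characters

        // 13 bytes in ISO-8859-1, 16 bytes in UTF-8 (ü, ö and ä take two bytes each)
        System.out.println(name.getBytes(StandardCharsets.ISO_8859_1).length);
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length);

        // Name(2:1) is the complete "ü" (FC) in ISO-8859-1, but only its first byte (C3) in UTF-8
        System.out.printf("%02X %02X%n",
                byteAt(name, StandardCharsets.ISO_8859_1, 2),
                byteAt(name, StandardCharsets.UTF_8, 2));

        // Wrong charset on decoding: UTF-8 bytes read as ISO-8859-1 give "GÃ¼nter MÃ¶rgÃ¤n"
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

The opposite direction, decoding ISO-8859-1 bytes as UTF-8, turns each umlaut into the replacement character U+FFFD, because bytes such as 0xFC are not valid UTF-8 on their own.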

uwol / proleap-cobol-parser, issue #23