uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

FixedWidthParser behavior of .getMetadata().headers when parsed record is passed to another method #423

Closed AlejandroME closed 4 years ago

AlejandroME commented 4 years ago

Hello!

Consider this file (fixed width):

*FILEBEGIN
Record
Record
Record
*FILEEND
*FILEEND

We're using lookahead formats to match, and then ignore, the begin and end marker lines, like this:

FixedWidthParserSettings settings = new FixedWidthParserSettings();
settings.setRecordEndsOnNewline(true);
settings.setSkipTrailingCharsUntilNewline(true);
settings.addFormatForLookahead("?FILEBEGIN", new FixedWidthFields(100));
settings.addFormatForLookahead("?FILEEND", new FixedWidthFields(100));

And we're parsing the file like this:

InputStream inputStreamForParser = new FileInputStream("filepath");
List<Record> records = new FixedWidthParser(settings).parseAllRecords(inputStreamForParser, StandardCharsets.UTF_8);

Since our use case also needs to keep track of the line number of each processed record, we walk the list by index with an IntStream, like this:

IntStream.range(0, records.size()).mapToObj(elm -> mapEventFields(records.get(elm), elm)).filter(Objects::nonNull);

mapEventFields is a private method in which we implement our logic. We're also returning null if the current record belongs to a header or footer, since those are meant to be ignored and are unparseable.

The logic goes like this:

private void mapEventFields(Record record, int currentLineNumber) {

        if (record.getMetaData().headers() != null){
            return null;
        }

        ...
    }
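
For anyone reading along, the index-based mapping plus null filtering described above can be reproduced without the parser at all. This is only a minimal sketch: plain strings stand in for the parsed Record objects, and mapLine is a hypothetical stand-in for our mapEventFields, not the real logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexedMappingSketch {

    // Hypothetical stand-in for mapEventFields: returns null for the
    // begin/end marker lines so they can be filtered out afterwards.
    static String mapLine(String line, int index) {
        if (line.startsWith("*FILE")) {
            return null;
        }
        return index + ": " + line;
    }

    // Walks the list by index so each element is mapped together with its position.
    static List<String> mapAll(List<String> lines) {
        return IntStream.range(0, lines.size())
                .mapToObj(i -> mapLine(lines.get(i), i))
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "*FILEBEGIN", "Record", "Record", "Record", "*FILEEND", "*FILEEND");
        mapAll(lines).forEach(System.out::println);
    }
}
```

Note that the index here is zero-based, so it matches the position in the list rather than a one-based line number in the file.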

Our problem

Consider these examples:

Example 1:

records.stream().forEach(e -> System.out.println("Test 1 " + e.getMetaData().headers()));

Prints:

Test 1 null
Test 1 [Ljava.lang.String;@2b406b75
Test 1 [Ljava.lang.String;@34db2f0d
Test 1 [Ljava.lang.String;@7eb2cdfc
Test 1 null
Test 1 null
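
A side note on the output above: entries such as [Ljava.lang.String;@2b406b75 are simply what Java prints when a String[] is concatenated to a string (the array type plus an identity hash), so they only show that headers() returned a non-null array. A small sketch of printing the contents instead, using made-up header names that are not from our real file:

```java
import java.util.Arrays;

class HeaderPrintingSketch {

    // Renders the array contents; Arrays.toString returns "null" for a null array.
    static String describe(String[] headers) {
        return Arrays.toString(headers);
    }

    public static void main(String[] args) {
        // Hypothetical header names, used for illustration only.
        String[] headers = {"field_a", "field_b"};

        // Default conversion: prints the array type and an identity hash.
        System.out.println("Default:  " + headers);

        // Arrays.toString prints the actual elements.
        System.out.println("Contents: " + describe(headers));

        // A null headers array (as on our begin/end lines) is handled too.
        System.out.println("Null:     " + describe(null));
    }
}
```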

Example 2:

IntStream.range(0, records.size())
        .mapToObj(elm -> records.get(elm).getMetaData().headers())
        .forEach(l -> System.out.println("Test 2 " + l));

Prints:

Test 2 null
Test 2 [Ljava.lang.String;@2b406b75
Test 2 [Ljava.lang.String;@34db2f0d
Test 2 [Ljava.lang.String;@7eb2cdfc
Test 2 null
Test 2 null

Example 3 (our current implementation):

IntStream.range(0, records.size())
        .mapToObj(elm -> mapEventFields(records.get(elm), elm))
        .forEach(l -> System.out.println("Test3 " + l));

Prints:

Test3 null
Test3 <Raw content of line 1>
Test3 [Ljava.lang.String;@2b406b75
Test3 <Raw content of line 2>
Test3 [Ljava.lang.String;@34db2f0d
Test3 <Raw content of line 3>
Test3 [Ljava.lang.String;@7eb2cdfc
Test3 <Raw content of line 4>
Test3 null
Test3 <Raw content of line 5>
Test3 null
Test3 <Raw content of line 6>

Why does the last scenario happen? Does passing records.get(i) to another method change the object reference or mutate the headers? We expected the behavior seen in examples 1 and 2.

This is breaking our implementation, and it took us a long time to track the behavior down. We still have no idea why it happens. We're on version 2.9.0.

Any light you can shed on this will be greatly appreciated.

Thanks!

AlejandroME commented 4 years ago

Sorry about this, but I've found the issue :sweat_smile:

Apologies for bothering you.