uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.
FixedWidthParser behavior of .getMetadata().headers when parsed record is passed to another method #423
We're using lookahead formats to extract and ignore both the begin and end pieces, like this:
FixedWidthParserSettings settings = new FixedWidthParserSettings();
settings.setRecordEndsOnNewline(true);
settings.setSkipTrailingCharsUntilNewline(true);
settings.addFormatForLookahead("?FILEBEGIN", new FixedWidthFields(100));
settings.addFormatForLookahead("?FILEEND", new FixedWidthFields(100));
And we're parsing the file like this:
InputStream inputStreamForParser = new FileInputStream("filepath");
List<Record> records = new FixedWidthParser(settings).parseAllRecords(inputStreamForParser, StandardCharsets.UTF_8);
Since our use case also requires keeping track of the line number of each processed record, we retrieve records from the list by index with an IntStream, like this:
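A minimal sketch of that index-based pattern, not the code from the original report: plain strings stand in for `Record`, and `mapLine` is a hypothetical stand-in for `mapEventFields` that detects begin/end lines by prefix instead of by record metadata.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class IndexedMappingSketch {

    // Hypothetical stand-in for mapEventFields: returns null for lines that should be skipped.
    static String mapLine(String line, int lineNumber) {
        if (line.startsWith("?FILEBEGIN") || line.startsWith("?FILEEND")) {
            return null; // begin/end lines are ignored
        }
        return lineNumber + ":" + line;
    }

    // Walks the list by index so each element is paired with its 1-based line number.
    static List<String> mapAll(List<String> lines) {
        return IntStream.range(0, lines.size())
                .mapToObj(i -> mapLine(lines.get(i), i + 1))
                .filter(Objects::nonNull) // drop the skipped begin/end lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("?FILEBEGIN", "event A", "event B", "?FILEEND");
        System.out.println(mapAll(lines)); // prints [2:event A, 3:event B]
    }
}
```

Iterating over indices rather than over the elements themselves is what makes the line number available to the mapping method without a separate mutable counter.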
mapEventFields is a private method in which we implement our logic. We're also returning null if the current record belongs to a header or footer, since those are meant to be ignored and are unparseable.
The logic goes like this:
// The actual return type is not shown; Object is a placeholder so that `return null` compiles.
private Object mapEventFields(Record record, int currentLineNumber) {
    if (record.getMetaData().headers() != null) {
        return null;
    }
    ...
}
Example 3 (our current implementation) prints:
Test3 null
Test3 <Raw content of line 1>
Test3 [Ljava.lang.String;@2b406b75
Test3 <Raw content of line 2>
Test3 [Ljava.lang.String;@34db2f0d
Test3 <Raw content of line 3>
Test3 [Ljava.lang.String;@7eb2cdfc
Test3 <Raw content of line 4>
Test3 null
Test3 <Raw content of line 5>
Test3 null
Test3 <Raw content of line 6>
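The `[Ljava.lang.String;@...` entries in this output are the default `toString()` of a `String[]`, so they only show that the headers array is non-null, not what it contains. A minimal sketch of printing the contents instead, assuming a hypothetical `describe` helper and a made-up array in place of the value returned by `headers()`:

```java
import java.util.Arrays;

public class HeaderPrintSketch {

    // Hypothetical helper: renders a headers array readably, or "null" when it is absent.
    static String describe(String[] headers) {
        return headers == null ? "null" : Arrays.toString(headers);
    }

    public static void main(String[] args) {
        String[] headers = {"id", "name"}; // made-up values for illustration only
        System.out.println("Test3 " + describe(headers)); // prints Test3 [id, name]
        System.out.println("Test3 " + describe(null));    // prints Test3 null
    }
}
```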
Why is the latter scenario happening? Does passing record.get(i) to another method change the object reference or mutate the headers?
We expect the behavior seen in examples 1 and 2.
This is effectively breaking our implementation, and it took us a long time to track the behavior down. We still have no idea why it is happening.
We're on version 2.9.0.
Any light you shed on this will be greatly appreciated.
Hello!
Consider this file (fixed width):
Our problem
Consider these examples:
Example 1:
Prints:
Example 2:
Prints:
Example 3 (our current implementation):
Prints:
Thanks!