uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.
Lookahead not working as expected for last line without newline character #453
Use-case:
I'm trying to use lookahead patterns to separate the different datasets we receive in a fixed-width file. For one specific dataset, the lookahead pattern is exactly as long as the entire row, and I see unexpected behaviour in that scenario.
More details on the code and the expected vs. actual outputs can be found below.
Code Block (can be used to replicate the issue)
Note: I've simplified our use-case to show just the core of the issue.
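import com.univocity.parsers.fixed.FixedWidthFields
import com.univocity.parsers.fixed.FixedWidthParser
import com.univocity.parsers.fixed.FixedWidthParserSettings
import java.io.StringReader

fun testLookAhead(contents: String) {
    val parserSettings = FixedWidthParserSettings()
    parserSettings.format.padding = ' '
    parserSettings.format.setLineSeparator("\n")
    // Delete rows are 20 characters wide; create rows add a 27-character field.
    val deleteFields = FixedWidthFields(1, 4, 3, 2, 4, 4, 2)
    val createFields = FixedWidthFields(1, 4, 3, 2, 4, 4, 2, 27)
    // The last two characters of each pattern select the row format.
    parserSettings.addFormatForLookahead("2?????????????????01", deleteFields)
    parserSettings.addFormatForLookahead("2?????????????????02", createFields)
    val parser = FixedWidthParser(parserSettings)
    // Print every parsed row as an array of field values.
    parser.parseAll(StringReader(contents)).forEach { println(it.contentToString()) }
}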
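To make the intent of the two lookahead patterns explicit, here is a minimal standalone sketch of the matching rule they appear to rely on. It assumes that `?` stands for any single character and that a pattern is compared against the start of a row; `matchesLookahead` is a hypothetical helper written for illustration, not part of uniVocity-parsers, and the patterns are rebuilt with `repeat(17)` on the assumption that each one is 20 characters long, like the delete row.

```kotlin
// Hypothetical helper, not a uniVocity-parsers API: returns true when the start
// of `row` matches `pattern`, where the wildcard stands for any one character.
fun matchesLookahead(pattern: String, row: String, wildcard: Char = '?'): Boolean {
    if (row.length < pattern.length) return false
    return pattern.indices.all { i -> pattern[i] == wildcard || pattern[i] == row[i] }
}

fun main() {
    // Assumption: both patterns are '2', then 17 wildcards, then a two-digit type code.
    val deletePattern = "2" + "?".repeat(17) + "01"
    val createPattern = "2" + "?".repeat(17) + "02"

    val rows = listOf(
        "20123003020761012301",
        "20123003020769012301",
        "20123002010394012302Some description comes here"
    )

    for (row in rows) {
        val kind = when {
            matchesLookahead(deletePattern, row) -> "delete"
            matchesLookahead(createPattern, row) -> "create"
            else -> "unknown"
        }
        // For delete rows the pattern covers the whole row, which is the case this issue is about.
        println("$kind: row length ${row.length}, pattern length ${deletePattern.length}")
    }
}
```

Under these assumptions the delete rows are exactly as long as their pattern, while the create row extends 27 characters past it.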
Scenario-1: (Getting expected output)
When the contents passed in are:
20123003020761012301
20123003020769012301
20123002010394012302Some description comes here
With this input, I get the expected output.
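For reference, the field lengths in the code block imply the following split of the Scenario-1 rows. This is only a sketch of plain positional slicing: `sliceByWidths` is a hypothetical helper, it ignores padding and trimming, and it is not claimed to reproduce the parser's actual output.

```kotlin
// Hypothetical helper: cuts `row` into consecutive fields of the given lengths.
// Fields that would start past the end of the row are skipped.
fun sliceByWidths(row: String, widths: List<Int>): List<String> {
    val fields = mutableListOf<String>()
    var start = 0
    for (width in widths) {
        if (start >= row.length) break
        val end = minOf(start + width, row.length)
        fields += row.substring(start, end)
        start = end
    }
    return fields
}

fun main() {
    // Same lengths as deleteFields and createFields in the issue's code block.
    val deleteWidths = listOf(1, 4, 3, 2, 4, 4, 2)
    val createWidths = deleteWidths + 27

    println(sliceByWidths("20123003020761012301", deleteWidths))
    println(sliceByWidths("20123002010394012302Some description comes here", createWidths))
}
```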
Scenario-2: (Getting into exception)
But when contents ends with a row that has no trailing newline character, I get an exception instead of the expected output.
Scenario-3: (Getting expected output by adding a newline character to the end of the Scenario-2 input)
When contents is the Scenario-2 input with a newline character appended, I get the expected output.
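Scenario-3 suggests a possible workaround while the underlying behaviour is unclear: make sure the input ends with the configured line separator before handing it to the parser. The sketch below assumes the `"\n"` separator from the code block; `withTrailingNewline` is a hypothetical helper, and the sample string reuses rows from Scenario-1 purely for illustration, not the actual Scenario-2 input.

```kotlin
// Hypothetical helper: appends the line separator when the input does not already end with it.
fun withTrailingNewline(contents: String, lineSeparator: String = "\n"): String =
    if (contents.endsWith(lineSeparator)) contents else contents + lineSeparator

fun main() {
    // Illustrative input only: two delete rows, the last one without a newline.
    val unterminated = "20123003020761012301\n20123003020769012301"
    val normalized = withTrailingNewline(unterminated)

    println(normalized.endsWith("\n"))
    // Applying the helper twice does not add a second newline.
    println(withTrailingNewline(normalized) == normalized)
}
```

The normalized string could then be passed to `testLookAhead` in place of the raw contents. This only sidesteps the problem and does not explain why the unterminated last row fails.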
I'm not sure whether I'm missing some configuration or whether this is a bug. Any help would be appreciated.