xlate / staedi

StAEDI - Streaming API for EDI: Java library featuring a reader/parser, writer/generator, and validation

Apache License 2.0

staedi api: EDIStreamReader fails to parse if a segment element has accent marks. #454

Closed. sharukhshaik126 closed this issue 3 months ago

sharukhshaik126 commented 4 months ago

Describe the bug: My X12 EDI file has an NM1*IL segment that contains letters with accent marks, and the EDIStreamReader class fails to parse that element.

To Reproduce: Parse any X12 EDI file that contains accent marks, e.g.: NM1*IL*1*VíAK SéVAG*KIAZDEN****34*673459754~

Expected behavior: EDIStreamReader should parse elements that contain accent marks in both Linux and Windows environments.

MikeEdgar commented 4 months ago

Hi @sharukhshaik126, what is the character encoding of the data you are reading? You can use one of the overloads of EDIInputFactory#createEDIStreamReader to provide the correct encoding for your input.

sharukhshaik126 commented 4 months ago

@MikeEdgar when I try to parse an EDI file with accent marks, it throws the exception below: Unable to Stream EDI File : Error parsing input in segment NM1 at position 767, element 2. The same segment with accent marks works fine in a Windows environment but throws the above exception in a Linux environment. I am not specifying a charset encoding for the EDIStreamReader class.

My code block:

EDIInputFactory inputFactory = EDIInputFactory.newFactory();
inputFactory.setProperty(EDIInputFactory.EDI_IGNORE_EXTRANEOUS_CHARACTERS, true);
InputStream inputStream = new FileInputStream(sourceFile);
EDIStreamReader ediReader = inputFactory.createEDIStreamReader(inputStream);

MikeEdgar commented 4 months ago

@sharukhshaik126 you'll need to provide the name of the character encoding when you create the EDIStreamReader.

Something like this (I am only guessing on the encoding in this example):

EDIStreamReader ediReader = inputFactory.createEDIStreamReader(inputStream, "ISO-8859-1");

sharukhshaik126 commented 4 months ago

Let me try with the different encodings, "UTF-8" and "ISO-8859-1". Thanks @MikeEdgar

MikeEdgar commented 4 months ago

FYI, the default is UTF-8 if nothing is given.

sharukhshaik126 commented 4 months ago

Thanks, I will try to specify the exact charset encoding to parse it and will update you here.

MikeEdgar commented 4 months ago

@sharukhshaik126 any luck?

sharukhshaik126 commented 4 months ago

@MikeEdgar No, on Linux it still throws an exception after setting the encoding to UTF-8:

fail to parse edi file : /tmp/test/Halin_C_frdsw.txt | UTF-8
io.xlate.edi.stream.EDIStreamException: Error parsing input in segment NM1 at position 767, element 2
    at io.xlate.edi.internal.stream.StaEDIStreamReader.lambda$executeTask$1(StaEDIStreamReader.java:186)
    at io.xlate.edi.internal.ThrowingRunnable.run(ThrowingRunnable.java:19)
    at io.xlate.edi.internal.stream.StaEDIStreamReader.executeTask(StaEDIStreamReader.java:181)
    at io.xlate.edi.internal.stream.StaEDIStreamReader.nextEvent(StaEDIStreamReader.java:212)
    at io.xlate.edi.internal.stream.StaEDIStreamReader.next(StaEDIStreamReader.java:241)
    at com.mage.edireader.EDIFileParser.main(EDIFileParser.java:79)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
    at io.xlate.edi.internal.stream.tokenization.Lexer.readCharacter(Lexer.java:339)
    at io.xlate.edi.internal.stream.tokenization.Lexer.readCharacterUnchecked(Lexer.java:313)
    at io.xlate.edi.internal.stream.tokenization.Lexer.parse(Lexer.java:192)
    at io.xlate.edi.internal.stream.tokenization.Lexer.parse(Lexer.java:174)
    at io.xlate.edi.internal.ThrowingRunnable.run(ThrowingRunnable.java:17)
    ... 4 more
Exception in thread "main" io.xlate.edi.stream.EDIStreamException: Exception flushing output stream in segment NM1 at position 767, element 2
    at io.xlate.edi.internal.stream.StaEDIStreamWriter.flush(StaEDIStreamWriter.java:240)
    at io.xlate.edi.internal.stream.StaEDIStreamWriter.close(StaEDIStreamWriter.java:230)
    at com.mage.edireader.EDIFileParser.main(EDIFileParser.java:187)
Caused by: java.io.IOException: Stream Closed
    at java.base/java.io.FileOutputStream.writeBytes(Native Method)
    at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
    at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
    at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
    at java.base/sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:316)
    at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:153)
    at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:251)
    at io.xlate.edi.internal.stream.StaEDIStreamWriter.flush(StaEDIStreamWriter.java:237)
    ... 2 more

MikeEdgar commented 4 months ago

Did you also try with ISO-8859-1? As far as I can tell it does include í and é characters.
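For readers hitting the same MalformedInputException, the following is a minimal sketch using only the JDK (no StAEDI classes) of why accented characters can fail under UTF-8. It assumes, as a guess consistent with the thread, that the file was written as ISO-8859-1; the class name AccentDecodingDemo and the helper decodeStrict are hypothetical names chosen for illustration. In ISO-8859-1, í is the single byte 0xED, which a strict UTF-8 decoder treats as the start of a multi-byte sequence and rejects when the next byte is a plain ASCII letter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class AccentDecodingDemo {

    // Decode strictly: malformed or unmappable input raises an exception
    // instead of being silently replaced with U+FFFD.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample segment from the issue; \u00ED is í and \u00E9 is é.
        String segment = "NM1*IL*1*V\u00EDAK S\u00E9VAG*KIAZDEN****34*673459754~";

        // Assumption: the bytes on disk are ISO-8859-1, one byte per character.
        byte[] latin1Bytes = segment.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset round-trips the text.
        System.out.println(decodeStrict(latin1Bytes, StandardCharsets.ISO_8859_1));

        // Decoding the same bytes as UTF-8 is rejected as malformed input.
        try {
            decodeStrict(latin1Bytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 decode failed: " + e);
        }
    }
}
```

If the input really is ISO-8859-1, passing that name to createEDIStreamReader, as suggested earlier in the thread, makes the reader decode with the matching charset.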

MikeEdgar commented 4 months ago

@sharukhshaik126 can you possibly provide a test file without sensitive data that I can use to reproduce the issue? Using the sample text you gave originally I haven't been able to trigger any errors.

sharukhshaik126 commented 3 months ago

@MikeEdgar after using the character encoding ISO-8859-1, the EDI file parsed successfully.
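When the encoding of incoming files is not known in advance, one possible approach is to check whether the bytes are valid UTF-8 and fall back to ISO-8859-1 otherwise. The sketch below uses only the JDK; the class EncodingGuess and the method guessEncoding are hypothetical names, and the ISO-8859-1 fallback is an assumption based on this thread rather than anything StAEDI does itself. It is a heuristic: every byte sequence decodes under ISO-8859-1, so this cannot distinguish it from other single-byte encodings such as windows-1252.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class EncodingGuess {

    // Returns "UTF-8" when the content decodes cleanly as UTF-8,
    // otherwise the assumed fallback "ISO-8859-1".
    static String guessEncoding(byte[] content) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            return "ISO-8859-1";
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java EncodingGuess.java <path-to-edi-file>
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(guessEncoding(content));
    }
}
```

The returned name could then be passed as the second argument to createEDIStreamReader(inputStream, encoding), the overload shown earlier in the thread. Note that this sketch reads the whole file into memory, which may not suit very large EDI files.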

MikeEdgar commented 3 months ago

Great news! Thanks for the update @sharukhshaik126. I'll go ahead and close the issue, but please re-open if this still isn't resolved in your opinion and we'll discuss further.