Closed sharukhshaik126 closed 3 months ago
Hi @sharukhshaik126 , what is the character encoding of the data you are reading? You can use one of the overloads of EDIInputFactory#createEDIStreamReader
to provide the correct encoding for your input.
@MikeEdgar Actually, when I tried to parse an EDI file with accent marks, it threw the exception below: Unable to Stream EDI File : Error parsing input in segment NM1 at position 767, element 2. The same segment with accent marks works fine in a Windows environment but throws the above exception in a Linux environment, even when I do not specify a character set encoding for the EDIStreamReader class.
my code block :
EDIInputFactory inputFactory = EDIInputFactory.newFactory();
inputFactory.setProperty(EDIInputFactory.EDI_IGNORE_EXTRANEOUS_CHARACTERS, true);
InputStream inputStream = new FileInputStream(sourceFile);
EDIStreamReader ediReader = inputFactory.createEDIStreamReader(inputStream);
@sharukhshaik126 you'll need to provide the name of the character encoding when you create the EDIStreamReader.
Something like this (I am only guessing on the encoding in this example):
EDIStreamReader ediReader = inputFactory.createEDIStreamReader(inputStream, "ISO-8859-1");
Let me try with different encodings, UTF-8 and ISO-8859-1. Thanks @MikeEdgar
FYI, the default is UTF-8 if nothing is given.
Thanks, I will try specifying the exact charset encoding to parse it and will update you here.
@sharukhshaik126 any luck?
@MikeEdgar No, on Linux it still throws an exception after setting the encoding to UTF-8:
fail to parse edi file : /tmp/test/Halin_C_frdsw.txt | UTF-8
io.xlate.edi.stream.EDIStreamException: Error parsing input in segment NM1 at position 767, element 2
    at io.xlate.edi.internal.stream.StaEDIStreamReader.lambda$executeTask$1(StaEDIStreamReader.java:186)
    at io.xlate.edi.internal.ThrowingRunnable.run(ThrowingRunnable.java:19)
    at io.xlate.edi.internal.stream.StaEDIStreamReader.executeTask(StaEDIStreamReader.java:181)
    at io.xlate.edi.internal.stream.StaEDIStreamReader.nextEvent(StaEDIStreamReader.java:212)
    at io.xlate.edi.internal.stream.StaEDIStreamReader.next(StaEDIStreamReader.java:241)
    at com.mage.edireader.EDIFileParser.main(EDIFileParser.java:79)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
    at io.xlate.edi.internal.stream.tokenization.Lexer.readCharacter(Lexer.java:339)
    at io.xlate.edi.internal.stream.tokenization.Lexer.readCharacterUnchecked(Lexer.java:313)
    at io.xlate.edi.internal.stream.tokenization.Lexer.parse(Lexer.java:192)
    at io.xlate.edi.internal.stream.tokenization.Lexer.parse(Lexer.java:174)
    at io.xlate.edi.internal.ThrowingRunnable.run(ThrowingRunnable.java:17)
    ... 4 more
Exception in thread "main" io.xlate.edi.stream.EDIStreamException: Exception flushing output stream in segment NM1 at position 767, element 2
    at io.xlate.edi.internal.stream.StaEDIStreamWriter.flush(StaEDIStreamWriter.java:240)
    at io.xlate.edi.internal.stream.StaEDIStreamWriter.close(StaEDIStreamWriter.java:230)
    at com.mage.edireader.EDIFileParser.main(EDIFileParser.java:187)
Caused by: java.io.IOException: Stream Closed
    at java.base/java.io.FileOutputStream.writeBytes(Native Method)
    at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
    at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
    at java.base/sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:312)
    at java.base/sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:316)
    at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:153)
    at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:251)
    at io.xlate.edi.internal.stream.StaEDIStreamWriter.flush(StaEDIStreamWriter.java:237)
    ... 2 more
Did you also try with ISO-8859-1? As far as I can tell it does include the í and é characters.
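For anyone hitting the same MalformedInputException: Input length = 1, here is a minimal standalone sketch of the likely cause, assuming the file was saved as ISO-8859-1 (the thread does not confirm the actual encoding at this point). It uses only the JDK's java.nio.charset API, not StAEDI, and the class and method names are made up for illustration. In ISO-8859-1, í is the single byte 0xED, which a strict UTF-8 decoder treats as the start of a multi-byte sequence and rejects when the next byte is a plain letter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of StAEDI.
final class StrictDecodeDemo {

    // Returns true if the bytes decode in the given charset with no
    // malformed or unmappable input; errors are reported, not replaced.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "VíAK SéVAG" from the sample segment, written with Unicode escapes
        // so the example does not depend on the source file's own encoding.
        String name = "V\u00edAK S\u00e9VAG";

        // Assumption: the EDI file on disk was written as ISO-8859-1.
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("ISO-8859-1 ok: " + decodesCleanly(latin1, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 ok:      " + decodesCleanly(latin1, StandardCharsets.UTF_8));
    }
}
```

If the second check fails for your data, reading the file as UTF-8 cannot work regardless of the operating system, and the encoding has to be named explicitly when the reader is created.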
@sharukhshaik126 can you possibly provide a test file without sensitive data that I can use to reproduce the issue? Using the sample text you gave originally I haven't been able to trigger any errors.
@MikeEdgar After using the character encoding ISO-8859-1, the EDI file parsed successfully.
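If the encoding of incoming files is not known in advance, one option is to check the bytes before creating the reader. The sketch below is only an illustration under that assumption: the class name EncodingGuess and the method guessEncoding are hypothetical, it reads the whole file into memory, and it only distinguishes UTF-8 from ISO-8859-1. Because ISO-8859-1 assigns a character to every byte value, the fallback always succeeds, so a file in some third encoding would be reported as ISO-8859-1 as well.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; not part of StAEDI.
final class EncodingGuess {

    // Returns "UTF-8" if the bytes are valid UTF-8, otherwise "ISO-8859-1".
    static String guessEncoding(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8.name();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8; fall back to the single-byte encoding.
            return StandardCharsets.ISO_8859_1.name();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: EncodingGuess <edi-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        String encoding = guessEncoding(data);
        System.out.println(encoding);
        // The name can then be passed to the two-argument overload used
        // earlier in this thread:
        // inputFactory.createEDIStreamReader(inputStream, encoding);
    }
}
```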
Great news! Thanks for the update @sharukhshaik126 . I'll go ahead and close the issue, but please re-open if this still isn't resolved in your opinion and we'll discuss further.
Describe the bug
In my X12 EDI file I have an NM1*IL segment that contains letters with accent marks, and the EDIStreamReader class fails to parse the element.
To Reproduce
Parse any X12 EDI file with accent marks, e.g.:
NM1*IL*1*VíAK SéVAG*KIAZDEN****34*673459754~
Expected behavior
EDIStreamReader should parse elements that contain accent marks in both Linux and Windows environments.