smooks / smooks-edi-cartridge

Smooks EDI & EDIFACT cartridges for reading as well as writing EDI
https://www.smooks.org
Other

UnEdifactSpecificationReader fails to read C770 from D96A #90

Closed: phax closed this issue 3 years ago

phax commented 3 years ago

Hi Claude, there seems to be a small issue with the UnEdifactSpecificationReader implementation (checked with 2.0.0-M2).

The C770 definition from D96A looks like this:

----------------------------------------------------------------------

      C770  ARRAY CELL DETAILS

      Desc: To contain the data for a contiguous set of cells in an
            array.

      Note: The component 9424 - array cell information - occurs 100
            times in the composite.

010   9424   Array cell information                        C  an..35

----------------------------------------------------------------------

-> The information about component element 9424 is missing from the parsed definition.

Compare this with the D01B version, for which the outcome is as expected:

----------------------------------------------------------------------

       C770 ARRAY CELL DETAILS

       Desc: To contain the data for a contiguous set of cells in
             an array.

010    9424  Array cell data description               C      an..512

       Note: 
             The composite C770 - array cell details - occurs
             9999 times in the segment. The use of the ARR
             segment is restricted to be used only with Version 3
             of ISO-9735.
             The component 9424 - array cell information - occurs
             100 times in the composite C770. The use of C770 is
             restricted to be used only with the ARR segment
             within Version 3 of ISO-9735.

----------------------------------------------------------------------

Would it be easy for you to adapt the specification reader accordingly?

Thanks, Philip

phax commented 3 years ago

To be more precise: the error seems to occur in all versions prior to D99B. The earliest version in which I found C770 was D95A.

cjmamo commented 3 years ago

Probably it's a bug in UnEdifactDefinitionReader's pattern matching. The EDIFACT directories are a pain to parse because they don't maintain a consistent structure across versions.

RovoMe commented 3 years ago

The issue with the current state of the definition reader is that, after reading the ID and description of the (complex) element, it stops at the next blank line. In the posted examples, the D01B directory places the note below the actual component definitions, whereas in D96A the note comes above them and is followed by a blank line, which ends the composite prematurely.
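A minimal sketch of a more tolerant approach, assuming the component line layout shown in the two excerpts above: scan every line of the composite entry and pick out component definitions by pattern, instead of stopping at the first blank line. `CompositeBlockSketch`, `Component` and `parseComponents` are hypothetical names for illustration, not part of the Smooks API, and the regular expression only covers the two layouts quoted in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompositeBlockSketch {

    // Hypothetical pattern for a component line such as
    // "010   9424   Array cell information      C  an..35".
    // Groups: position, element id, description, status (C/M), representation.
    private static final Pattern COMPONENT_LINE =
            Pattern.compile("^\\s*(\\d{3})\\s+(\\d{4})\\s+(.+?)\\s+([CM])\\s+(\\S+)\\s*$");

    /** Hypothetical value type for one component of a composite. */
    public record Component(String position, String id, String description,
                            String status, String representation) { }

    /**
     * Scans every line of one composite entry (the lines between two
     * separator lines) and collects all component definitions. Blank lines
     * and Desc:/Note: paragraphs are skipped instead of ending the entry,
     * so it does not matter whether the note comes before or after the components.
     */
    public static List<Component> parseComponents(List<String> entryLines) {
        List<Component> components = new ArrayList<>();
        for (String line : entryLines) {
            Matcher m = COMPONENT_LINE.matcher(line);
            if (m.matches()) {
                components.add(new Component(m.group(1), m.group(2),
                        m.group(3).trim(), m.group(4), m.group(5)));
            }
        }
        return components;
    }

    public static void main(String[] args) {
        // The D96A entry from this issue: the note precedes the component line.
        List<String> d96a = List.of(
                "      C770  ARRAY CELL DETAILS",
                "",
                "      Desc: To contain the data for a contiguous set of cells in an",
                "            array.",
                "",
                "      Note: The component 9424 - array cell information - occurs 100",
                "            times in the composite.",
                "",
                "010   9424   Array cell information                        C  an..35");
        for (Component c : parseComponents(d96a)) {
            System.out.println(c.id() + " " + c.status() + " " + c.representation());
        }
    }
}
```

With the D96A excerpt as input this prints `9424 C an..35`; the D01B layout, with the note after the component line, yields the same single component.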

While debugging different directory definitions, I learned that moveToNextPart(reader) reads everything up to the next entry separator. This can cause certain elements/components to be missed entirely, because their lines have already been consumed by that method.
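One way to avoid that, sketched here under the assumption that every entry is delimited by a line consisting only of dashes (as in the excerpts above), is to split the directory into entries in a single pass before any entry is parsed. Each line then belongs to exactly one entry, so no "skip to the next separator" step can swallow lines of another entry. `EntrySplitterSketch` and `splitEntries` are hypothetical names and do not reproduce the actual `moveToNextPart` logic.

```java
import java.util.ArrayList;
import java.util.List;

public class EntrySplitterSketch {

    // Assumption: a separator is a line of at least 10 dashes, nothing else.
    private static boolean isSeparator(String line) {
        String t = line.trim();
        return t.length() >= 10 && t.chars().allMatch(ch -> ch == '-');
    }

    // Keeps an entry only if it has at least one non-blank line.
    private static void addIfNotBlank(List<List<String>> entries, List<String> entry) {
        if (entry.stream().anyMatch(l -> !l.isBlank())) {
            entries.add(entry);
        }
    }

    /**
     * Groups the lines of a whole directory text into entries. Every line
     * between two separators goes into the same entry, blank lines included,
     * so nothing is consumed before the entry itself is parsed.
     */
    public static List<List<String>> splitEntries(String directoryText) {
        List<List<String>> entries = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : directoryText.split("\\R")) {
            if (isSeparator(line)) {
                addIfNotBlank(entries, current);
                current = new ArrayList<>();
            } else {
                current.add(line);
            }
        }
        addIfNotBlank(entries, current);
        return entries;
    }

    public static void main(String[] args) {
        String directory = String.join("\n",
                "----------------------------------------------------------------------",
                "",
                "      C770  ARRAY CELL DETAILS",
                "",
                "010   9424   Array cell information                        C  an..35",
                "",
                "----------------------------------------------------------------------",
                "",
                "      C999  PLACEHOLDER ENTRY (hypothetical)",
                "",
                "----------------------------------------------------------------------");
        List<List<String>> entries = splitEntries(directory);
        System.out.println(entries.size() + " entries");
    }
}
```

Each resulting entry can then be parsed on its own, so a parsing bug in one entry cannot cause lines of the following entry to disappear.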