orhantoy / edifunct

Fun with EDIFACT :tada:
MIT License

Wrong UNA segment tokenization #4

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

When the UNA segment is present, the tokenizer doesn't parse its segment tag correctly. For example, the segment `UNA;-:! "` is tokenized as a segment with `UNA;` as its tag.
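To reproduce the symptom outside the library, here is a minimal Python sketch. `naive_tag` is a hypothetical helper, not edifunct's code: it takes everything before the first data element separator as the tag. The separator roles for the example UNA assume the standard order of the six service characters (component separator `;`, data element separator `-`, decimal mark `:`, release character `!`, reserved space, segment terminator `"`).

```python
def naive_tag(segment: str, data_element_separator: str) -> str:
    # Hypothetical naive extraction: the tag is whatever precedes the first
    # data element separator in the segment.
    return segment.split(data_element_separator, 1)[0]


una = 'UNA;-:! "'  # data element separator here is "-"
print(naive_tag(una, "-"))                           # UNA;
print(naive_tag('UNB-UNOA;1-SENDER-RECEIVER"', "-"))  # UNB
```

Ordinary segments come out right; only UNA gets the component separator glued onto its tag.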

This happens because the tokenizer extracts the tag from a segment by splitting on the `data_element_separator`.

That works for every segment except UNA, because the two are structured differently:

- Other segments: `TAG` + `data_element_separator` + ...
- UNA segment: `UNA` + `component_data_element_separator` + `data_element_separator` + ...
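One possible fix, sketched in Python under the assumption that segments have already been split on the segment terminator: special-case UNA, whose tag is followed directly by the component data element separator rather than the data element separator. `segment_tag` is a hypothetical name, not part of edifunct.

```python
def segment_tag(segment: str, data_element_separator: str) -> str:
    """Return the tag of a single segment (terminator already stripped)."""
    # UNA is immediately followed by its six service characters, the first of
    # which is the component data element separator, so splitting on the data
    # element separator would yield e.g. "UNA;" instead of "UNA".
    if segment.startswith("UNA"):
        return "UNA"
    # Every other segment is TAG + data element separator + ...
    return segment.split(data_element_separator, 1)[0]


print(segment_tag("UNA;-:! ", "-"))           # UNA
print(segment_tag("UNB-UNOA;1-SENDER", "-"))  # UNB
```

Segments without any data elements (e.g. a bare `UNZ`) still work, since `split` then returns the whole string.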

I'd like to fix this, but a few things need to be decided first. First, I think the UNA segment should simply be ignored, in the sense that it contributes nothing to the message content. It only serves the parsing process and should not be exposed to the end user.
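A minimal sketch of that idea in Python, assuming the UNA (if any) sits at the very start of the interchange: read its six service characters to configure the tokenizer, then hand back only the remaining text so UNA never appears among the segments exposed to the user. `Separators` and `read_una` are hypothetical names; the defaults are the ISO 9735 default service characters (`UNA:+.? '`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Separators:
    # ISO 9735 default service characters, used when no UNA is present.
    component_data_element: str = ":"
    data_element: str = "+"
    decimal_mark: str = "."
    release: str = "?"
    segment_terminator: str = "'"


def read_una(interchange: str) -> tuple[Separators, str]:
    """Consume an optional UNA and return (separators, rest of interchange)."""
    if not interchange.startswith("UNA"):
        return Separators(), interchange
    advice = interchange[3:9]
    if len(advice) < 6:
        raise ValueError("truncated UNA segment")
    component, data, decimal, release, _reserved, terminator = advice
    separators = Separators(component, data, decimal, release, terminator)
    # The UNA itself is dropped here; only its separators are kept, so it
    # never reaches the segment list exposed to the end user.
    return separators, interchange[9:].lstrip("\r\n")


separators, rest = read_una('UNA;-:! "UNB-UNOA;1-SENDER-RECEIVER"')
print(separators.data_element)  # -
print(rest)                     # UNB-UNOA;1-SENDER-RECEIVER"
```

Keeping the separators in a separate value rather than as a parsed segment is one way to make "UNA only serves the parsing process" explicit in the API.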

Second, when the parser reaches a segment, it goes through every possible tag in the schema until it finds a match. I think it would make sense for the schema not to be required to declare a UNA segment (but if it does, that should still work).
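To illustrate the "UNA optional in the schema" rule, here is a Python sketch that assumes a hypothetical schema representation: a flat, ordered list of tags, where each schema entry may match one or more consecutive segments. `match_schema` is a hypothetical helper, not edifunct's parser; it only shows how a leading UNA can be skipped unless the schema declares it.

```python
def match_schema(segment_tags: list[str], schema: list[str]) -> list[str]:
    """Match tags against a flat, ordered schema; UNA need not be declared."""
    # Skip a leading UNA unless the schema explicitly declares one.
    if segment_tags[:1] == ["UNA"] and "UNA" not in schema:
        segment_tags = segment_tags[1:]
    position = 0
    for tag in segment_tags:
        # Walk forward through the schema until this tag is found. The
        # position is not advanced after a match, so a tag may repeat.
        while position < len(schema) and schema[position] != tag:
            position += 1
        if position == len(schema):
            raise ValueError(f"segment {tag!r} not found in schema")
    return segment_tags


schema = ["UNB", "UNH", "BGM", "UNT", "UNZ"]
print(match_schema(["UNA", "UNB", "UNH", "BGM", "UNT", "UNZ"], schema))
```

With a mis-tokenized tag such as `UNA;`, the walk runs off the end of the schema and raises, which is the kind of failure where the parser cannot find the segment in the schema.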

Finally, continuing with PR #3 requires answering these questions 😬 I saw that the high-level tests were failing because of the custom UNA. Why? Because the parser was stuck trying to find a `UNA;` segment in the schema.