nerdocs / pydifact

A python EDIFACT library.
MIT License

Parser: enable configurable control characters and Parser reuse #58

Closed. mj0nez closed this issue 1 year ago

mj0nez commented 1 year ago

The German energy sector uses an EDIFACT subset for market communication (edi@energy), which defines a set of control characters that differs from the current defaults (see the decimal separator on p. 49 of Allgemeine Festlegungen zu den EDIFACT- und XML-Nachrichten).

Currently, a UNA segment is only mandatory if any control character differs from the defaults. As a result, an edi@energy interchange without a UNA segment can be passed to the parser, but it is then parsed with the wrong defaults and results in errors.

I suggest we allow the injection of a preconfigured Parser in the creation methods FileSourcableMixin.from_file and AbstractSegmentsContainer.from_str, and modify the Parser to resolve control characters in the following order of precedence (highest first):

  1. UNA-segment of interchange
  2. characters passed during call-time of parse
  3. preconfigured character set
  4. pydifact defaults
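
To make the proposed precedence concrete, here is a minimal sketch of the lookup. It is not pydifact's implementation: `ControlChars` and `resolve_control_chars` are hypothetical names used only for illustration, and the default values in the dataclass (including the comma as decimal mark) are assumptions of this sketch rather than a statement of what pydifact ships.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlChars:
    """Hypothetical stand-in for a control-character set (not pydifact's class)."""

    component_separator: str = ":"
    data_separator: str = "+"
    decimal_point: str = ","  # default assumed for this sketch
    escape_character: str = "?"
    reserved_character: str = " "
    segment_terminator: str = "'"

    @classmethod
    def from_una(cls, una: str) -> "ControlChars":
        # A UNA segment is the tag "UNA" followed by exactly six characters,
        # in the same order as the fields declared above.
        if len(una) != 9 or not una.startswith("UNA"):
            raise ValueError(f"not a UNA segment: {una!r}")
        return cls(*una[3:9])


def resolve_control_chars(
    interchange: str,
    call_time: Optional[ControlChars] = None,
    preconfigured: Optional[ControlChars] = None,
) -> ControlChars:
    """Pick the control characters using the precedence listed above."""
    # 1. A UNA segment in the interchange always wins.
    if interchange.startswith("UNA"):
        return ControlChars.from_una(interchange[:9])
    # 2. Characters passed when parse is called.
    if call_time is not None:
        return call_time
    # 3. Characters the parser was preconfigured with.
    if preconfigured is not None:
        return preconfigured
    # 4. Library defaults.
    return ControlChars()
```

With this shape, a parser preconfigured for edi@energy (for example `ControlChars(decimal_point=".")`, an illustrative value) would still defer to a UNA segment whenever the interchange carries one.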

This would have different benefits:

I will try to draft a PR for the latter one as soon as possible, but as this would be a new feature, I think we should keep it separate for now. :)

nerdoc commented 1 year ago

Yes, it's always a shame that there are established standards, and companies just ignore them and cook their own soup. I don't have time to implement this, but it seems to be a requirement on your side, so if you want to implement this, it surely would be a helpful enhancement!

mj0nez commented 1 year ago

Done: https://github.com/nerdocs/pydifact/pull/59. Conflicts are resolved.

nerdoc commented 1 year ago

Seems that merging the github-actions branch introduced a new conflict...

mj0nez commented 1 year ago

Fixed :)

nerdoc commented 1 year ago

Thanks for your great ideas and additions, merged.