Parser: enable configurable control characters and Parser reuse

mj0nez commented 1 year ago

The German energy sector uses an EDIFACT subset for market communication (edi@energy), which defines a set of control characters that differs from pydifact's current defaults (see the decimal separator on p. 49 of Allgemeine Festlegungen zu den EDIFACT- und XML-Nachrichten).

Currently, a UNA segment is only mandatory if a control character differs from the defaults. Therefore, an edi@energy interchange can arrive without a UNA segment, and parsing it with pydifact's defaults results in errors.
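To make the failure mode concrete: a UNA segment is just "UNA" followed by exactly six characters (component separator, data element separator, decimal mark, release/escape character, reserved character, segment terminator), and without it a parser has nothing to go on but its defaults, "UNA:+.? '". The sketch below is standalone and does not use pydifact; the field names and the `read_una` helper are hypothetical, and the comma decimal mark in the usage is only an illustrative value.

```python
from typing import Dict, Optional

# Order in which the six characters follow "UNA" (hypothetical field names).
UNA_FIELDS = (
    "component_separator",
    "data_separator",
    "decimal_point",
    "escape_character",
    "reserved_character",
    "segment_terminator",
)

# Characters that apply when an interchange carries no UNA segment.
DEFAULT_CHARACTERS = dict(zip(UNA_FIELDS, ":+.? '"))


def read_una(interchange: str) -> Optional[Dict[str, str]]:
    """Return the control characters declared by a leading UNA segment, or None."""
    if not interchange.startswith("UNA"):
        return None
    declared = interchange[3:9]
    if len(declared) != 6:
        raise ValueError("a UNA segment must declare exactly six characters")
    return dict(zip(UNA_FIELDS, declared))
```

`read_una("UNA:+,? 'UNB+UNOC:3'")` reports a comma as decimal mark, whereas `read_una("UNB+UNOC:3'")` returns `None`, so a parser silently falls back to `DEFAULT_CHARACTERS`. That fallback is exactly where a UNA-less interchange with different control characters goes wrong.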

I suggest we allow injecting a preconfigured Parser into the creation methods FileSourcableMixin.from_file and AbstractSegmentsContainer.from_str, and modify the Parser to apply control characters in the following order of precedence:

1. UNA segment of the interchange
2. characters passed at call time to parse
3. preconfigured character set
4. pydifact defaults
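The proposed precedence, combined with injecting a reusable parser into `from_str`, could look roughly like the sketch below. It is a standalone illustration under stated assumptions, not pydifact's actual Parser or container API: `ControlCharacters`, `SketchParser`, `resolve_characters` and `SketchInterchange` are hypothetical names, and the merged PR may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlCharacters:
    """Hypothetical control character set; defaults follow "UNA:+.? '"."""
    component_separator: str = ":"
    data_separator: str = "+"
    decimal_point: str = "."
    escape_character: str = "?"
    reserved_character: str = " "
    segment_terminator: str = "'"


class SketchParser:
    """Hypothetical reusable parser, not pydifact's actual Parser class."""

    def __init__(self, characters: Optional[ControlCharacters] = None) -> None:
        # Preconfigured character set, kept for every later call (precedence 3).
        self.characters = characters

    def resolve_characters(
        self, message: str, characters: Optional[ControlCharacters] = None
    ) -> ControlCharacters:
        # 1. A UNA segment in the interchange always wins.
        if message.startswith("UNA"):
            declared = message[3:9]
            if len(declared) != 6:
                raise ValueError("a UNA segment must declare exactly six characters")
            return ControlCharacters(*declared)
        # 2. Characters passed at call time.
        if characters is not None:
            return characters
        # 3. The set this parser was configured with.
        if self.characters is not None:
            return self.characters
        # 4. Library defaults.
        return ControlCharacters()


class SketchInterchange:
    """Hypothetical container showing parser injection into from_str()."""

    def __init__(self, characters: ControlCharacters) -> None:
        self.characters = characters

    @classmethod
    def from_str(
        cls, text: str, parser: Optional[SketchParser] = None
    ) -> "SketchInterchange":
        # Create a default parser only when none is injected.
        parser = parser if parser is not None else SketchParser()
        return cls(parser.resolve_characters(text))
```

A single `SketchParser(ControlCharacters(decimal_point=","))` (comma chosen only for illustration) can then be passed to `from_str()` for every incoming interchange: interchanges without a UNA segment pick up the preconfigured set, while interchanges that carry one still use what they declare.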

This would have several benefits:

- the above-mentioned configurable control characters
- reuse of Parser objects
- enabling new parsers, e.g.:
  - When dealing with large interchanges such as load profiles, a Parser that only parses a subset of segments would be especially useful: converting all segments can take some time, even though only one specific segment per message is actually required at that processing step.
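The issue does not say how such a subset parser would be built, so the following is only a standalone sketch of the idea, not pydifact code: every segment is still located (the terminator has to be found anyway), but segments whose tag is not requested are skipped after a cheap tag check instead of being split into elements and components. The names `split_unescaped`, `unescape` and `parse_subset` are hypothetical, the default characters are assumed to be "UNA:+.? '", and UNA handling is deliberately left out.

```python
from typing import Iterable, Iterator, List, Tuple, Union

Element = Union[str, List[str]]


def split_unescaped(text: str, sep: str, escape: str = "?") -> List[str]:
    """Split on sep, ignoring separators preceded by the escape character.

    Escape sequences are kept verbatim so that later, nested splits still see them.
    """
    parts: List[str] = []
    buf: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == escape and i + 1 < len(text):
            buf.append(text[i : i + 2])
            i += 2
            continue
        if text[i] == sep:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(text[i])
        i += 1
    parts.append("".join(buf))
    return parts


def unescape(text: str, escape: str = "?") -> str:
    """Drop each escape character and keep the character it protects."""
    out: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == escape and i + 1 < len(text):
            i += 1
        out.append(text[i])
        i += 1
    return "".join(out)


def parse_subset(
    text: str,
    wanted_tags: Iterable[str],
    terminator: str = "'",
    data_sep: str = "+",
    component_sep: str = ":",
    escape: str = "?",
) -> Iterator[Tuple[str, List[Element]]]:
    """Yield (tag, elements) only for segments whose tag is in wanted_tags."""
    wanted = set(wanted_tags)
    for raw in split_unescaped(text, terminator, escape):
        raw = raw.strip("\r\n")  # tolerate line breaks between segments
        tag = raw.partition(data_sep)[0]  # tags never contain escaped characters
        if not raw or tag not in wanted:
            continue  # unwanted segment: never split into elements
        elements: List[Element] = []
        for element in split_unescaped(raw, data_sep, escape)[1:]:
            components = [
                unescape(c, escape)
                for c in split_unescaped(element, component_sep, escape)
            ]
            # Single-component elements become plain strings.
            elements.append(components[0] if len(components) == 1 else components)
        yield tag, elements
```

For example, `list(parse_subset("UNH+1+MSCONS:D:04B:UN:2.4c'QTY+220:4.5'UNT+3+1'", {"QTY"}))` yields only the QTY segment as `("QTY", [["220", "4.5"]])`; the UNH and UNT segments are skipped without element splitting.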

I'll try to draft a PR for the latter as soon as possible, but since this would be a new feature, I think we should keep it separate for now. :)

nerdoc commented 1 year ago

Yes, it's always a shame that there are established standards and companies just ignore them and cook their own soup. I don't have time to implement this myself, but since it seems to be a requirement on your side, feel free to implement it; it would surely be a helpful enhancement!

mj0nez commented 1 year ago

Done: https://github.com/nerdocs/pydifact/pull/59. Conflicts are resolved.

nerdoc commented 1 year ago

Seems that merging the github-actions branch introduced a new conflict...

mj0nez commented 1 year ago

Fixed :)

nerdoc commented 1 year ago

Thanks for your great ideas and additions, merged.


Parser: enable configurable control characters and Parser reuse #58