Closed Magomogo closed 3 years ago
I'm aware of this limitation. However, I'm currently doing a large refactor of the code base, which includes support for different character sets (UNOA and UNOB, among others). You can find this work on the syntax-support branch right now.
What character sets are you using?
I've tried both iso8859 and UTF.
I merged basic support for this on the development branch. The parser now exposes an encoding(string) method which accepts a UN/EDIFACT encoding string. Right now this can be any of UNOA, UNOB, UNOC and UNOY.
For latin script (ISO/IEC 8859-1), you'd need to do:
parser.encoding('UNOC');
To use Unicode you need UNOY. The other ISO 8859 character sets also require an encoding translation, because unlike ISO/IEC 8859-1 they aren't a subset of Unicode, which is what Node uses internally. That's why I haven't added support for those yet.
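To make the relation between the syntax identifiers and Node's own encodings concrete, here is a minimal sketch. The lookup table and the decodeInterchange helper are hypothetical and not part of the edifact package. Treating UNOY as 'utf8' is an assumption about how such interchanges are encoded in practice.

```javascript
// Assumption: illustrative lookup table, not taken from the edifact package.
const NODE_ENCODINGS = {
  UNOA: 'ascii',  // ISO 646 subset: upper case letters, digits, some punctuation
  UNOB: 'ascii',  // ISO 646, adds lower case letters
  UNOC: 'latin1', // ISO/IEC 8859-1
  UNOY: 'utf8'    // ISO/IEC 10646-1, assumed here to arrive as UTF-8
};

// Hypothetical helper: decode raw bytes according to a syntax identifier.
function decodeInterchange(bytes, syntaxIdentifier) {
  const encoding = NODE_ENCODINGS[syntaxIdentifier];
  if (encoding === undefined) {
    // The remaining ISO 8859 parts would need a real translation table.
    throw new Error('Unsupported syntax identifier: ' + syntaxIdentifier);
  }
  return Buffer.from(bytes).toString(encoding);
}

// 0xFC is "ü" in ISO/IEC 8859-1.
const text = decodeInterchange([0x4d, 0xfc, 0x6c, 0x6c, 0x65, 0x72], 'UNOC');
console.log(text); // Müller
```

Identifiers missing from the table throw instead of silently falling back, which mirrors the point that those character sets are not supported yet.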
All this is included in the latest npm package. Feel free to test!
Great work, thanks!
Shouldn't this be automatically inferred from the UNB segment? (genuine question, I'm new to EDIFACT).
Yes it should. I was working on this, that's why I didn't close the issue yet. The parser starts in a special 'start of message' state which extracts the separators from the UNA segment if there is one. This is where this functionality should be added.
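For readers new to EDIFACT, the UNA handling mentioned above can be sketched roughly like this. It is not the parser's actual state machine. readServiceCharacters is a hypothetical helper that only relies on the fixed layout of the UNA segment: the three letters UNA followed by six service characters.

```javascript
// Default service characters, used when the interchange has no UNA segment.
const DEFAULTS = {
  component: ':',
  element: '+',
  decimal: '.',
  release: '?',
  terminator: "'"
};

// Hypothetical helper, not the parser's real implementation.
function readServiceCharacters(text) {
  if (!text.startsWith('UNA') || text.length < 9) {
    return { ...DEFAULTS };
  }
  return {
    component: text[3],  // component data element separator
    element: text[4],    // data element separator
    decimal: text[5],    // decimal mark
    release: text[6],    // release (escape) character
    // text[7] is reserved and normally a space
    terminator: text[8]  // segment terminator
  };
}

console.log(readServiceCharacters("UNA:+.? 'UNB+UNOC:3+SENDER+RECEIVER'"));
```

Reading the syntax identifier from the UNB segment would naturally follow this step, since the separators are needed to split that segment.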
What's the status of this issue? Currently, version 1.2.8, fetched via npm install edifact, does not respect the UNOC defined in the UNB segment, whereas declaring parser.encoding("UNOC"); before parsing the document works. Unfortunately, this requires knowing upfront which encoding the document declares. That could be looked up with a regular expression or by other means, but it might slow down the whole processing a bit.
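As a stopgap, the lookup does not have to scan the whole document, because the syntax identifier sits right at the start of the UNB segment. A minimal sketch, assuming the interchange starts with an optional UNA segment followed directly by UNB. detectSyntaxIdentifier is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: read the syntax identifier (e.g. "UNOC") from the
// start of an interchange without scanning the rest of the document.
function detectSyntaxIdentifier(text) {
  let component = ':';
  let element = '+';
  let offset = 0;
  if (text.startsWith('UNA') && text.length >= 9) {
    // UNA overrides the default separators.
    component = text[3];
    element = text[4];
    offset = 9;
  }
  // Tolerate line breaks between the UNA and UNB segments.
  const rest = text.slice(offset).trimStart();
  const prefix = 'UNB' + element;
  if (!rest.startsWith(prefix)) {
    return null;
  }
  const identifier = rest.slice(prefix.length, prefix.length + 4);
  const next = rest[prefix.length + 4];
  // The identifier is followed by the component separator and a version number.
  return /^UNO[A-Z]$/.test(identifier) && next === component ? identifier : null;
}

const sample = "UNB+UNOC:3+SENDER+RECEIVER+200101:1200+1'";
console.log(detectSyntaxIdentifier(sample)); // UNOC
```

The returned value could then be handed to parser.encoding(...) before parsing. Only the first few characters are inspected, so the cost is small and independent of the document size.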
@RovoMe I'm going to close this issue as of 356b69533d13355b6fbf2a3ebbeb07f8d8bf837a. Automatic detection of the encoding from the UNB segment is now supported through the Reader class:
let reader = new Reader({ autoDetectEncoding: true });
let result = reader.parse(document);
The parse() method returns an array of segments. Each segment is an object containing a segment name and the list of elements as an array of arrays with the actual component data.
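To illustrate that shape, here is a hand-written sample rather than real Reader output. The property names name and elements follow the description above, and the values are made up for the example.

```javascript
// Hand-written sample of the described structure, not actual parser output.
const segments = [
  { name: 'UNB', elements: [['UNOC', '3'], ['SENDER'], ['RECEIVER']] },
  { name: 'UNH', elements: [['1'], ['ORDERS', 'D', '96A', 'UN']] }
];

// Each element is an array of components, so a segment can be rendered
// back into a readable form by joining components and then elements.
const rendered = segments.map(segment =>
  segment.name + '+' + segment.elements.map(components => components.join(':')).join('+')
);

console.log(rendered.join('\n'));
```

Here segments[1].elements[1][0] is the first component of the second element of the UNH segment, in this sample the string 'ORDERS'.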
This is because of regular expressions like /[A-Z0-9.,\-()/= ]*/g used in the Validator. Any advice on how to solve this?
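One possible direction, sketched under assumptions: pick the character class per syntax identifier instead of hard-coding the UNOA one. The patterns and the patternFor helper below are hypothetical and not taken from the Validator. The UNOB and UNOC classes are rough approximations of those repertoires, and the patterns are anchored so they must cover the whole value.

```javascript
// Assumption: illustrative patterns only, keyed by syntax identifier.
const PATTERNS = {
  // The class quoted above, anchored to the full value.
  UNOA: /^[A-Z0-9.,\-()/= ]*$/,
  // Adds lower case letters.
  UNOB: /^[A-Za-z0-9.,\-()/= ]*$/,
  // Also admits the printable ISO/IEC 8859-1 range U+00A0 to U+00FF.
  UNOC: /^[A-Za-z0-9.,\-()/= \u00a0-\u00ff]*$/
};

// Hypothetical helper: fall back to the strictest class when unknown.
function patternFor(syntaxIdentifier) {
  return PATTERNS[syntaxIdentifier] || PATTERNS.UNOA;
}

console.log(patternFor('UNOA').test('MUELLER'));      // true
console.log(patternFor('UNOA').test('M\u00dcLLER'));  // false
console.log(patternFor('UNOC').test('M\u00dcLLER'));  // true
```

The patterns deliberately omit the g flag, since test() on a global regular expression keeps state in lastIndex between calls.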