sabas closed this issue 7 years ago
Hello, I just tried to send a UNOC message with accented characters ('éà') and the Parser deletes those characters. Am I doing something wrong, or do I need to improve your Parser?
You need to override the stripping regex like this.
$p = new Parser();
$p->setStripRegex("/[\x01-\x1F\x80-\xFF]/");
$p->loadString(SOMEEDIFACTSTRING);
(This regex removes all chars in the ranges given by these hex codes. If you find a regex that suits UNOC, please share :-) )
Thanks for your answer. I discovered that I have to use setStripRegex(). I was wondering whether it would be a good idea to read the UNB segment, get the UNOx information, and adapt the regexp directly, keeping the current default for the charsets we don't know.
Do you mind if I modify your Parser this way?
Please go ahead! By default it should go with UNOA; if the message uses something different, it should override the default by calling setStripRegex.
Thanks. I'm just concerned that the Parser will have to read the content, and we have a Reader made for that. I've seen that your Parser already adapts to the UNA segment. I will use a similar method for UNB.
About this UNA segment: you've chosen to delete it from $parsedfile. So if you parse a file and re-encode it with the Encoder, the consequence is that you have modified the original EDI file. That was one of my first tests on your solution, and it fails.
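To make the idea above concrete, here is a minimal sketch of picking the strip regex from the UNB segment. Both function names are hypothetical and not part of the library. The sketch assumes the default '+' and ':' separators, and the UNOC regex (strip only the control characters) is my own guess rather than a verified rule.

```php
<?php
// Hypothetical helper: extract the syntax identifier (UNOA, UNOB, UNOC, ...)
// from the UNB segment of a raw message.
// Assumes the default separators, e.g. "UNB+UNOC:3+...".
function detectSyntaxId(string $edifact): ?string
{
    if (preg_match('/UNB\+(UNO[A-Z])[:+]/', $edifact, $m) === 1) {
        return $m[1];
    }
    return null; // no UNB segment found, or custom separators in use
}

// Hypothetical helper: choose a strip regex for a syntax identifier.
function stripRegexFor(?string $syntaxId): string
{
    switch ($syntaxId) {
        case 'UNOC':
            // Assumption: for ISO-8859-1 content only control characters
            // are stripped, so accented bytes such as 0xE9 are kept.
            return '/[\x01-\x1F]/';
        default:
            // The default discussed in this thread, kept for every
            // identifier we do not know how to handle.
            return '/[\x01-\x1F\x80-\xFF]/';
    }
}
```

The result could then be passed to setStripRegex() before calling loadString(), for example $p->setStripRegex(stripRegexFor(detectSyntaxId($raw))).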
The UNA part was added by @Azzurvif
Perhaps we could save the UNA array (if it exists) in a private variable and add a public get method, so it can be reused when encoding back?
For UNA, that could be a solution, but in my project the Parser and Encoder are far away from each other.
Maybe it's the job of the Encoder to set its default splitting characters, and so to add UNA according to its setup. That way it mirrors the Parser. For backward compatibility, the Encoder might not add UNA by default.
According to this spec https://sandroaspbiztalkblog.wordpress.com/2009/08/15/edifact-encoding-edi-character-set-support/ UNOA encoding is more restrictive than what you set by default; your regexp \x01-\x1F\x80-\xFF appears to be UNOB, not UNOA.
I will set UNOB as the default mode for backward compatibility. But for those who are receiving a UNOA message with lowercase chars, switching would be a deprecation, since those lowercase chars are not UNOA compliant. What do you want to do? Do you want to stay on the standard or preserve the current behaviour for users?
One more piece of info about UNOA: http://myedinotes.blogspot.fr/2012/05/unoa-character-set.html
Currently the Encoder simply encodes with the standard delimiters (they are hardcoded). It would need a refactoring like the Parser, so that if one wants to encode with non-standard chars the code would be:
$c = new Encoder();
$c->setUNA(XXXXX);
$c->encode($array, $wrap);
I agree to standardize on UNOB as the default, so as not to break compatibility.
I'll let you check, but I think I have implemented everything we talked about. Have a nice week.
https://www.stylusstudio.com/edifact/40003/0001.htm
For EDIFACT documents of syntax version UNOA, characters A-Z, 0-9, blank and . , ( ) / - = are allowed. For syntax version UNOB, characters a-z, A-Z, 0-9, blank and . , ( ) / - = : + ` ? are allowed. All other EDIFACT syntaxes are linked to the standard ISO character sets. [https://msdn.microsoft.com/en-us/library/aa559562(v=bts.20).aspx]
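As a rough illustration of the difference between the two sets, the lists quoted above can be turned into character classes. This is only a sketch: the constant and function names are hypothetical, and the classes contain only the characters named in the quote, so they may need extending against the actual standard.

```php
<?php
// Character classes built only from the lists quoted above.
// \z anchors at the very end of the string (no trailing newline allowed).
const UNOA_ALLOWED = '/^[A-Z0-9 .,()\/\-=]*\z/';
const UNOB_ALLOWED = '/^[a-zA-Z0-9 .,()\/\-=:+`?]*\z/';

// Hypothetical helper: true when every character of $text is in the class.
function conformsTo(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}
```

With these classes, a lowercase value such as 'order' passes the UNOB check but fails the UNOA one, which is the compatibility concern raised earlier in the thread.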
UNOB can use these separators. The Information Separator control characters are used as follows:
IS 4, hex value '1C': segment terminator
IS 3, hex value '1D': data element separator
IS 1, hex value '1F': component data element separator
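For illustration, a message that uses these three control characters could be split like this. The function name is hypothetical and the sketch ignores release characters. Note that a strip regex covering \x01-\x1F would remove these separators, so it would have to be adjusted for such messages.

```php
<?php
// Hypothetical helper: split a message that uses the IS control characters
// into segments, data elements and component data elements.
function splitWithInformationSeparators(string $message): array
{
    $segments = [];
    foreach (explode("\x1C", $message) as $rawSegment) { // IS 4: segment terminator
        if ($rawSegment === '') {
            continue; // skip the empty piece after the last terminator
        }
        $elements = [];
        foreach (explode("\x1D", $rawSegment) as $rawElement) { // IS 3: data element separator
            $elements[] = explode("\x1F", $rawElement); // IS 1: component separator
        }
        $segments[] = $elements;
    }
    return $segments;
}
```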
UNOC to UNOK use ISO-8859-*
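One consequence is that the payload can be converted to UTF-8 once the syntax identifier is known. A minimal sketch, assuming UNOC corresponds to ISO-8859-1; the function name and the map are hypothetical, and the remaining identifiers are left out on purpose and should be filled in from the spec.

```php
<?php
// Hypothetical map from syntax identifier to iconv charset name.
// Only UNOC is filled in here; add the others from the spec.
const SYNTAX_CHARSETS = [
    'UNOC' => 'ISO-8859-1',
];

// Hypothetical helper: convert a payload to UTF-8 when the charset is known,
// otherwise return it unchanged.
function toUtf8(string $payload, string $syntaxId): string
{
    if (!isset(SYNTAX_CHARSETS[$syntaxId])) {
        return $payload;
    }
    $converted = iconv(SYNTAX_CHARSETS[$syntaxId], 'UTF-8', $payload);
    return $converted === false ? $payload : $converted;
}
```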