tdecaluwe / node-edifact

Javascript stream parser for UN/EDIFACT documents.
https://www.npmjs.com/package/edifact
Apache License 2.0

Error: Invalid character $ at position XXX #33

Closed by stefanpartheym 4 years ago

stefanpartheym commented 4 years ago

Hey man,

thanks for your great npm module!

I currently use version 1.2.8 (the latest, I guess) and get the error "Invalid character $ at position XXX" when parsing the following EDIFACT message (which contains a dollar sign $) a second time:

UNA:+.? '
UNB+UNOC:4+SENDER-0123:14+RECIPIENT-0123:14+20200121:0825+16400'
UNH+1+ORDERS:D:96A:UN'
BGM++123456'
DTM+012:20200118'
NAD+BP+BUYER-ID'
NAD+SU+SUPPLIER-ID'
CUX+3:USD'
LIN+01++ITEM-0001$:SA'
QTY+21:1000:PCE'
UNS+S'
UNT+10+1'
UNZ+1+16400'

The first parsing process works fine, but the second one fails.

You can reproduce the error using the following code:

const { Parser } = require('edifact');
const edifactString =
`UNA:+.? '
UNB+UNOC:4+SENDER-0123:14+RECIPIENT-0123:14+20200121:0825+16400'
UNH+1+ORDERS:D:96A:UN'
BGM++123456'
DTM+012:20200118'
NAD+BP+BUYER-ID'
NAD+SU+SUPPLIER-ID'
CUX+3:USD'
LIN+01++ITEM-0001$:SA'
QTY+21:1000:PCE'
UNS+S'
UNT+10+1'
UNZ+1+16400'`;

// First parsing process works fine.
let parser = new Parser();
parser.encoding('UNOC');
parser.write(edifactString);
console.log('FIRST: SUCCESS');
// Second one fails due to "Invalid character $ at position 194"
parser = new Parser();
parser.encoding('UNOC');
parser.write(edifactString);
console.log('SECOND: SUCCESS');

This seems to be caused by the cache that is statically assigned to the Tokenizer class in line 146 of tokenizer.js. To work around the issue, I currently assign a new Cache instance before each parsing run. Escaping the dollar sign with a ? also works, but I thought only characters like :, +, ? and ' needed to be escaped.
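The exact mechanism inside the library is not confirmed here, so the following is only a generic sketch of how a statically assigned cache could carry state from one instance to the next. SketchTokenizer, its methods and the cached Set are all made up for illustration and are not taken from node-edifact.

```javascript
// Hypothetical illustration only: none of these names come from node-edifact.
class SketchTokenizer {
  constructor() {
    // Every instance picks up the same statically cached object.
    this.allowed = SketchTokenizer.cache.get('UNOC');
  }

  // Simulates a parsing step that narrows the allowed characters in place.
  restrict(char) {
    this.allowed.delete(char);
  }

  accepts(char) {
    return this.allowed.has(char);
  }
}

// Static cache shared by all instances, assigned once at load time.
SketchTokenizer.cache = new Map([['UNOC', new Set(['A', '$'])]]);

const first = new SketchTokenizer();
const firstAcceptsDollar = first.accepts('$');
first.restrict('$'); // mutates the shared cached Set, not a private copy

const second = new SketchTokenizer();
const secondAcceptsDollar = second.accepts('$');

// Replacing the static cache gives later instances a clean state again.
SketchTokenizer.cache = new Map([['UNOC', new Set(['A', '$'])]]);
const third = new SketchTokenizer();
const thirdAcceptsDollar = third.accepts('$');

console.log(firstAcceptsDollar, secondAcceptsDollar, thirdAcceptsDollar);
```

If something similar happens in the real tokenizer, it would explain why a fresh Parser alone does not help while replacing the static cache does.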

See the example below containing the workaround:

const { Parser } = require('edifact');
const Tokenizer = require('edifact/tokenizer');
const Cache = require('edifact/cache');
const edifactString =
`UNA:+.? '
UNB+UNOC:4+SENDER-0123:14+RECIPIENT-0123:14+20200121:0825+16400'
UNH+1+ORDERS:D:96A:UN'
BGM++123456'
DTM+012:20200118'
NAD+BP+BUYER-ID'
NAD+SU+SUPPLIER-ID'
CUX+3:USD'
LIN+01++ITEM-0001$:SA'
QTY+21:1000:PCE'
UNS+S'
UNT+10+1'
UNZ+1+16400'`;

// First parsing process works fine.
let parser = new Parser();
parser.encoding('UNOC');
parser.write(edifactString);
console.log('FIRST: SUCCESS');
// Reassign static cache
Tokenizer.cache = new Cache(40);
// Second parsing process succeeds
parser = new Parser();
parser.encoding('UNOC');
parser.write(edifactString);
console.log('SECOND: SUCCESS');
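Regarding the escaping remark above: a minimal sketch of what I would expect escaping to cover, assuming only the four service characters from the UNA line in the example (:, +, ? and ') need the release character. escapeEdifactValue is a hypothetical helper and not part of the edifact package.

```javascript
// Hypothetical helper, not part of the edifact package.
// Assumption: only the component separator, data separator, release
// character and segment terminator need to be escaped.
function escapeEdifactValue(
  value,
  serviceChars = { component: ':', data: '+', release: '?', segment: "'" }
) {
  const special = new Set(Object.values(serviceChars));
  let out = '';
  for (const ch of value) {
    // Prefix each service character with the release character.
    out += special.has(ch) ? serviceChars.release + ch : ch;
  }
  return out;
}

// A dollar sign is not a service character, so it is left untouched.
const escapedPlain = escapeEdifactValue('ITEM-0001$');
// Service characters inside a value get the release character in front.
const escapedSpecial = escapeEdifactValue("O'NEIL+CO?");

console.log(escapedPlain);
console.log(escapedSpecial);
```

Under that assumption, ITEM-0001$ would pass through unchanged, which is why the need to write ?$ surprised me.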

Best regards,
Stefan