mhulden / foma

Automatically exported from code.google.com/p/foma

Several problems with `foma2js.perl` / `foma2js.py` #155

Open dhdaines opened 5 months ago

dhdaines commented 5 months ago

The JavaScript generated by these scripts, while it works, is not really correct (the same is true of 90% of all JavaScript code in the world, so don't feel bad). It seems that the intention here is to create separate Arrays for transitions, alphabet, and finals, or perhaps to put them all in the same Array?

var myNet = new Object;
myNet.t = Array;
myNet.f = Array;
myNet.s = Array;

Regardless, this doesn't do either of those things, because you didn't add the magical new operator. Each line just stores a reference to the global builtin Array constructor, so every later assignment to myNet.t, myNet.f, or myNet.s sets a property on Array itself. This is likely to cause random problems for any other JavaScript code that is loaded with the FST. It also makes it impossible to serialize myNet to JSON. You shouldn't be using an Array for these in the first place, because JSON.stringify ignores string (non-index) keys on an array, even if JavaScript, in its infinite wisdom, lets you set them. Instead, I suggest doing this (PR coming soon):

var myNet = new Object;
myNet.t = new Object;
myNet.f = new Object;
myNet.s = new Object;

(yes, you could make them all the same Object since the key names are unique, but I don't see a good reason to do this)
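A minimal sketch of the difference, runnable in Node or a browser console. The key `"0|a"` and the stored values are made up for illustration and are not the actual format emitted by `foma2js`:

```javascript
// What the generated code effectively does: broken.t is the Array
// constructor itself, not a new array.
var broken = new Object;
broken.t = Array;
broken.t["0|a"] = 1; // hypothetical key and value, for illustration only

// The property landed on the global Array constructor.
var leakedOntoGlobal = Array["0|a"] === 1;
// Function-valued properties are omitted by JSON.stringify.
var brokenJson = JSON.stringify(broken);
// Clean up the global again.
delete Array["0|a"];

// Even a real array drops string keys when serialized.
var arr = new Array;
arr["0|a"] = 1;
var arrJson = JSON.stringify({ t: arr });

// Plain objects keep them.
var fixed = new Object;
fixed.t = new Object;
fixed.t["0|a"] = 1;
var fixedJson = JSON.stringify(fixed);

console.log(leakedOntoGlobal, brokenJson, arrJson, fixedJson);
```

Only the last variant round-trips through JSON with its keys intact.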

Also, foma2js.py misses some symbols in the alphabet - I'll just fix this in the PR to come.

Note that foma_apply_down.js actually only needs to know the input symbols in the alphabet, so if you want to save some space, you can omit the output symbols from the s array.

Note also that maxlen is wrong when there are surrogate pairs. I've fixed this in the pyfoma implementation :) (and in #156 too)
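A sketch of why surrogate pairs matter here, assuming that maxlen is meant to be the length of the longest alphabet symbol and that the JavaScript side measures strings with `.length`. Both are assumptions, and the symbol list is hypothetical:

```javascript
// Hypothetical alphabet; "\u{1D4B3}" is a single character outside the
// Basic Multilingual Plane, stored by JavaScript as a surrogate pair.
const symbols = ["a", "b", "\u{1D4B3}"];

// Length in Unicode code points (what Perl or Python would typically count).
const codePointLength = (s) => Array.from(s).length;
// Length in UTF-16 code units (what JavaScript's .length reports).
const codeUnitLength = (s) => s.length;

const maxlenCodePoints = Math.max(...symbols.map(codePointLength));
const maxlenCodeUnits = Math.max(...symbols.map(codeUnitLength));

console.log(maxlenCodePoints, maxlenCodeUnits);
```

The two counts differ for this alphabet, so a maxlen computed in code points would be too small for code that slices the input by UTF-16 code units.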

dhdaines commented 5 months ago

(note, actually, the optimal solution is not to output JavaScript at all, but just to output JSON that you assign to a JavaScript object, like the pyfoma implementation does)
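A rough sketch of what the JSON approach could look like on the JavaScript side. The JSON text below is a made-up stand-in for exporter output, not the real pyfoma format:

```javascript
// Hypothetical exporter output: plain JSON, no generated JavaScript statements.
const exported = '{"t":{"0|a":1},"f":{"1":1},"s":{"a":1},"maxlen":1}';

// Loading it is a single parse...
const myNet = JSON.parse(exported);

// ...and it serializes back without losing any keys.
const roundTripped = JSON.stringify(myNet);

console.log(myNet.maxlen, roundTripped === exported);
```

Since JSON text is also a valid JavaScript object literal, the same output could be assigned directly, as in `var myNet = { ... };`, without a parse step.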