mhulden / foma

Automatically exported from code.google.com/p/foma

optional replacement of input of partially identical sequences doesn't work #87

Open mcswell opened 5 years ago

mcswell commented 5 years ago

Optional rules where the left- and right-hand sides are partially identical sequences don't work, although the corresponding obligatory rule does. Example:

    foma[0]: def l {ange};
    redefined l: 328 bytes. 5 states, 4 arcs, 1 path.
    foma[0]: regex l .o. [{ng} -> {ny}];
    484 bytes. 5 states, 4 arcs, 1 path.
    foma[1]: lower
    anye
    foma[1]: regex l .o. [{ng} (->) {ny}];
    484 bytes. 5 states, 4 arcs, 1 path.
    foma[2]: lower
    ange

The first rule, with obligatory replacement, correctly returns 'anye'. But the second rule, with optional replacement, should give both 'ange' (unchanged) and 'anye' (changed); it only gives the unchanged form.
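To make the expected behavior concrete, here is a minimal Python sketch of what an optional replacement of a literal string should produce: at each occurrence of the input string, the rule may either rewrite it or leave it alone. This is a plain string-level model, not foma code and not how foma compiles rules; the function name optional_outputs is hypothetical.

```python
def optional_outputs(word: str, lhs: str, rhs: str) -> set[str]:
    """Return every output of the optional rule {lhs} (->) {rhs} on word.

    Each occurrence of lhs may independently be rewritten as rhs or kept.
    String-level model only; this is not foma's FST construction.
    """
    if not lhs:
        raise ValueError("lhs must be non-empty")
    results: set[str] = set()

    def walk(i: int, out: str) -> None:
        if i == len(word):
            results.add(out)
            return
        # Option 1: keep the current character unchanged.
        walk(i + 1, out + word[i])
        # Option 2: rewrite an occurrence of lhs that starts here.
        if word.startswith(lhs, i):
            walk(i + len(lhs), out + rhs)

    walk(0, "")
    return results


print(sorted(optional_outputs("ange", "ng", "ny")))  # ['ange', 'anye']
```

For this input the obligatory rule's single output ('anye') is one member of the optional rule's output set, so the optional rule returning only 'ange' means one expected path is missing.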

When I compile regex [{ng} (->) {ny}]; on its own, I get a network with 2 states but only 4 arcs. The odd thing (and, I suppose, the reason it doesn't work) is that there is an arc from state 1 back to state 0, but no arc leading into state 1.

The problem seems to happen only when the input and output are sequences that share a first character (or maybe a sequence of initial characters). For example, the following works correctly:

    {ng} (->) {xg}

but the following (as above) does not:

    {ng} (->) {ny}

BTW, the reason for wanting to write the rule in this semi-redundant way is that in Indonesian, the digraphs 'ng' and 'ny' represent single phonemes.
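The suspected trigger can be stated as a simple check on a rule's two literal strings. A small Python sketch, assuming the hypothesis above is right; the helper name shared_prefix is hypothetical, and it relies on os.path.commonprefix, which compares plain strings character by character despite living in os.path.

```python
from os.path import commonprefix  # character-wise common prefix of a list of strings


def shared_prefix(lhs: str, rhs: str) -> str:
    """Longest common initial substring of a rule's input and output."""
    return commonprefix([lhs, rhs])


# Rules from this thread: the first reportedly works, the others reportedly do not.
for lhs, rhs in [("ng", "xg"), ("ng", "ny"), ("ts", "th")]:
    print(f"{{{lhs}}} (->) {{{rhs}}}: shared prefix {shared_prefix(lhs, rhs)!r}")
```

Only the working rule has an empty shared prefix, which is consistent with the hypothesis but does not by itself confirm it.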

rcastromamani commented 3 years ago

I recently found myself in a similar situation when implementing the following optional replacement rule:

define THReplacement [ {ts} (->) {th} || _ [a|e|i|o|á|é|í|ó] ]; ! thamiri -> tsamiri

The only workaround I could use was the following:

define THReplacement [ {ts} -> {th}, {ts} -> {ts} || _ [a|e|i|o|á|é|í|ó] ];
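The workaround lets each ts that is followed by a vowel surface as either th or ts, which is the output set the optional rule should produce. A minimal Python sketch of that intended set, again a string-level model rather than foma's implementation; the names optional_outputs_rc and VOWELS are hypothetical. The right context is checked on the input side, as the || operator does.

```python
VOWELS = set("aeioáéíó")  # the right context [a|e|i|o|á|é|í|ó] from the rule


def optional_outputs_rc(word: str, lhs: str, rhs: str, right: set[str]) -> set[str]:
    """Outputs of {lhs} (->) {rhs} || _ right, where right is a set of characters.

    A match may be rewritten only if the next input character is in right,
    so a match at the very end of the word is never rewritten.
    """
    if not lhs:
        raise ValueError("lhs must be non-empty")
    results: set[str] = set()

    def walk(i: int, out: str) -> None:
        if i == len(word):
            results.add(out)
            return
        walk(i + 1, out + word[i])  # keep this character unchanged
        end = i + len(lhs)
        if word.startswith(lhs, i) and end < len(word) and word[end] in right:
            walk(end, out + rhs)  # rewrite a match whose right context holds

    walk(0, "")
    return results


print(sorted(optional_outputs_rc("tsamiri", "ts", "th", VOWELS)))  # ['thamiri', 'tsamiri']
print(sorted(optional_outputs_rc("tsmi", "ts", "th", VOWELS)))     # ['tsmi']
```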