bluebear94 opened 1 month ago
Here’s a version of `rewritten` that implements the DIR constraints (spelled as `simultaneous`, `forward`, `backward`, or `outer`):
```python
import copy
import functools

from pyfoma import FST  # not needed if this lives inside pyfoma's own fst module


def rewritten(fst: 'FST', *contexts, **flags) -> 'FST':
    """Returns a modified FST, rewriting self in contexts in parallel, controlled by flags.

    flags['dir'] selects the DIR constraint: 'simultaneous' (default),
    'forward', 'backward', or 'outer'.
    """
    order = flags.get('dir', 'simultaneous')
    # Validate up front so a bad value is reported even when there are no contexts.
    if order not in ('simultaneous', 'forward', 'backward', 'outer'):
        raise ValueError(f"dir must be simultaneous, forward, backward, or outer (got {order})")
    defs = {'crossproducts': fst}
    defs['br'] = FST.re("'@<@'|'@>@'")
    defs['aux'] = FST.re(". - ($br|#)", defs)
    defs['dotted'] = FST.re(".*-(.* '@<@' '@>@' '@<@' '@>@' .*)")
    defs['base'] = FST.re("$dotted @ # ($aux | '@<@' $crossproducts '@>@')* #", defs)
    if contexts:
        center = FST.re("'@<@' (.-'@>@')* '@>@'")
        if order in ('simultaneous', 'outer'):
            lrpairs = [[l.ignore(defs['br']), r.ignore(defs['br'])] for l, r in contexts]
            restricted = center.context_restrict(*lrpairs, rewrite=True)
            if order == 'simultaneous':
                # Both contexts checked on the input side (cf. Foma's ||).
                defs['rule'] = restricted.compose(defs['base'])
            else:
                # Both contexts checked on the output side (cf. Foma's \/).
                defs['rule'] = defs['base'].compose(restricted)
        else:
            lpairs = [[l.ignore(defs['br']), FST.re(".*")] for l, _ in contexts]
            rpairs = [[FST.re(".*"), r.ignore(defs['br'])] for _, r in contexts]
            left = copy.copy(center).context_restrict(*lpairs, rewrite=True)
            right = center.context_restrict(*rpairs, rewrite=True)
            if order == 'forward':
                # Right context on the input side, left context on the output side (cf. Foma's //).
                defs['rule'] = right.compose(defs['base']).compose(left)
            else:
                # 'backward': left context on the input side, right context on the output side (cf. Foma's \\).
                defs['rule'] = left.compose(defs['base']).compose(right)
    else:
        defs['rule'] = defs['base']
    defs['remrewr'] = FST.re("'@<@':'' (.-'@>@')* '@>@':''")  # worsener: drop one bracketed rewrite
    # Base worsener: remove at least one bracketed rewrite from a markup.
    worseners = [FST.re(".* $remrewr (.|$remrewr)*", defs)]
    if flags.get('longest', False) == 'True':
        worseners.append(FST.re(".* '@<@' $aux+ '':('@>@' '@<@'?) $aux ($br:''|'':$br|$aux)* .*", defs))
    if flags.get('leftmost', False) == 'True':
        worseners.append(FST.re(
            ".* '@<@':'' $aux+ ('':'@<@' $aux* '':'@>@' $aux+ '@>@':'' .* | '':'@<@' $aux* '@>@':'' $aux* '':'@>@' .*)", defs))
    if flags.get('shortest', False) == 'True':
        worseners.append(FST.re(".* '@<@' $aux* '@>@':'' $aux+ '':'@>@' .*", defs))
    defs['worsen'] = functools.reduce(lambda x, y: x.union(y), worseners).determinize_unweighted().minimize()
    # Markups that are a worsened version of some valid markup.
    defs['rewr'] = FST.re("$^output($^input($rule) @ $worsen)", defs)
    # Keep only the markups that are not worse than another valid one.
    final = FST.re("(.* - $rewr) @ $rule", defs)
    # Strip the auxiliary bracket and boundary symbols.
    newfst = final.map_labels({s: '' for s in ['@<@', '@>@', '#']}).epsilon_remove().determinize_as_dfa().minimize()
    return newfst
```
This still doesn’t handle parallel rules, though, so I’m going to keep thinking about the problem.
By the way, do you have a link to a paper explaining the particular approach to rewrite rules used in PyFoma? I’d like to read more about it. I think it might be “A new method for compiling parallel replace rules” by Yli-Jyrä, but the “full technical report” mentioned there is no longer online.
Edit: I think I now understand what `$dotted` is for (handling rewrite rules with empty inputs, so that `$^rewrite('':a)` turns `xxx` into `axaxaxa` rather than something like `aaaxaxaxa`), but I’m still confused about how the worseners work.
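To make the `$dotted`/worsener interplay concrete, here is a small pure-Python model of my reading of the regexes above, restricted to a rule with an empty input such as `'':a`. It assumes that (1) a markup can be summarized by how many empty `'@<@' '@>@'` pairs sit in each gap of the input, (2) `$dotted` forbids two adjacent empty pairs, and (3) the base worsener removes one or more bracket pairs, so `final` keeps only markups that are not a worsened copy of another valid markup. The helper names are hypothetical and nothing here calls PyFoma.

```python
from itertools import product


def markups(s, max_per_gap):
    """All markups of s: a count of empty '<>' pairs (0..max_per_gap) for each of its len(s)+1 gaps."""
    return list(product(range(max_per_gap + 1), repeat=len(s) + 1))


def dotted_ok(counts):
    """Model of $dotted: two adjacent empty matches ('<><>') are forbidden, so at most one per gap."""
    return all(c <= 1 for c in counts)


def is_worse(worse, better):
    """Model of the base worsener: `worse` is `better` with at least one bracket pair removed."""
    return worse != better and all(w <= b for w, b in zip(worse, better))


def maximal(candidates):
    """Model of final = (.* - $rewr) @ $rule: drop every markup worse than another candidate."""
    return [m for m in candidates if not any(is_worse(m, other) for other in candidates)]


def render(s, counts, insertion):
    """Replace each empty '<>' pair with `insertion` and return the output string."""
    out = []
    for i, c in enumerate(counts):
        out.append(insertion * c)
        if i < len(s):
            out.append(s[i])
    return "".join(out)


with_dotted = maximal([m for m in markups("xxx", 3) if dotted_ok(m)])
print([render("xxx", m, "a") for m in with_dotted])  # ['axaxaxa']

# Without the $dotted filter, the winner is whatever the arbitrary bound allows;
# in this model, with no bound there would be no maximal markup at all.
without_dotted = maximal(markups("xxx", 3))
print([render("xxx", m, "a") for m in without_dotted])  # ['aaaxaaaxaaaxaaa']
```

Under this reading, the worseners define a "worse than" relation on markups and `final` keeps the best ones; `longest`, `leftmost`, and `shortest` would simply add more ways for one markup to count as worse than another.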
Edit 2: I’ve found a bug in my implementation: `$^rewrite2(a:b / _ a, dir=backward)` with an input of `aaa` generates both `aba` and `baa`, while `a -> b \\ _ a` in Foma generates only `aba`.
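For checking cases like this, a brute-force reference model of Foma’s four DIR constraints could serve as an oracle. This sketch handles only a single-symbol rule `x -> y` with literal-string contexts, and it assumes the usual Foma reading: `||` checks both contexts on the upper (input) side, `//` checks the left context on the lower (output) side, `\\` checks the right context on the lower side, and `\/` checks both on the lower side. Each occurrence of `x` must be rewritten exactly when its contexts match. The function name `directional_rewrite` is hypothetical.

```python
from itertools import product

# Which side each context is read from: (left_on_lower, right_on_lower).
DIR_SIDES = {
    "||": (False, False),
    "//": (True, False),
    r"\\": (False, True),
    r"\/": (True, True),
}


def directional_rewrite(s, x, y, left="", right="", mode="||"):
    """Return every output of `x -> y <mode> left _ right` on input s (brute force).

    x and y must be single symbols, so input and output positions line up.
    An occurrence of x is rewritten iff `left` ends the text before it and
    `right` starts the text after it, each read from the side `mode` selects.
    """
    if len(x) != 1 or len(y) != 1:
        raise ValueError("this sketch only supports single-symbol rules")
    left_on_lower, right_on_lower = DIR_SIDES[mode]
    positions = [i for i, c in enumerate(s) if c == x]
    results = set()
    # Try every subset of occurrences to rewrite.
    for choice in product((False, True), repeat=len(positions)):
        chars = list(s)
        for i, rewrite in zip(positions, choice):
            if rewrite:
                chars[i] = y
        out = "".join(chars)
        # Obligatory rule: an occurrence is rewritten iff it is in context.
        consistent = all(
            rewrite == ((out if left_on_lower else s)[:i].endswith(left)
                        and (out if right_on_lower else s)[i + 1:].startswith(right))
            for i, rewrite in zip(positions, choice)
        )
        if consistent:
            results.add(out)
    return results


print(directional_rewrite("aaa", "a", "b", right="a", mode=r"\\"))  # {'aba'}
print(directional_rewrite("aaa", "a", "b", right="a", mode="||"))   # {'bba'}
```

Comparing the output of `dir=backward` against `directional_rewrite(..., mode=r"\\")` on short strings would flag extra outputs such as `baa` above.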
Is there a way to specify Foma’s DIR constraints (i.e. `||`, `\\`, `//`, or `\/`) in PyFoma? I assume that they’re not implemented in PyFoma, but I want to make sure.

I am trying to use PyFoma to implement expanded deduplication rules for my constructed language Ŋarâþ Crîþ. Currently, this affects only a limited number of consonants, but that is set to change to encompass a wider range. If you’re curious, you can check my work-in-progress implementation.
(Incidentally, Foma’s support for parallel rules would also be useful to have for this.)