mhulden / foma

Automatically exported from code.google.com/p/foma

Rule representation #58

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I have a restriction case that I don't know how to represent in regex:

A ?* Z ?* N => A ?* Z ?* C ?* N

That is, whenever the sequence A, Z, N appears, the element C must occur
between Z and N. Other elements may appear anywhere in between without
changing the order of the sequence, for instance:
AbhsZCN
AjsZkeCN
AoiZpsCxmN
AZreCN
etc.
These are the actual elements (this representation is wrong; the "->" is not
right):
[["@N" ?* "+VRB" ?* "+ADV"] -> ["@N" ?* "+VRB" ?* "+PVN" ?* "+ADV"]]

Thanks....

Original issue reported on code.google.com by andreschandiaf on 19 Mar 2015 at 7:31

GoogleCodeExporter commented 9 years ago
I have reformulated the question so that it is hopefully easier to understand:

I have a language whose words basically have the following form:

ROOT+S36+S35+S34.......+S15+S14.........+S2+S1

but there are restrictions, so the root appears with only some of the 36
suffixes and not with all of them, for instance:

[~$[S36 ?* [S20|S12|S2]]];  ! S36 cannot combine with S20, S12 or S2
[S14 => ?* _ ?* S7 ?*]; ! Every time that S14 appears, S7 must also appear
[[S10 => ?* S12 ?* _ ?* S8] | [S10 => ?* _ ?* S7 ?*]]; ! Every time S10 appears, it must appear either between S12 and S8, or with S7
etc.
and here comes the one I don't know how to express in regex:

! Every time the sequence [?* S36 ?* S20 ?* S5 ?*] appears, S8 must appear
between S20 and S5, like this: [?* S36 ?* S20 ?* S8 ?* S5 ?*]

That is the question, and thanks again......

Original comment by andreschandiaf on 20 Mar 2015 at 9:34

GoogleCodeExporter commented 9 years ago
First, this part:

[S10 => ?* S12 ?* _ ?* S8] | [S10 => ?* _ ?* S7 ?*]

should probably read:

[S10 => S12 ?* _ ?* S8 ,  _  ?* S7]

(that's how two contexts are specified; also you don't need ?* at edges)
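
To make the intended reading of the two-context rule concrete, here is a minimal Python sketch that checks it on a list of tags. It is not foma code and not how foma compiles the rule. It assumes that every occurrence of S10 must satisfy at least one of the two contexts, and that the contexts are implicitly extended with ?* at the edges. The function name and the tag lists are made up for illustration.

```python
from typing import List


def satisfies_s10_rule(tags: List[str]) -> bool:
    """Emulate  S10 => S12 ?* _ ?* S8 , _ ?* S7  on a list of tags.

    Assumption: each occurrence of S10 must fit at least one context.
    """
    for i, tag in enumerate(tags):
        if tag != "S10":
            continue
        before, after = tags[:i], tags[i + 1:]
        # Context 1: S12 somewhere to the left and S8 somewhere to the right.
        between_s12_and_s8 = "S12" in before and "S8" in after
        # Context 2: S7 somewhere to the right.
        followed_by_s7 = "S7" in after
        if not (between_s12_and_s8 or followed_by_s7):
            return False
    return True


# Hypothetical tag sequences, only for illustration.
print(satisfies_s10_rule(["ROOT", "S12", "S10", "S8"]))  # True
print(satisfies_s10_rule(["ROOT", "S10", "S7"]))         # True
print(satisfies_s10_rule(["ROOT", "S12", "S10"]))        # False
```

A word without S10 passes trivially, since the rule only constrains where S10 may occur.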

The actual question is: when you see ?* S36 ?* S20 ?* S5 ?*, S8 must appear 
between S20 and S5. This seems like it's doable without context restriction, 
like so:

~$[?* S36 ?* S20 ~$S8 S5 ?*]
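
As a sanity check outside foma, the same "forbid the bad substring" idea can be tried on the single-letter examples from the first comment. This Python sketch assumes the mapping A = S36, Z = S20, C = S8, N = S5, and uses Python's `re` module in place of foma's `~$[...]`. The name `is_allowed` and the last test word are made up for illustration.

```python
import re

# Single characters stand in for the multicharacter tags
# (assumed mapping: A = S36, Z = S20, C = S8, N = S5).
# The pattern matches what the foma expression forbids:
# A, then anything, then Z, then a stretch containing no C, then N.
FORBIDDEN = re.compile(r"A.*Z[^C]*N")


def is_allowed(word: str) -> bool:
    """True if the word contains no forbidden A ... Z <no C> N substring."""
    return FORBIDDEN.search(word) is None


# The four examples from the original question, plus one made-up violation.
for word in ["AbhsZCN", "AjsZkeCN", "AoiZpsCxmN", "AZreCN", "AbZkN"]:
    print(word, is_allowed(word))
```

The first four words are accepted and the last one is rejected, because nothing between its Z and N is a C.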

Original comment by mans.hul...@gmail.com on 21 Mar 2015 at 3:03

GoogleCodeExporter commented 9 years ago
Ok, thanks a lot for the two things, great! I see it now!

Original comment by andreschandiaf on 21 Mar 2015 at 3:10