jimregan / foma

Automatically exported from code.google.com/p/foma

fsm_read_prolog() cannot handle quotation marks in symbols #49

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. read regex %"test%"%)%.;
2. write prolog test.prolog
3. read prolog test.prolog

What is the expected output? What do you see instead?
The expected output is a simple FSA: (0) --"test").--> ((1)).

What I see instead:
1. with print net:
   Ss0:    \"test\ -> fs1.
2. with view net:
   three states: ((];)), ((1)), (0), with no transitions between them.

What version of the product are you using? On what operating system?
Latest svn (or 0.9.17alpha).

Please provide any additional information below.
The behavior is caused by fsm_read_prolog() not handling quotation marks (") in 
symbols. When parsing an arc() clause in the .prolog file, it assumes that the 
clause ends in one of two ways: "symbol"). or "in":"out"). However, quotation 
marks can be part of a symbol in foma, in which case they are escaped with a 
backslash (\). Two problems arise:

1. If the symbol contains ": or ")., as in the test above, that sequence is 
erroneously parsed as the end of the input symbol or the end of the clause, 
respectively.
2. Even if it doesn't, the parser copies the symbol string verbatim and does 
not remove the backslashes in front of the quotation marks.

I have attached a patch that fixes this bug in io.c.

Original issue reported on code.google.com by nemesk...@gmail.com on 17 Jul 2013 at 12:08


GoogleCodeExporter commented 8 years ago
This is especially a problem for fomacg, where the first tag in a reading (the 
lemma) includes the quotation marks, and tags are represented as symbols.

Original comment by nemesk...@gmail.com on 17 Jul 2013 at 12:18