qwaider / heideltime

Automatically exported from code.google.com/p/heideltime

Regular expression #27

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I would like to process a series of French dates such as:

"lundi 20, mardi 21 mercredi 22 jeudi 23 vendredi 24 samedi 25 et dimanche 26 avril"

An equivalent in English would be:

"Monday 20, Tuesday 21 Wednesday 22 Thursday 23 Friday 24 Saturday 25 and Sunday, April 26"

where all the dates should apply ("be relative") to April.

I ended up writing the following rule:

RULENAME="date_r4d2",EXTRACTION="(%reWeekday %reDayNumber%reAndOrTo)+%reWeekday %reDayNumber (%reMonthLong|%reMonthShort)",NORM_VALUE="UNDEF-year-%normMonth(group(7))-%normDay(group(3))",OFFSET="group(2)-group(3)"

Note: %reAndOrTo is ( et | ou | au |,\s|\s) in my case

And I get this result:

Monday, March 9, 2015 0:00
Tuesday, March 10, 2015 0:00
Wednesday, March 11, 2015 0:00
Thursday, March 12, 2015 0:00
Friday, March 13, 2015 0:00
Saturday, April 25, 2015 0:00
Sunday, April 26, 2015 0:00

The XML version:
<!DOCTYPE TimeML SYSTEM "TimeML.dtd">
<TimeML>
<TIMEX3 tid="t5" type="DATE" value="2015-03-09">lundi</TIMEX3> 20, <TIMEX3 tid="t6" type="DATE" value="2015-03-10">mardi</TIMEX3> 21 <TIMEX3 tid="t7" type="DATE" value="2015-03-11">mercredi</TIMEX3> 22 <TIMEX3 tid="t8" type="DATE" value="2015-03-12">jeudi</TIMEX3> 23 <TIMEX3 tid="t9" type="DATE" value="2015-03-13">vendredi</TIMEX3> 24 <TIMEX3 tid="t4" type="DATE" value="2015-04-25">samedi 25</TIMEX3> et <TIMEX3 tid="t3" type="DATE" value="2015-04-26">dimanche 26 avril</TIMEX3>
</TimeML>

As you can see, only the last two dates are correct.

The key here is that I have a repeatable group: (%reWeekday %reDayNumber%reAndOrTo)+
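A minimal sketch of what a repeated capturing group does, in case it helps narrow this down. It uses Python's `re` module rather than HeidelTime itself, on the assumption that the Java regex engine behind HeidelTime treats repeated groups the same way: a group inside `(...)+` keeps only the text from its last repetition. The `WEEKDAY`, `DAY`, `SEP` and `MONTH` patterns are simplified, hypothetical stand-ins for %reWeekday, %reDayNumber, %reAndOrTo and the month patterns, not the real resource files.

```python
import re

# Simplified stand-ins for the HeidelTime patterns (assumptions, not the real resources).
WEEKDAY = r"(lundi|mardi|mercredi|jeudi|vendredi|samedi|dimanche)"
DAY = r"(\d{1,2})"
SEP = r"( et | ou | au |,\s|\s)"
MONTH = r"(avril|mars)"

# Same shape as the EXTRACTION part of the rule:
# group 1 = repeated block, 2 = weekday, 3 = day, 4 = separator,
# group 5 = final weekday, 6 = final day, 7 = month.
PATTERN = re.compile(rf"({WEEKDAY} {DAY}{SEP})+{WEEKDAY} {DAY} {MONTH}")

text = ("lundi 20, mardi 21 mercredi 22 jeudi 23 vendredi 24 "
        "samedi 25 et dimanche 26 avril")

m = PATTERN.search(text)

# Groups 2 and 3 hold only the last repetition of the repeated block,
# so a single match cannot address the earlier weekday/day pairs by group number.
last_weekday, last_day = m.group(2), m.group(3)
month = m.group(7)

# Outside the rule language, every pair can be recovered by scanning
# the matched span a second time.
pairs = re.findall(rf"{WEEKDAY} {DAY}", m.group(0))

print(last_weekday, last_day, month)
print(pairs)
```

If the same holds inside HeidelTime, group(2) and group(3) in NORM_VALUE and OFFSET would only ever point at the last repetition ("samedi 25" here), which is consistent with only the last two dates coming out right. One rule per expected number of dates, each with explicit groups, might be a possible workaround, but that is a guess.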

I tried a lot of alternatives in my regular expression, like using a 
non-capturing group as in \(hello\), etc.

I do not know whether this is an OFFSET issue or whether HeidelTime simply cannot handle such regular expressions.
Do you have any clue that could help me?

Thank you very much,

Pascal

Original issue reported on code.google.com by pascalgi...@gmail.com on 14 Mar 2015 at 9:39