Hi,
I would like to process a series of French dates such as:
"lundi 20, mardi 21 mercredi 22 jeudi 23 vendredi 24 samedi 25 et dimanche 26 avril"
An English equivalent would be:
"Monday 20, Tuesday 21 Wednesday 22 Thursday 23 Friday 24 Saturday 25 and Sunday, April 26"
where all of the dates should refer to April, i.e., the month given at the end applies to every day in the list.
I ended up writing the following rule:
RULENAME="date_r4d2",EXTRACTION="(%reWeekday %reDayNumber%reAndOrTo)+%reWeekday %reDayNumber (%reMonthLong|%reMonthShort)",NORM_VALUE="UNDEF-year-%normMonth(group(7))-%normDay(group(3))",OFFSET="group(2)-group(3)"
Note: in my case, %reAndOrTo is ( et | ou | au |,\s|\s).
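To see which text each group number in the rule refers to, the EXTRACTION pattern can be approximated with Python's re module. This is only a sketch: the expansions of %reWeekday, %reDayNumber, %reMonthLong and %reMonthShort below are simplified stand-ins (not HeidelTime's actual resource patterns), and it assumes each macro expands to exactly one capturing group, which is what the group numbers used in NORM_VALUE and OFFSET suggest.

```python
import re

# Simplified, hypothetical stand-ins for HeidelTime's pattern macros.
# Assumption: each macro expands to exactly one capturing group.
RE_WEEKDAY = r"(lundi|mardi|mercredi|jeudi|vendredi|samedi|dimanche)"
RE_DAY_NUMBER = r"(\d{1,2})"
RE_AND_OR_TO = r"( et | ou | au |,\s|\s)"
RE_MONTH_LONG = (r"(janvier|février|mars|avril|mai|juin|juillet|août"
                 r"|septembre|octobre|novembre|décembre)")
RE_MONTH_SHORT = r"(janv\.|févr\.|avr\.)"  # incomplete list, illustration only

# EXTRACTION pattern of date_r4d2 with the macros substituted.
# Groups: 1 = repeated group, 2/3/4 = weekday/day/separator inside it,
# 5/6 = final weekday/day, 7 = month alternation, 8/9 = long/short month.
EXTRACTION = re.compile(
    rf"({RE_WEEKDAY} {RE_DAY_NUMBER}{RE_AND_OR_TO})+"
    rf"{RE_WEEKDAY} {RE_DAY_NUMBER} ({RE_MONTH_LONG}|{RE_MONTH_SHORT})"
)

TEXT = ("lundi 20, mardi 21 mercredi 22 jeudi 23 vendredi 24 "
        "samedi 25 et dimanche 26 avril")

m = EXTRACTION.search(TEXT)
print("month:", m.group(7))  # avril (used by %normMonth)
print("day:", m.group(3))    # 25, taken from the LAST repetition only
# OFFSET="group(2)-group(3)": from the start of group 2 to the end of group 3.
print("offset text:", TEXT[m.start(2):m.end(3)])  # samedi 25
```

Under these assumptions, the single match covers the whole enumeration, but group(2) and group(3) hold only "samedi" and "25", which is consistent with the rule tagging just "samedi 25" with an April date in the output below.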
And I get this result:
Monday, March 9, 2015 0:00
Tuesday, March 10, 2015 0:00
Wednesday, March 11, 2015 0:00
Thursday, March 12, 2015 0:00
Friday, March 13, 2015 0:00
Saturday, April 25, 2015 0:00
Sunday, April 26, 2015 0:00
The XML version:
<!DOCTYPE TimeML SYSTEM "TimeML.dtd">
<TimeML>
<TIMEX3 tid="t5" type="DATE" value="2015-03-09">lundi</TIMEX3> 20, <TIMEX3
tid="t6" type="DATE" value="2015-03-10">mardi</TIMEX3> 21 <TIMEX3 tid="t7"
type="DATE" value="2015-03-11">mercredi</TIMEX3> 22 <TIMEX3 tid="t8"
type="DATE" value="2015-03-12">jeudi</TIMEX3> 23 <TIMEX3 tid="t9" type="DATE"
value="2015-03-13">vendredi</TIMEX3> 24 <TIMEX3 tid="t4" type="DATE"
value="2015-04-25">samedi 25</TIMEX3> et <TIMEX3 tid="t3" type="DATE"
value="2015-04-26">dimanche 26 avril</TIMEX3>
</TimeML>
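As you can see, only the last two dates are correct: the first five are normalized to dates in March (2015-03-09 to 2015-03-13) instead of April 20 to 24.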
The key point is that the rule contains a repeated group, (%reWeekday %reDayNumber%reAndOrTo)+.
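A repeated capturing group only remembers its last repetition. This is standard regex behavior rather than something specific to HeidelTime: Python's re works this way, and Java's java.util.regex (presumably what HeidelTime, a Java tool, uses) also returns the subsequence a group most recently matched. A minimal sketch with a shortened, hypothetical pattern:

```python
import re

TEXT = "lundi 20 mardi 21 mercredi 22 "

# The "+" repeats the whole group, but groups 2 and 3 are overwritten on
# each repetition, so after the match they hold only the last pair.
repeated = re.match(r"((lundi|mardi|mercredi) (\d{1,2}) )+", TEXT)
print(repeated.group(2), repeated.group(3))  # mercredi 22

# Scanning for the inner pattern instead returns every pair.
pairs = re.findall(r"(lundi|mardi|mercredi) (\d{1,2})", TEXT)
print(pairs)  # [('lundi', '20'), ('mardi', '21'), ('mercredi', '22')]
```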
I tried many variations of the regular expression, such as using a non-capturing group, (?:hello), but none of them helped.
I do not know whether this is an OFFSET issue or whether HeidelTime simply cannot handle such regular expressions.
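If the goal is one date per weekday/day pair, one possible workaround is a two-pass extraction: match the whole enumeration once, then re-scan the matched span for every pair and give each one the shared month. The sketch below does this in plain Python as post-processing; it is not a HeidelTime rule or API, the helper name expand_enumeration is hypothetical, and the year is passed in explicitly (2015, matching the output above) instead of being resolved the way HeidelTime resolves UNDEF-year.

```python
import re
from datetime import date

WEEKDAY = r"(?:lundi|mardi|mercredi|jeudi|vendredi|samedi|dimanche)"
SEPARATOR = r"(?: et | ou | au |,\s|\s)"  # same alternatives as %reAndOrTo
MONTHS = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4, "mai": 5, "juin": 6,
    "juillet": 7, "août": 8, "septembre": 9, "octobre": 10, "novembre": 11,
    "décembre": 12,
}
MONTH_NAMES = "|".join(MONTHS)

# Pass 1: the whole enumeration (group 1) followed by the month (group 2).
ENUMERATION = re.compile(
    rf"((?:{WEEKDAY} \d{{1,2}}{SEPARATOR})+{WEEKDAY} \d{{1,2}}) ({MONTH_NAMES})"
)
# Pass 2: each individual "weekday day" pair; group 1 is the day number.
PAIR = re.compile(rf"{WEEKDAY} (\d{{1,2}})")


def expand_enumeration(text, year):
    """Hypothetical helper: return (surface text, ISO date) for every pair."""
    results = []
    for enum in ENUMERATION.finditer(text):
        month = MONTHS[enum.group(2)]
        for pair in PAIR.finditer(enum.group(1)):
            day = int(pair.group(1))
            # date() raises ValueError for impossible dates such as 31 avril.
            results.append((pair.group(0), date(year, month, day).isoformat()))
    return results


TEXT = ("lundi 20, mardi 21 mercredi 22 jeudi 23 vendredi 24 "
        "samedi 25 et dimanche 26 avril")
for surface, value in expand_enumeration(TEXT, 2015):
    print(surface, value)
```

Matching the whole enumeration first is what ties every pair to the month that closes the list, rather than leaving the earlier days to be resolved on their own.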
Do you have any idea what might help?
Thank you very much,
Pascal
Original issue reported on code.google.com by pascalgi...@gmail.com on 14 Mar 2015 at 9:39