Hi,
"19-Nov-12" is not a format that HeidelTime knows yet. However, it is quite easy
to add a rule that identifies such expressions.
Go to: resources/english/rules/resources_rules_daterules.txt
And add the following rule:
RULENAME="date_r0g",EXTRACTION="%reDayNumber-%reMonthShort-%reYear2Digit",NORM_VALUE="UNDEF-centurygroup(3)-%normMonth(group(2))-%normDay(group(1))"
--> whether the century is normalized correctly depends on the context of the
expression in the document
You may also want to include a rule for expressions such as "19-Nov-2012":
RULENAME="date_r0h",EXTRACTION="%reDayNumber-%reMonthShort-%reYear4Digit",NORM_VALUE="group(3)-%normMonth(group(2))-%normDay(group(1))"
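To make it clearer what these two rules extract and how the groups are reordered, here is a rough Python sketch of the same idea. It is not HeidelTime code: the regular expressions are simplified stand-ins for %reDayNumber, %reMonthShort, %reYear2Digit and %reYear4Digit, the function name is made up, and the fixed `century` argument is only an assumption that replaces HeidelTime's context-dependent century resolution.

```python
import re

# Simplified stand-ins for HeidelTime's pattern resources (assumptions,
# not the actual contents of the resource files).
DAY = r"(0?[1-9]|[12][0-9]|3[01])"
MONTH_SHORT = r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
YEAR = r"(\d{4}|\d{2})"  # 4-digit (date_r0h) or 2-digit (date_r0g) year

# Plays the role of %normMonth: month abbreviation -> two-digit number.
NORM_MONTH = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}

PATTERN = re.compile(rf"\b{DAY}-{MONTH_SHORT}-{YEAR}\b")


def normalize_dates(text, century="20"):
    """Return (matched text, YYYY-MM-DD) pairs for day-month-year dates.

    `century` is a hypothetical fixed fallback for 2-digit years; the
    real tool decides the century from the document context instead.
    """
    results = []
    for match in PATTERN.finditer(text):
        day, month, year = match.groups()
        if len(year) == 2:
            year = century + year
        value = f"{year}-{NORM_MONTH[month]}-{int(day):02d}"
        results.append((match.group(0), value))
    return results


if __name__ == "__main__":
    for found in normalize_dates("Seen on 19-Nov-12 and again on 19-Nov-2012."):
        print(found)
```

The point of the sketch is only the group order: the extraction sees day, month, year, while the normalized value is assembled as year, month, day.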
Then, go to resources/ and run "sh printResourceInformation.sh"
If you don't want to modify the rules, you can also wait until we include these
rules in the resources. I actually don't see a reason not to include them.
Thanks for your feedback. If you have any further questions, please let me know.
Best regards,
Jannik
Original comment by jannik.s...@gmail.com
on 4 Mar 2013 at 10:01
Original comment by jannik.s...@gmail.com
on 4 Mar 2013 at 10:06
Hi. Adding rules by hand is not scalable for me. I am working with a corpus
that is a few gigabytes in size, and dates are expressed in a large number of
different formats. What would help is getting the substring where HeidelTime
thinks the date is. For example:
"Last post on 26-Nov-12. Next post on 27-Nov-12."
Currently the timex tags are placed around "Nov" only, which is not very
helpful. If I could get "26-Nov-12" as the potential temporal expression, then
I could do something about the rules (Mechanical Turk, for example).
Would this be possible?
Original comment by shripha...@gmail.com
on 4 Mar 2013 at 10:01
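As an illustration of the request above, candidate strings could be collected with a small pre-processing script outside HeidelTime. The sketch below is only a heuristic under stated assumptions: it is not a HeidelTime feature, the function names are made up, and it simply keeps whitespace-separated tokens that contain both an English month abbreviation and a digit, then groups them by a coarse "shape" so that frequent unknown formats stand out.

```python
import re
from collections import Counter

# Heuristic month matcher (assumption: English abbreviations, case-sensitive).
MONTH_RE = re.compile(r"Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec")


def candidate_expressions(text):
    """Return tokens that look like dates: a month abbreviation plus a digit."""
    candidates = []
    for token in text.split():
        token = token.strip(".,;:!?()\"'")  # drop surrounding punctuation
        if MONTH_RE.search(token) and any(ch.isdigit() for ch in token):
            candidates.append(token)
    return candidates


def shape(token):
    """Abstract a token into a format signature, e.g. 26-Nov-12 -> 99-MON-99."""
    return re.sub(r"\d", "9", MONTH_RE.sub("MON", token))


def candidate_shapes(text):
    """Count how often each format signature occurs in the text."""
    return Counter(shape(tok) for tok in candidate_expressions(text))


if __name__ == "__main__":
    sample = "Last post on 26-Nov-12. Next post on 27-Nov-12."
    print(candidate_expressions(sample))
    print(candidate_shapes(sample).most_common())
```

Each frequent signature would then correspond to one rule to write, which keeps the manual work proportional to the number of formats rather than to the size of the corpus.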
Hi,
As mentioned above, it is actually not a big deal to add a couple of rules.
They make use of regular expressions and are thus quite general.
The rules I mentioned are written out explicitly and are easy to extend.
If you need more specific help, e.g., if you have identified a couple of other
patterns that occur frequently, we can take this off the thread.
Just send me an email with some more information.
It would also be possible to write rules that give you an identified temporal
expression together with its surrounding context tokens, but that is not how
HeidelTime is supposed to work.
I can give you more details if you want. Just send me an email.
Best regards,
Jannik
Original comment by jannik.s...@gmail.com
on 6 Mar 2013 at 8:05
We've added the rule in question. Expressions such as the one from your
original post will now be extracted and normalized correctly.
The commit containing this rule addition is r54fcd94430ad. The next release
(HeidelTime 1.5) will contain this rule.
Thanks a lot again for bringing this to our attention!
Original comment by z...@informatik.uni-heidelberg.de
on 3 Jul 2013 at 3:44
Original comment by z...@informatik.uni-heidelberg.de
on 18 Sep 2013 at 8:48
Original issue reported on code.google.com by
shripha...@gmail.com
on 4 Mar 2013 at 8:06