Multilingual library for normalisation and rendering of temporal expressions.
The normalisation step extracts temporal expressions from a text, using a language specific grammar, and exports them into a short, storable format.
>>> import datection
>>> from datetime import datetime
# simple datetime
>>> datection.export(u"Le 4 mars 2015 à 18h30", "fr")
[{'duration': 0,
'rrule': 'DTSTART:20150304\nRRULE:FREQ=DAILY;COUNT=1;BYMINUTE=30;BYHOUR=18',
'span': (0, 23)}]
# date interval with a recurrent exclusion
>>> datection.export(u"Du 5 au 29 mars 2015, sauf le lundi", "fr")
[{'duration': 1439,
'excluded': ['DTSTART:20150305\nRRULE:FREQ=DAILY;BYDAY=MO;BYHOUR=0;BYMINUTE=0;UNTIL=20150329T000000'],
'rrule': 'DTSTART:20150305\nRRULE:FREQ=DAILY;BYHOUR=0;BYMINUTE=0;INTERVAL=1;UNTIL=20150329',
'span': (0, 36)}]
# yearless date, with argument date reference
>>> datection.export(u"Le 4 mars à 18h30", "fr", reference=datetime(2015, 1, 1))
[{'duration': 0,
'rrule': 'DTSTART:20150304\nRRULE:FREQ=DAILY;COUNT=1;BYMINUTE=30;BYHOUR=18',
'span': (0, 18)}]
# past datetime
>>> datection.export(u"Le 4 mars 1990 à 18h30", "fr")
[]
# past datetime and authorized past exports
>>> datection.export(u"Le 4 mars 1990 à 18h30", "fr", only_future=False)
[{'duration': 0,
'rrule': 'DTSTART:19900304\nRRULE:FREQ=DAILY;COUNT=1;BYMINUTE=30;BYHOUR=18',
'span': (0, 18)}]
# continuous datetime interval
>>> datection.export(u"Du 5 avril à 22h au 6 avril 2015 à 8h", "fr")
[{'continuous': True,
'duration': 600,
'rrule': 'DTSTART:20150405\nRRULE:FREQ=DAILY;BYHOUR=22;BYMINUTE=0;INTERVAL=1;UNTIL=20150406T235959',
'span': (0, 38)}]
The export format contains 6 different items:
rrule
: a parseable expression, generating all the datetimes described by the expression. See the python-dateutil documentation and RFC 2445 for more details
duration
: the duration (in minutes) between each start datetime, egenrated by the rrule, and its end counterpart:
span
: the character interval defining where the temporal expression was found in the text
continuous
: boolean flag, indicating if the time interval is continuous or not.
excluded
: a list of rrules exclusion rrules.
unlimited
: if True, the rrules are considered as infinite.
The rendering step renders the export format in human readable formats, in a specific language.
Several formats can be chosen from:
>>> import datection
>>> schedule = datection.export(u"Le 5 mars 2015, 15h30 - 16h", "fr")
# default
>>> datection.display(schedule, 'fr')
u'Le 5 mars 2015 de 15 h 30 à 16 h'
# short
>>> datection.display(schedule, 'fr', short=True)
u'Le 5 mars de 15 h 30 à 16 h'
>>> datection.display(schedule, 'fr', short=True, reference=date(2015, 3, 3))
u'Ce jeudi de 15 h 30 à 16 h'
>>> datection.display(schedule, 'fr', short=True, reference=date(2015, 3, 4))
u'Demain de 15 h 30 à 16 h'
>>> datection.display(schedule, 'fr', short=True, reference=date(2015, 3, 5))
u"Aujourd'hui de 15 h 30 ç 16 h"
# SEO
>>> datection.display(schedule, 'fr', seo=True)
u'mars 2015'
# opening hours / place
>>> schedule = datection.export(u"Du lundi au vendredi de 8h à 12h30 et de 14h à 19h30", "fr")
>>> datection.display(schedule, 'fr', place=True)
u"""Lundi de 8 h à 12 h 30 et de 14 h à 19 h 30
Mardi de 8 h à 12 h 30 et de 14 h à 19 h 30
Mercredi de 8 h à 12 h 30 et de 14 h à 19 h 30
Jeudi de 8 h à 12 h 30 et de 14 h à 19 h 30
Vendredi de 8 h à 12 h 30 et de 14 h à 19 h 30
"""