Open jmaupetit opened 9 years ago
Actually I don't think that our current format (`YYYY-MM-DD`, which is arrow's default one) is very relevant. Maybe we should think about something more flexible or natural?
> Maybe we should think about something more flexible or natural?
Any examples maybe?
For example, git is pretty versatile: it accepts things like "yesterday", "1 week ago" or "15/04/2015" for `git log --since`. I can't find the reference anywhere though.
parsedatetime might be a good option (thx @jmaupetit).
While parsedatetime is great, it's not a perfect fit for Watson. For example, it seems to always expect MM/DD dates instead of DD/MM. It also guesses dates in the future (`monday` is equivalent to `next monday`, not `last monday`), which is not what's expected in the context of Watson.
Right now our system is more restrictive and cumbersome, but at least it's not bug prone.
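The "last monday, not next monday" expectation described above can be expressed without any parsing library. Here is a minimal sketch, assuming a hypothetical helper named `most_recent_weekday` (it is not part of Watson or parsedatetime) and assuming that naming today's weekday means today rather than a week ago:

```python
from datetime import datetime, timedelta

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def most_recent_weekday(name, now=None):
    """Return midnight of the latest `name` weekday that is not after `now`.

    Hypothetical helper for illustration only. Assumption: asking for
    today's weekday returns today, not the same weekday one week earlier.
    """
    now = now or datetime.now()
    target = WEEKDAYS.index(name.strip().lower())  # ValueError if unknown
    # Modulo keeps the offset in 0..6, so the result is never in the future.
    days_back = (now.weekday() - target) % 7
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=days_back)
```

Called on a Wednesday, `most_recent_weekday("friday")` goes back five days instead of forward two, which is the behaviour a time tracker would want when logging past activity.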
FTR we must set `YearParseStyle` to 0 in order to parse `Jan 1st` as the current year and not the next.
There's a new (natural language) date parsing library here:
https://github.com/scrapinghub/dateparser
(Haven't tested it yet.)
:+1: let's give it a try!
I experimented with `dateparser` a bit. I'm not so sure I like it. It promises more than it can keep. All of the examples below are valid German date specifications. In German, for numbers up to 12, we generally use the word form in a sentence, not the numeral. Also, "12 Uhr" normally means midday, not midnight; for midnight we use "Null Uhr" or "0 Uhr" to make it clear.
Not sure if it's worth adding such a heavy dependency when in effect it doesn't add much over `dateutil`.
In [1]: import dateparser
In [2]: dateparser.parse('Vor zwei stunden')
In [3]: dateparser.parse('Vor 2 stunden')
Out[3]: datetime.datetime(2015, 11, 14, 0, 4, 24, 713893)
In [4]: dateparser.parse('In 1 Tag')
In [5]: dateparser.parse('Morgen')
In [6]: dateparser.parse('morgen')
In [7]: dateparser.parse('morgen früh')
In [8]: dateparser.parse('morgen mittag')
In [9]: dateparser.parse('morgen mittag', languages=['de', 'en'])
In [10]: dateparser.parse('morgen 12 uhr', languages=['de', 'en'])
In [11]: dateparser.parse('morgen, 12 uhr', languages=['de', 'en'])
In [12]: dateparser.parse('gestern, 12 Uhr', languages=['de', 'en'])
Out[12]: datetime.datetime(2015, 11, 13, 0, 0)
In [13]: dateparser.parse('gestern, zwölf Uhr', languages=['de', 'en'])
In [14]: dateparser.parse('vorgestern, 12 Uhr', languages=['de', 'en'])
Out[14]: datetime.datetime(2015, 11, 12, 0, 0)
In [15]: dateparser.parse('vorgestern, null Uhr', languages=['de', 'en'])
In [16]: dateparser.parse('vorgestern, 0 Uhr', languages=['de', 'en'])
Out[16]: datetime.datetime(2015, 11, 12, 2, 4, 24, 713893)
Thank you for your feedback. Shouldn't we focus on English first (and only English) for a CLI?
I don't know if you have considered dateutil but I find it useful for fuzzy human date parsing. Example from the doc:
>>> from dateutil.parser import parse
>>> parse("Today is January 1, 2047 at 8:21:00AM", fuzzy_with_tokens=True)
(datetime.datetime(2047, 1, 1, 8, 21), (u'Today is ', u' ', u'at '))
I'd second a switch to dateutil. It's used successfully in the gcalcli project.
Thanks! Scott
In my experience, `dateutil.parser.parse(s, fuzzy=True)` often guesses wrong. If we're going to use it, at the very least we should make options like `dayfirst` and `yearfirst` configurable. It also has the problem of defaulting to dates in the future, e.g. `parse("Monday", fuzzy=True) == datetime.datetime(2016, 3, 7, 0, 0)`.
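To illustrate why a day-first option needs to be configurable, here is a small standard-library sketch. The helper `parse_numeric_date` is hypothetical; its `dayfirst` parameter borrows the name of the dateutil option but this is not dateutil's implementation:

```python
from datetime import datetime


def parse_numeric_date(text, dayfirst=False):
    """Parse an 'NN/NN/YYYY' date, reading the first field as day or month.

    Hypothetical helper for illustration only; it handles a single
    slash-separated layout and raises ValueError on anything else.
    """
    fmt = "%d/%m/%Y" if dayfirst else "%m/%d/%Y"
    return datetime.strptime(text, fmt)
```

The same string, `04/02/2022`, is April 2nd with `dayfirst=False` and February 4th with `dayfirst=True`, so a wrong default silently produces a valid but incorrect date.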
To decide whether we should use `dateutil` or `dateparser`, I will crunch a test dataset and post the results here later.
So, I wrote a quick and dirty script to compare the `dateutil` and `dateparser` `parse` methods:
#!/usr/bin/env python3
"""Compare (fuzzy) dateutil vs dateparser `parse` methods"""
import sys
from datetime import datetime, timedelta

from dateparser import parse as dp_parse
from dateutil.parser import parse as du_parse

# Single reference time shared by the expected values and dateparser
NOW = datetime.now()
DP_SETTINGS = {
    'RELATIVE_BASE': NOW,
}
EXPECTED_DATETIME = datetime(year=2016, month=9, day=1)
DATASET = (
    # (query, expected)
    ('2016/09/01', EXPECTED_DATETIME),
    ('2016-09-01', EXPECTED_DATETIME),
    ('09/01/2016', EXPECTED_DATETIME),
    ('09-01-2016', EXPECTED_DATETIME),
    ('09012016', EXPECTED_DATETIME),
    ('09/01/2016 15:20', EXPECTED_DATETIME.replace(hour=15, minute=20)),
    ('09/01/2016 at 15h20', EXPECTED_DATETIME.replace(hour=15, minute=20)),
    ('15 min ago', NOW - timedelta(minutes=15)),
    ('two hours ago', NOW - timedelta(hours=2)),
    ('a day ago', NOW - timedelta(days=1)),
    # Tuesday of the current week, at midnight
    ('tuesday', (
        NOW.replace(hour=0, minute=0, second=0, microsecond=0) -
        timedelta(days=(NOW.weekday() - 1)))),
    # Monday of the current week, at noon
    ('monday at noon', (
        NOW.replace(hour=12, minute=0, second=0, microsecond=0) -
        timedelta(days=NOW.weekday()))),
)


def is_equal(time1, time2):
    return time1 == time2


def parse(parser, query, expected, **options):
    """Return 1 if `parser` yields `expected` for `query`, else 0."""
    try:
        result = parser(query, **options)
    except Exception:
        # Any parser failure counts as a miss
        return 0
    if result and is_equal(result, expected):
        return 1
    return 0


def bench(dataset):
    du_scores = []
    dp_scores = []
    template = '| {:25} | {:>10} | {:>10} |'
    separator = template.format('-' * 25, '-' * 10, '-' * 10)

    print(template.format('query', 'dateutil', 'dateparser'))
    print(separator)
    for query, expected in dataset:
        du_score = parse(du_parse, query, expected, fuzzy=True)
        dp_score = parse(dp_parse, query, expected, settings=DP_SETTINGS)
        du_scores.append(du_score)
        dp_scores.append(dp_score)
        print(template.format(query, du_score, dp_score))
    print(separator)
    print(template.format(
        'total ({})'.format(len(du_scores)),
        sum(du_scores),
        sum(dp_scores))
    )


def main():
    bench(DATASET)
    return 0


if __name__ == '__main__':
    sys.exit(main() or 0)
And here are the results:
| query | dateutil | dateparser |
| ------------------------- | ---------- | ---------- |
| 2016/09/01 | 1 | 1 |
| 2016-09-01 | 1 | 1 |
| 09/01/2016 | 1 | 1 |
| 09-01-2016 | 1 | 1 |
| 09012016 | 0 | 1 |
| 09/01/2016 15:20 | 1 | 1 |
| 09/01/2016 at 15h20 | 1 | 1 |
| 15 min ago | 0 | 1 |
| two hours ago | 0 | 1 |
| a day ago | 0 | 1 |
| tuesday | 0 | 1 |
| monday at noon | 0 | 1 |
| ------------------------- | ---------- | ---------- |
| total (12) | 6 | 12 |
If my test dataset is representative of what we expect from Watson's date parser, my conclusion is that we must use `dateparser`. WDYT?
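One caveat about the comparison script: it scores with strict equality, which only works for relative queries such as "15 min ago" when the parser uses exactly the same reference time as the expected value. A possible alternative is a tolerance-based check. This is only a sketch; `is_close` is a hypothetical name and the one-second tolerance is an arbitrary assumption:

```python
from datetime import datetime, timedelta


def is_close(time1, time2, tolerance=timedelta(seconds=1)):
    """Return True when two naive datetimes differ by at most `tolerance`."""
    return abs(time1 - time2) <= tolerance
```

Swapping this in for `is_equal` would avoid penalising a parser that reads the clock itself a few microseconds after the script does.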
My attitude would be to first hit the big easy ones with a quick lookup, then find a more thorough NLP/i18n approach.
i.e. `today` -> `datetime.datetime.today().strftime('%Y-%m-%d')`
The big easy:
today
yesterday
week
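The quick-lookup idea above could look roughly like the following. This is a sketch under assumptions: `SHORTCUTS` and `resolve_shortcut` are hypothetical names, and "week" is taken to mean the Monday of the current week, which the thread does not specify:

```python
from datetime import date, timedelta

# Hypothetical shortcut table: each keyword maps to a function of "today".
# The meaning of "week" (Monday of the current week) is an assumption.
SHORTCUTS = {
    "today": lambda today: today,
    "yesterday": lambda today: today - timedelta(days=1),
    "week": lambda today: today - timedelta(days=today.weekday()),
}


def resolve_shortcut(word, today=None):
    """Return a YYYY-MM-DD string for a known shortcut, or None otherwise."""
    today = today or date.today()
    handler = SHORTCUTS.get(word.strip().lower())
    if handler is None:
        return None  # caller can fall back to the regular date parser
    return handler(today).strftime("%Y-%m-%d")
```

Because the result is already in `YYYY-MM-DD` form, it could be handed to the existing strict parser unchanged, so the lookup stays a thin layer in front of it.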
Just found Watson and am trying it out instead of timewarrior. This is a big pain point for me right now. Adding or editing past items is quite annoying with the required `YYYY-MM-DD HH:mm` format.
@davidag Is what is discussed in this thread, e.g. shortcuts for `today` and `yesterday`, being addressed in #328?
@jessebett Humanized dates are not supported in #328, but adding by time is (e.g. `watson add -f 10:00 -t 11:00`).
I'd planned to improve date input, but Watson's development has stagnated a bit lately, so I'm looking for alternatives.
Thank you for hacking on watson, I'm a daily user and find it really useful!
The ability to adjust the date formatting for the `add` command would make usability even better for me (German dates like 04.02.2022 feel most natural to me).
Is there anything I can do to help push this forward? I could test and I know how to read code, but I haven't written much Python code myself yet.
When Watson fails to parse an input date, we must raise a clear error message.
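A rough sketch of what a clear failure could look like, combined with a configurable list of accepted formats. Everything here is hypothetical (`DEFAULT_FORMATS`, `DateParseError`, `parse_date`) and none of it reflects Watson's actual code:

```python
from datetime import datetime

# Illustrative formats only; in practice they might come from user config.
DEFAULT_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d", "%d.%m.%Y")


class DateParseError(ValueError):
    """Hypothetical error raised when no configured format matches."""


def parse_date(text, formats=DEFAULT_FORMATS):
    """Try each format in order; raise a descriptive error if none match."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise DateParseError(
        "Could not parse date {!r}; accepted formats: {}".format(
            text, ", ".join(formats)))
```

The error names both the rejected input and the formats that would have worked, so the user can correct the command without reading the documentation.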