jazzband / Watson

:watch: A wonderful CLI to track your time!
http://tailordev.github.io/Watson/
MIT License

Improve input date parsing #10

Open jmaupetit opened 9 years ago

jmaupetit commented 9 years ago

When watson fails at parsing an input date, we must raise a clear message.
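As a rough illustration of what a "clear message" could look like, here is a minimal sketch using only the standard library. It is not Watson's actual code (Watson parses dates through arrow); `DATE_FORMAT`, `DateParseError` and `parse_date` are hypothetical names chosen for the example.

```python
from datetime import datetime

# Hypothetical format string, assumed for this sketch only.
DATE_FORMAT = "%Y-%m-%d %H:%M"


class DateParseError(ValueError):
    """Raised when an input date cannot be parsed (hypothetical name)."""


def parse_date(value, fmt=DATE_FORMAT):
    """Parse `value`, or fail with a message naming the expected format."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        # Render a sample date so the user sees what valid input looks like.
        example = datetime(2015, 4, 15, 9, 30).strftime(fmt)
        raise DateParseError(
            "Could not parse date {!r}: expected format {} (e.g. {})".format(
                value, fmt, example
            )
        ) from None
```

The point is that the error repeats the offending input and shows one valid example, instead of surfacing the underlying library's traceback.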

k4nar commented 9 years ago

Actually I don't think that our current format (YYYY-MM-DD, which is arrow's default one) is very relevant. Maybe we should think about something more flexible or natural?

willdurand commented 9 years ago

Maybe we should think about something more flexible or natural?

Any examples maybe?

k4nar commented 9 years ago

For example, git is pretty versatile: it accepts things like "yesterday", "1 week ago" or "15/04/2015" for git log --since. I can't find the reference anywhere though.

k4nar commented 9 years ago

parsedatetime might be a good option (thx @jmaupetit).

k4nar commented 9 years ago

While parsedatetime is great, it's not a perfect fit for Watson. For example, it always seems to expect MM/DD dates rather than DD/MM. It also resolves ambiguous dates to the future (monday means next Monday, not last Monday), which is not what we expect in the context of Watson.

Right now our system is more restrictive and cumbersome, but at least it's not bug prone.

FTR, we must set YearParseStyle to 0 in order to parse Jan 1st as the current year and not the next.
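To make the "last monday, not next monday" expectation concrete, here is a minimal standard-library sketch that resolves a weekday name to its most recent occurrence. It does not use parsedatetime; `WEEKDAYS` and `last_weekday` are hypothetical names, and treating today as a valid match is an assumption rather than something decided in this thread.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def last_weekday(name, today=None):
    """Return the most recent date (today included) that falls on `name`."""
    today = today or date.today()
    target = WEEKDAYS.index(name.lower())  # raises ValueError if unknown
    # Days elapsed since the target weekday, always in the range 0..6.
    delta = (today.weekday() - target) % 7
    return today - timedelta(days=delta)
```

For a time tracker that only records past activity, resolving backwards like this never produces a date in the future.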

SpotlightKid commented 8 years ago

There's a new (natural language) date parsing library here:

https://github.com/scrapinghub/dateparser

(Haven't tested it yet.)

jmaupetit commented 8 years ago

:+1: let's give it a try!

SpotlightKid commented 8 years ago

I experimented with dateparser a bit. I'm not so sure I like it. It promises more than it can keep. All the inputs below are valid German date specifications. In German, for numbers up to 12, we generally use the word form in a sentence, not the numeral. Also, "12 Uhr" normally means midday, not midnight; for that we use "Null Uhr" or "0 Uhr" to make it clear.

Not sure if it's worth adding such a heavy dependency when in effect it doesn't add much over dateutil.

In [1]: import dateparser

In [2]: dateparser.parse('Vor zwei stunden')

In [3]: dateparser.parse('Vor 2 stunden')
Out[3]: datetime.datetime(2015, 11, 14, 0, 4, 24, 713893)

In [4]: dateparser.parse('In 1 Tag')

In [5]: dateparser.parse('Morgen')

In [6]: dateparser.parse('morgen')

In [7]: dateparser.parse('morgen früh')

In [8]: dateparser.parse('morgen mittag')

In [9]: dateparser.parse('morgen mittag', languages=['de', 'en'])

In [10]: dateparser.parse('morgen 12 uhr', languages=['de', 'en'])

In [11]: dateparser.parse('morgen, 12 uhr', languages=['de', 'en'])

In [12]: dateparser.parse('gestern, 12 Uhr', languages=['de', 'en'])
Out[12]: datetime.datetime(2015, 11, 13, 0, 0)

In [13]: dateparser.parse('gestern, zwölf Uhr', languages=['de', 'en'])

In [14]: dateparser.parse('vorgestern, 12 Uhr', languages=['de', 'en'])
Out[14]: datetime.datetime(2015, 11, 12, 0, 0)

In [15]: dateparser.parse('vorgestern, null Uhr', languages=['de', 'en'])

In [16]: dateparser.parse('vorgestern, 0 Uhr', languages=['de', 'en'])
Out[16]: datetime.datetime(2015, 11, 12, 2, 4, 24, 713893)

jmaupetit commented 8 years ago

Thank you for your feedback. Shouldn't we focus on English first (and only English) for a CLI?

yloiseau commented 8 years ago

I don't know if you have considered dateutil but I find it useful for fuzzy human date parsing. Example from the doc:

>>> from dateutil.parser import parse
>>> parse("Today is January 1, 2047 at 8:21:00AM", fuzzy_with_tokens=True)
(datetime.datetime(2047, 1, 1, 8, 21), (u'Today is ', u' ', u'at '))

firecat53 commented 8 years ago

I'd second a switch to dateutil. It's used successfully in the gcalcli project.

Thanks! Scott

SpotlightKid commented 8 years ago

In my experience, dateutil.parser.parse(s, fuzzy=True) often guesses wrong. If we're going to use it, at the very least we should make options like dayfirst and yearfirst configurable. And it also has the problem of defaulting to dates in the future, e.g. parse("Monday", fuzzy=True) == datetime.datetime(2016, 3, 7, 0, 0).

jmaupetit commented 8 years ago

To decide whether we should use dateutil or dateparser, I will crunch a test dataset and post the results here later.

jmaupetit commented 8 years ago

So, I wrote a quick and dirty script to compare the dateutil and dateparser parse methods:

#!/usr/bin/env python3
"""Compare (fuzzy) dateutil vs dateparser `parse` methods"""

import sys

from dateparser import parse as dp_parse
from datetime import datetime, timedelta
from dateutil.parser import parse as du_parse

NOW = datetime.now()
DP_SETTINGS = {
    'RELATIVE_BASE': NOW,
}
EXPECTED_DATETIME = datetime(year=2016, month=9, day=1)
DATASET = (
    # (query, expected)
    ('2016/09/01', EXPECTED_DATETIME),
    ('2016-09-01', EXPECTED_DATETIME),
    ('09/01/2016', EXPECTED_DATETIME),
    ('09-01-2016', EXPECTED_DATETIME),
    ('09012016', EXPECTED_DATETIME),
    ('09/01/2016 15:20', EXPECTED_DATETIME.replace(hour=15, minute=20)),
    ('09/01/2016 at 15h20', EXPECTED_DATETIME.replace(hour=15, minute=20)),
    ('15 min ago', NOW - timedelta(minutes=15)),
    ('two hours ago', NOW - timedelta(hours=2)),
    ('a day ago', NOW - timedelta(days=1)),
    ('tuesday', (
        NOW.replace(hour=0, minute=0, second=0, microsecond=0) -
        timedelta(days=(NOW.weekday() - 1)))),
    ('monday at noon', (
        NOW.replace(hour=12, minute=0, second=0, microsecond=0) -
        timedelta(days=NOW.weekday()))),
)

def is_equal(time1, time2):
    return time1 == time2

def parse(parser, query, expected, **options):
    try:
        result = parser(query, **options)
    except Exception:
        # A parser that raises on this query simply scores 0.
        return 0
    if result and is_equal(result, expected):
        return 1
    return 0

def bench(dataset):
    du_scores = []
    dp_scores = []
    template = '| {:25} | {:>10} | {:>10} |'
    separator = template.format('-' * 25, '-' * 10, '-' * 10)

    print(template.format('query', 'dateutil', 'dateparser'))
    print(separator)

    for query, expected in dataset:
        du_score = parse(du_parse, query, expected, fuzzy=True)
        dp_score = parse(dp_parse, query, expected, settings=DP_SETTINGS)
        du_scores.append(du_score)
        dp_scores.append(dp_score)

        print(template.format(query, du_score, dp_score))

    print(separator)
    print(template.format(
        'total ({})'.format(len(du_scores)),
        sum(du_scores),
        sum(dp_scores))
    )

def main():
    bench(DATASET)
    return 0

if __name__ == '__main__':
    sys.exit(main() or 0)

And here are the results:

| query                     |   dateutil | dateparser |
| ------------------------- | ---------- | ---------- |
| 2016/09/01                |          1 |          1 |
| 2016-09-01                |          1 |          1 |
| 09/01/2016                |          1 |          1 |
| 09-01-2016                |          1 |          1 |
| 09012016                  |          0 |          1 |
| 09/01/2016 15:20          |          1 |          1 |
| 09/01/2016 at 15h20       |          1 |          1 |
| 15 min ago                |          0 |          1 |
| two hours ago             |          0 |          1 |
| a day ago                 |          0 |          1 |
| tuesday                   |          0 |          1 |
| monday at noon            |          0 |          1 |
| ------------------------- | ---------- | ---------- |
| total (12)                |          6 |         12 |

If my test data set is representative of what we expect from Watson's date parser, my conclusion is that we must use dateparser. WDYT?

whilei commented 7 years ago

My attitude would be to first hit the big easy ones with a quick lookup, then find a more thorough NLP/i18n approach.

i.e. today -> datetime.datetime.today().strftime('%Y-%m-%d')
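A minimal sketch of such a quick lookup, assuming the keywords are expanded into the existing YYYY-MM-DD format before the regular parser runs. `SHORTCUTS` and `expand_shortcut` are hypothetical names; nothing like this is confirmed to exist in Watson.

```python
from datetime import datetime, timedelta

# Hypothetical shortcut table: keyword -> offset in days before today.
SHORTCUTS = {"today": 0, "yesterday": 1}


def expand_shortcut(value, now=None):
    """Replace a leading keyword with a YYYY-MM-DD date.

    Input that does not start with a known keyword is returned unchanged.
    """
    now = now or datetime.now()
    head, _, rest = value.strip().partition(" ")
    offset = SHORTCUTS.get(head.lower())
    if offset is None:
        return value
    day = (now - timedelta(days=offset)).strftime("%Y-%m-%d")
    return "{} {}".format(day, rest).strip()
```

With this, an input such as "yesterday 18:30" would reach the existing parser as a plain date and time, so the strict format stays the single source of truth.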

The big easy:

jessebett commented 5 years ago

Just found Watson and am trying it out instead of timewarrior. This is a big pain point for me right now. Adding or editing past items is quite annoying with the required YYYY-MM-DD HH:mm format.

@davidag Is what is discussed in this thread, e.g. shortcuts for today and yesterday, being addressed in #328?

davidag commented 5 years ago

@jessebett Humanized dates are not supported in #328, but adding by time is (e.g. watson add -f 10:00 -t 11:00).

I'd planned to improve date input, but Watson's development has stagnated a bit lately, so I'm looking for alternatives.

teutat3s commented 2 years ago

Thank you for hacking on watson, I'm a daily user and find it really useful!

The ability to adjust the date format for the add command would make it even more usable for me (German dates like 04.02.2022 feel most natural to me).

Is there anything I can do to help push this forward? I can test and I know how to read code, but I haven't written much Python code myself yet.
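One way the configurable format asked for above could work is a user-defined list of accepted formats, tried in order. This is only a sketch under that assumption: `DATE_FORMATS` and `parse_with_formats` are hypothetical names, and Watson has no such setting as far as this thread shows.

```python
from datetime import datetime

# Hypothetical user setting: formats are tried in order, first match wins.
DATE_FORMATS = ["%d.%m.%Y %H:%M", "%d.%m.%Y", "%Y-%m-%d %H:%M", "%Y-%m-%d"]


def parse_with_formats(value, formats=DATE_FORMATS):
    """Try each configured format in turn; list all of them on failure."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(
        "Could not parse {!r}; accepted formats: {}".format(
            value, ", ".join(formats)
        )
    )
```

Because the formats are explicit, there is no guessing between DD/MM and MM/DD: a date is either accepted by a format the user chose or rejected with the list of accepted ones.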