mojombo / chronic

Chronic is a pure Ruby natural language date parser.
http://injekt.github.com/chronic
MIT License

Rewrite Handlers, Repeaters and improve Tagging #278

Closed by davispuh 7 years ago

davispuh commented 10 years ago

Finally, I've something to show :)

So basically, I started this refactoring about a year ago. It was meant to be just a few fixes, but that wasn't really possible because I had to change a lot of things, and now it looks like quite a big rewrite. It's not fully finished yet, and there are probably some bugs and things left to do, but I wanted to show what I have so far. All tests in test_parsing pass, but some needed to be updated because Chronic previously handled some cases incorrectly. Other tests still need to be rewritten for the new implementation.

So anyway, what does this rewrite give us? This implementation will correctly parse all previously supported date/time formats with timezone support, which will resolve dozens of issues. Also, DST changes won't ruin dates on other days (loads of issues like #179). And it will give a good foundation for implementing a lot more date/time formats very simply. But most importantly, Chronic will finally parse dates sanely. I'll try to explain what I mean by sane parsing.

Take a look at this image (chronic objects) and the example code:

now = Time.parse('2014-09-27 22:45:00')
options = {:now => now, :guess => false, :context => :none}

# First timeline from image, parse exact date or time, Span size will be based on precision.
Chronic.parse('22', options)
=> 2014-09-27 22:00:00...2014-09-27 23:00:00
# note that it's the whole hour, because minutes weren't specified

# some more examples
Chronic.parse('22:48', options)
=> 2014-09-27 22:48:00...2014-09-27 22:49:00
Chronic.parse('2014-09', options)
=> 2014-09-01 00:00:00...2014-10-01 00:00:00

# 2nd timeline (Arrow), after or before some interval in future or past from now
# notice how span's end is based on hour precision
Chronic.parse('2 hours ago', options)
=> 2014-09-27 20:45:00..2014-09-27 21:00:00

# here with minute
Chronic.parse('after 2 days, 3 hours and 1 minute', options)
=> 2014-09-30 01:46:00..2014-09-30 01:47:00

# sometimes order is important
Chronic.parse('after 5 days and 1 weekend', options)
=> 2014-10-04 00:00:00..2014-10-05 00:00:00
Chronic.parse('after 1 weekend and 5 days', options)
=> 2014-10-09 00:00:00..2014-10-10 00:00:00

# 3rd timeline (Anchor), some specific range
Chronic.parse('this hour', {:now => now, :guess => false, :context => :future})
=> 2014-09-27 22:45:00...2014-09-27 23:00:00
# see how the Span starts right now and runs only until the next hour, because we're looking for something in the future, which can happen "this hour" at any moment.

# and if we're looking in past
Chronic.parse('this hour', {:now => now, :guess => false, :context => :past})
=> 2014-09-27 22:00:00...2014-09-27 22:45:00

# to get the whole range, like in the '22' case
Chronic.parse('this hour', {:now => now, :guess => false, :context => :none})
=> 2014-09-27 22:00:00...2014-09-27 23:00:00

# it works exactly the same for all cases
Chronic.parse('this week', {:now => now, :guess => false, :context => :past})
=> 2014-09-21 00:00:00...2014-09-27 22:45:00

# still correct even if context doesn't make sense
Chronic.parse('yesterday', {:now => now, :guess => false, :context => :future})
=> 2014-09-26 00:00:00...2014-09-27 00:00:00

# 4th timeline (Narrow), narrow down from bigger range
Chronic.parse('2nd sunday next summer', options)
=> 2015-06-28 00:00:00...2015-06-29 00:00:00

# can mix these objects in various combinations
Chronic.parse('2 years ago last week', options)
=> 2012-09-16 00:00:00...2012-09-23 00:00:00

# more things...
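
To make the precision and `:context` behaviour above more concrete, here is a minimal standalone sketch. It does not use Chronic's code; `span_for` and `apply_context` are hypothetical helpers invented for this illustration, and the sketch assumes UTC times so that hour boundaries line up with the epoch.

```ruby
# Hypothetical helpers for illustration only; these are NOT Chronic's API.

# Width in seconds of the unit implied by the input's precision.
PRECISION_SECONDS = { hour: 3600, minute: 60 }.freeze

# Build a half-open span covering the whole unit that contains `time`,
# e.g. '22' with :hour precision covers 22:00...23:00.
# Assumes `time` is in UTC (or an offset that is a multiple of the width).
def span_for(time, precision)
  width = PRECISION_SECONDS.fetch(precision)
  start = time - (time.to_i % width)
  start...(start + width)
end

# Trim a span against `now` depending on the lookup direction.
# A span that does not contain `now` (e.g. 'yesterday') is left untouched.
def apply_context(span, now, context)
  return span unless span.cover?(now)

  case context
  when :future then now...span.end
  when :past   then span.begin...now
  else              span
  end
end

now = Time.utc(2014, 9, 27, 22, 45, 0)
this_hour = span_for(now, :hour)

puts apply_context(this_hour, now, :none).inspect
puts apply_context(this_hour, now, :future).inspect
puts apply_context(this_hour, now, :past).inspect
```

The three calls correspond to the 'this hour' examples: the full hour, the part from now until the hour ends, and the part from the start of the hour until now.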

The next really good part is the awesome tagging support (definition.rb). Basically, it's like a RegExp for date parsing: it should be really easy to add new date/time formats and to override/monkey-patch them if the defaults don't suit. For example:

[[MonthName, [SeparatorSpace, SeparatorDash, :optional], ScalarDay, [SeparatorSpace, Unit, :none]], :handle_mn_sd]

This definition will match only if all criteria pass. That is, the first token must have the `MonthName` tag (e.g. jan, january, etc.), then optionally either `SeparatorSpace` (` `) or `SeparatorDash` (`-`), then `ScalarDay`, and next there must not be a `SeparatorSpace` followed by a `Unit` (e.g. 'year'). If it matches, then `handle_mn_sd` will be called. In simpler words, `:optional` means match ANY from the array and `:none` means don't match this array.
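
The matching rules described above can be sketched roughly as follows. This is only one reading of the description, not the code from definition.rb: each token is reduced to a single tag symbol, the `matches?` helper and the symbol names are hypothetical, and the sketch checks only that the pattern matches at the start of the token list.

```ruby
# Simplified, hypothetical model: each token is represented by one tag
# symbol. Real Chronic tokens carry tag objects and may have several tags.
PATTERN = [
  :month_name,
  [:separator_space, :separator_dash, :optional],
  :scalar_day,
  [:separator_space, :unit, :none]
].freeze

# Returns true if `tags` starts with a sequence accepted by `pattern`.
def matches?(pattern, tags)
  pos = 0
  pattern.each do |item|
    if item.is_a?(Array)
      *members, mode = item
      case mode
      when :optional
        # Consume one token if it carries any of the listed tags.
        pos += 1 if members.include?(tags[pos])
      when :none
        # Negative lookahead: fail if the listed tags follow in sequence.
        return false if tags[pos, members.size] == members
      else
        raise ArgumentError, "unknown modifier: #{mode.inspect}"
      end
    else
      # A plain entry must match the current token exactly.
      return false unless tags[pos] == item
      pos += 1
    end
  end
  true
end

# "jan 5" matches; "jan 5 years" is rejected by the :none group.
puts matches?(PATTERN, [:month_name, :separator_space, :scalar_day])
puts matches?(PATTERN, [:month_name, :separator_space, :scalar_day, :separator_space, :unit])
```

In a real implementation a successful match would then dispatch to the handler named in the definition (`:handle_mn_sd` in the example); that step is left out here.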

Okay, I think I've described the main changes, and I suggest just looking at the code and trying it yourself. Some old Chronic issues should already be resolved here, but mostly this is the main groundwork that will make it much easier to fix everything later and create the most robust parsing library :) I really like some parts of this implementation, but since I had to add a lot of features, I can see some things are getting really complex and ugly. Still, IMO this is way better than what was there before, and most importantly you can easily add all the date/time formats you've ever wanted. This is just the start and, like I said, it's not ready, so it shouldn't be merged yet.

davispuh commented 10 years ago

@leejarvis have you taken a look at this? What do you think?

leejarvis commented 10 years ago

I think this looks good so far. There are a few methods in date_object and time_object and some other places that are a bit unwieldy so I'd be keen to clean those up but it looks like you've spent a lot of time on this. Great work!

davispuh commented 10 years ago

I'll first finish this so that all tests pass and only then clean it up. Also, in some places I'm not sure about the best implementation, so it would be nice if you could give any tips on what could be improved.