Open natesire opened 10 years ago
This already exists https://github.com/tmlee/time_difference
Thanks. I emailed the founder of time_difference. I am actually looking for something that uses machine learning in a natural-language approach. I need to parse human-written date ranges. I might fork your chronic and post the beginnings of it. I am still deciding which language to implement the machine learning in. Python has a great NLP toolkit (NLTK). And C++ for Ruby extensions might take a while. But I even like Scala. Any ideas?
From a Ruby perspective, do you have an aversion to wrapping chronic with time_difference?
Something like this:
require 'chronic'
require 'time_difference'
humanStatement1 = "this tuesday 1pm"
humanStatement2 = "this tuesday 3pm"
humanStatement1Parsed = Chronic.parse(humanStatement1)
humanStatement2Parsed = Chronic.parse(humanStatement2)
# very human readable version
puts TimeDifference.between(humanStatement1Parsed, humanStatement2Parsed).in_hours #=> 2.0
# Version without the intermediate parsed variables
puts TimeDifference.between(Chronic.parse(humanStatement1), Chronic.parse(humanStatement2)).in_hours #=> 2.0
# Single Line version
puts TimeDifference.between(Chronic.parse("this tuesday 1pm"), Chronic.parse("this tuesday 3pm")).in_hours #=> 2.0
Use your NLP to tokenize the statements into the start date token and the end date token (humanStatement1 and humanStatement2)
For NLP have you looked at OpenNLP? http://opennlp.apache.org
and then for the ruby bindings, use: https://github.com/louismullie/open-nlp
I am testing time_difference. I didn't even know about OpenNLP. Awesome. I am checking all of this out.
I have to handle all kinds of weird characters like - / -- & etc... that can be inside and outside parts of the dates. I am going to write the more advanced parsing in Scala.
This is what NLP tokenization is for: it lets you remove or replace unneeded characters and words in your text.
I see. Tokenization should work. Currently, my algorithm reads the sentence forward from index 0 until chronic returns nil. Then it reads the sentence backwards from the end until it reaches the previous nil point. I'll check how well tokenization can just give me the two dates.
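A minimal sketch of the forward/backward scan described above, written in Python for illustration. It is one possible reading of that description, not the poster's actual code. `looks_like_date` is a hypothetical stand-in for `Chronic.parse` that only recognizes a few month/day/year shapes, and `scan_for_range` is an assumed name.

```python
import re

# Hypothetical stand-in for Chronic.parse: it accepts only a few
# month/day/year shapes and does NOT reproduce chronic's behaviour.
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
DATE_LIKE = re.compile(
    rf"^(?:{MONTH}(?: \d{{1,2}})?(?: \d{{4}})?|\d{{1,2}}(?: \d{{4}})?|\d{{4}})$",
    re.IGNORECASE,
)


def looks_like_date(text):
    """Return True when the text has one of the recognized date shapes."""
    return DATE_LIKE.match(text) is not None


def scan_for_range(sentence, parses=looks_like_date):
    """Grow a prefix from the left and a suffix from the right, word by word,
    stopping each scan at the first string the parser rejects."""
    words = sentence.split()

    # Forward scan: extend the prefix while it still parses.
    prefix_end = 0
    while prefix_end < len(words) and parses(" ".join(words[:prefix_end + 1])):
        prefix_end += 1

    # Backward scan: extend the suffix while it still parses,
    # without crossing into the prefix found above.
    suffix_start = len(words)
    while suffix_start > prefix_end and parses(" ".join(words[suffix_start - 1:])):
        suffix_start -= 1

    return " ".join(words[:prefix_end]), " ".join(words[suffix_start:])


print(scan_for_range("May 22 2014 to June 3 2015"))
# ('May 22 2014', 'June 3 2015')
print(scan_for_range("June 9 -- August first week"))
# ('June 9', '')
```

With this stand-in parser, the second call finds no end date because the backward scan stops at its first word, which is why filtering tokens before scanning may help.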
Here's an example I am running into with chronic: 'Jan first week' returns nil, while 'Jan first' is valid. 'Jan' on its own is not useful either: chronic returns 2015-01-16 12:00:00 -0500.
So your idea is to erase 'week' and leave 'first', using tokenization?
I wrote a test in Python.
Here is the output:
[('Available', 'JJ'), ('June', 'NNP'), ('9', 'CD'), ('--', ':'), ('August', 'NNP'), ('first', 'JJ'), ('week', 'NN')]
['June', '9', 'August']
['June', '9', 'August']
import nltk
import MySQLdb
import time
import string
import re
sentence = 'Available June 9 -- August first week'
tokens = nltk.word_tokenize(sentence)
parts_of_speech = nltk.pos_tag(tokens)
print parts_of_speech
approved_prepositions = ['NNP', 'CD']
filtered = []
for word in parts_of_speech:
    if any(x in word[1] for x in approved_prepositions):
        filtered.append(word[0])
print filtered
normalized = re.sub(r'\s\W+', ' ', ' '.join(filtered))
print filtered
I can write a white-list function for words like 'first'. I am really liking this solution. Great idea to tokenize. Now I need a different excuse to write something in Scala. hahahahaha
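A white-list filter along these lines might look like the sketch below. It takes the tagged tokens from the output shown earlier as literal input, so it runs without NLTK. The names `APPROVED_TAGS`, `WHITELISTED_WORDS` and `filter_date_tokens` are assumptions, as is the choice of white-listed words. It also compares tags exactly, where the posted code uses a substring check.

```python
# Part-of-speech tags that are kept, as in the posted script.
APPROVED_TAGS = {"NNP", "CD"}

# Hypothetical white list: words kept regardless of their tag.
WHITELISTED_WORDS = {"first", "second", "third", "last"}


def filter_date_tokens(tagged_tokens):
    """Keep a word if its tag is approved or the word is white-listed."""
    kept = []
    for word, tag in tagged_tokens:
        if tag in APPROVED_TAGS or word.lower() in WHITELISTED_WORDS:
            kept.append(word)
    return kept


# Tagged tokens copied from the output posted in this thread.
tagged = [
    ("Available", "JJ"), ("June", "NNP"), ("9", "CD"), ("--", ":"),
    ("August", "NNP"), ("first", "JJ"), ("week", "NN"),
]

print(filter_date_tokens(tagged))
# ['June', '9', 'August', 'first']
```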
For examples like 'Jan first week', I would make assumptions about the date formats: use NLP to grab the month and recognize that the writer wants week 1. Then use the Ruby date library to get day 1 and day 7 of that week.
Take a look at this for an example of grabbing the date of a day number in a week number: http://www.ruby-doc.org/stdlib-2.1.1/libdoc/date/rdoc/Date.html#method-c-commercial
Then use the time_difference library to get the duration.
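For comparison, here is a rough Python counterpart of the `Date.commercial` lookup, assuming Python 3.8 or later for `date.fromisocalendar`. The helper name `iso_week_bounds` is hypothetical. Note that ISO week 1 can begin in the previous calendar year, so this may not match what a person means by "first week of January".

```python
from datetime import date


def iso_week_bounds(year, week):
    """Return (Monday, Sunday) of the given ISO week of the given ISO year."""
    start = date.fromisocalendar(year, week, 1)  # day 1 of the week (Monday)
    end = date.fromisocalendar(year, week, 7)    # day 7 of the week (Sunday)
    return start, end


start, end = iso_week_bounds(2015, 1)
print(start, end, (end - start).days)
# 2014-12-29 2015-01-04 6
```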
I wrote a white-list function. Python is handling things beautifully. I can feed the output into chronic, and I can call the Python script from Ruby. Let me know if chronic needs contributions.
great
I am writing my own solution to calculate date ranges (e.g. May 22 2014 to June 3 2015) based on chronic. I would gladly contribute this solution if needed.
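As an illustration of the kind of calculation described here, the sketch below computes the length of the example range in Python using only the standard library. It is not the poster's chronic-based solution. The function name `range_duration`, the `" to "` separator, and the fixed `"%B %d %Y"` format are all assumptions, and month names are assumed to be in English.

```python
from datetime import datetime


def range_duration(text, separator=" to ", fmt="%B %d %Y"):
    """Split a 'start to end' string and return end minus start as a timedelta."""
    start_text, end_text = text.split(separator, 1)
    start = datetime.strptime(start_text.strip(), fmt)
    end = datetime.strptime(end_text.strip(), fmt)
    return end - start


print(range_duration("May 22 2014 to June 3 2015").days)
# 377
```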