mojombo / chronic

Chronic is a pure Ruby natural language date parser.
http://injekt.github.com/chronic
MIT License
3.23k stars, 451 forks

Parse the string, leaving behind non-tokens for use elsewhere #148

Closed nogweii closed 11 years ago

nogweii commented 11 years ago

My idea: given a string such as "Go to the doctor's tomorrow" from an application's UI, it would be awesome to get back the string "Go to the doctor's" as well as the Time instance. Is there a way to do this already?

leejarvis commented 11 years ago

No, there's no way to do this. Right now Chronic discards the tokens it doesn't care about during tokenization. That is, any token that doesn't carry any tags is thrown into /dev/null.

I don't see enough benefit in extracting the non-tagged items myself, at least not enough to alter the way Chronic works and keep them in memory. There would be a ton of caveats. For example, the word "at" is tokenized and has tags applied to it, so if you passed "tomorrow I'll be at the station" you'd get "I'll be the station" returned.

Here's how you could extract those values, though (see the other_words variable):

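    def tokenize(text, options)
      text = pre_normalize(text)
      # Wrap every whitespace-separated word in a Token
      tokens = text.split(' ').map { |word| Token.new(word) }
      # Let each tagger attach tags to the tokens it recognizes
      [Repeater, Grabber, Pointer, Scalar, Ordinal, Separator, TimeZone].each do |tok|
        tok.scan(tokens, options)
      end
      # Untagged words; computed here for illustration only, you'd still need to return or store them
      other_words = tokens.reject(&:tagged?).map(&:word) #=> ["Go", "to", "the", "doctor's"]
      # Only the tagged tokens are passed on
      tokens.select { |token| token.tagged? }
    end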
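
To try the idea and the "at" caveat without patching Chronic, here is a minimal standalone sketch. It does not use Chronic at all: `DATE_WORDS` and `split_date_words` are hypothetical names, and the fixed word list is only a stand-in for what Chronic's taggers would actually recognize.

```ruby
require 'set'

# Hypothetical stand-in for Chronic's taggers: a tiny fixed vocabulary.
# The real library decides this with its Repeater, Grabber, Pointer, etc. scanners.
DATE_WORDS = Set.new(%w[tomorrow today yesterday at on next last]).freeze

# Splits text into [date_words, other_words], preserving word order in each.
def split_date_words(text)
  text.split(' ').partition { |word| DATE_WORDS.include?(word.downcase) }
end

date_words, other_words = split_date_words("Go to the doctor's tomorrow")
p date_words   # prints ["tomorrow"]
p other_words  # prints ["Go", "to", "the", "doctor's"]

# The caveat: "at" counts as a date word, so it vanishes from the leftover text.
_, leftover = split_date_words("tomorrow I'll be at the station")
puts leftover.join(' ')  # prints I'll be the station
```

The first group could then be joined and handed to a date parser, while the second group is what the application would show as the remaining text, subject to the caveat above.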