neilgupta / Sherlock

Natural-language event parser for JavaScript
https://sherlock.neil.gg
MIT License
532 stars, 32 forks

Dates that are incomplete still return a valid date but not as expected. #19

Closed jayenne closed 4 years ago

jayenne commented 7 years ago
  1. If I parse "tell bob on Fri 24th" given that today is Saturday 11th March, next Friday is 17th and Friday week is the 24th... Sherlock assumes I meant next Friday and returns Friday 17th.
  2. Additionally, if I parse "tell bob on Monday 24th" given that it's 11th March (as above), then the next Monday 24th is in April.
  3. If I mistakenly give a date that can't exist, like Wednesday 12 March 2017, it will return the correct day for 11th March (Saturday 11th March), but my mistake may have been in the date rather than the day.

To further the robustness of the script, you could consider including an array of 'Did you mean...' options for possible correct dates.

For example, given any two or three (of four) parts of a date, there would be a number of possible options returned. "Wednesday 12th March" could return the options: "Sunday 12th March 2017"; "Wednesday 15th March 2017" and "Wednesday 12th April".

If my string is "tell bob on Wed 12th", the options would be "Sunday 12th"; "Wednesday 15th". If my string is "tell bob on Wed 12th next year" or "tell bob on Wed 12th 2018", the options would be "Wed 12th September"; "Wed 12th December".

Developers using Sherlock could then present the options to their users as a popup selector.

I really do think you are on to an incredible and increasingly relevant idea with Sherlock/Watson and would love it to be made even more robust.

Thanks for the script Neil.

neilgupta commented 7 years ago

The reason you're seeing that bug is because Sherlock treats "24th" as an ambiguous date. Absent any other information, it'll use that number to mean the day of the month, but it's not explicit, so Sherlock prefers relying on other info. In this case, it sees "Fri" and uses that. For example, if you wrote "I placed 24th on fri", Sherlock would do the right thing. As I'm sure you discovered, if you drop "fri" from your string, Sherlock will return March 24th. It would be great to combine the 2 tokens and look ahead to see if Friday the 24th is a valid date, but that could result in a non-trivial performance hit to cover all cases. Sadly, English is an ambiguous language.

Similarly for case 3, Sherlock is using the strongest signal. Just like "fri" is a better signal than just "24th," "12 March 2017" is a better signal than "Wednesday" so it uses that.

Returning an array of possibilities when given bad user input is a great idea, but not one I have time to tackle any time soon. I think it's doable, but my concern would be the performance implications of finding all possibilities and returning a confidence score for each possibility.
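
As a rough illustration of the look-ahead idea discussed above, the sketch below lists upcoming dates on which a given weekday and day-of-month coincide. The function name `candidateDates` and its parameters are hypothetical and not part of Sherlock; it scans month by month rather than scoring confidence, so it only hints at what a "Did you mean..." list might contain.

```javascript
// Hypothetical helper (not part of Sherlock): find upcoming dates whose
// weekday (0 = Sunday ... 6 = Saturday) and day-of-month both match.
function candidateDates(weekday, dayOfMonth, from, limit = 3, maxMonths = 48) {
  // Compare against midnight of the reference day so "today" still counts.
  const start = new Date(from.getFullYear(), from.getMonth(), from.getDate());
  const results = [];
  for (let i = 0; i < maxMonths && results.length < limit; i++) {
    const d = new Date(from.getFullYear(), from.getMonth() + i, dayOfMonth);
    // Skip overflow, e.g. the 31st in a 30-day month rolls into the next month.
    if (d.getDate() !== dayOfMonth) continue;
    // Skip dates earlier in the current month than the reference day.
    if (d < start) continue;
    if (d.getDay() === weekday) results.push(d);
  }
  return results;
}

// Reference date from the issue: Saturday 11 March 2017.
const today = new Date(2017, 2, 11);
console.log(candidateDates(5, 24, today, 1).map((d) => d.toDateString()));
// → [ 'Fri Mar 24 2017' ]
console.log(candidateDates(1, 24, today, 1).map((d) => d.toDateString()));
// → [ 'Mon Apr 24 2017' ]
```

The scan is bounded by `maxMonths`, which is one simple way to cap the performance cost mentioned above.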

jayenne commented 7 years ago

Thanks Neil, I appreciate you taking the time to fix those you have and to explain the problems behind the others. Have a great day…

keep working like it’s play-time, and playing like it’s business. Have a great weekend.

Jayenn

jayenne commented 7 years ago

Hi again,

I appreciate the points you responded with. Could I possibly request a feature that checks the extracted 'day' against the validated date with getDay() to see if the days match? If the check fails, return validated: false but with starts: as the best-guess date, as you currently do. I would also like a new JSON property: extracted: {"starts": ["day", "month", "year"], "ends": ["day", "month", "year"]}. This would return only the day, month, and year that were extracted from the string and used to calculate the returned dates. This would give me the opportunity to extend the logic and further parse those date elements.

I was also thinking about recurring dates within possible ranges such as "every 12th of the month" and "the 4th Tuesday of every month", "every Wednesday", "every day this week" (every week/month this week/month/year) and "every day between now and june 15th" / "every week from now and June" etc. Just thoughts for future direction ;)

I hope some of what I ask for makes sense (if not in time or finances). Either way, thanks for listening to my ramblings, Neil.
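
A minimal sketch of the requested getDay() check, written as a standalone helper rather than as a Sherlock feature. The name `weekdayMatches` and the weekday pattern are assumptions made for illustration; the pattern is deliberately simple and will also match ordinary words such as "sat" or "sun".

```javascript
const WEEKDAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"];

// Assumed pattern for English weekday names and common abbreviations.
const WEEKDAY_PATTERN =
  /\b(sun(?:day)?|mon(?:day)?|tue(?:s|sday)?|wed(?:nesday)?|thu(?:r|rs|rsday)?|fri(?:day)?|sat(?:urday)?)\b/;

// Hypothetical helper: returns true or false when the input names a weekday,
// and null when no weekday is found (nothing to validate against).
function weekdayMatches(input, date) {
  const m = input.toLowerCase().match(WEEKDAY_PATTERN);
  if (!m) return null;
  // The first three letters identify the weekday; compare with Date#getDay().
  return WEEKDAYS.indexOf(m[1].slice(0, 3)) === date.getDay();
}

// 12 March 2017 was a Sunday, so the stated weekday does not match.
console.log(weekdayMatches("tell bob on Wednesday 12 March 2017", new Date(2017, 2, 12)));
// → false
```

A caller could use a `false` result to flip a validated flag while still keeping the parsed date as the best guess.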

Take care. Jayenne

neilgupta commented 7 years ago

You can implement that extra validation fairly trivially via Watson by preprocessing the string for any numeric matches (the regex /((?:[1-2]\d|3[0-1]|0?[1-9])(?:st|nd|rd|th)?)/ should work). Then, in postprocessing, check whether that number substring is in Sherlocked.title; if not, it was used for date matching and you can change the validated property accordingly.
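
The two steps described above might look roughly like this. The helper names `collectDayTokens` and `tokensUsedForDate` are hypothetical, and the title is passed in as a plain string, so the sketch makes no assumption about how Watson's hooks are wired or what the parsed object looks like.

```javascript
// Regex from the comment above, with the global flag added so every
// numeric token is collected, not just the first.
const DAY_NUMBER = /((?:[1-2]\d|3[0-1]|0?[1-9])(?:st|nd|rd|th)?)/g;

// Preprocessing step: remember day-like numeric tokens in the raw input.
function collectDayTokens(input) {
  return input.match(DAY_NUMBER) || [];
}

// Postprocessing step: tokens missing from the remaining title were
// presumably consumed by date matching.
function tokensUsedForDate(tokens, title) {
  return tokens.filter((token) => !title.includes(token));
}

const tokens = collectDayTokens("tell bob on Fri 24th");
console.log(tokens);                                 // → [ '24th' ]
console.log(tokensUsedForDate(tokens, "tell bob"));  // → [ '24th' ]
```

Because the pattern has no word boundaries, a year such as 2017 would be collected as two tokens ("20" and "17"), so a stricter pattern may be worth considering.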

Recurring dates is something I'd love to add but it's a pretty major feature to do correctly and I unfortunately don't have time to tackle that right now.