serokell / tzbot

Timezone bot for Slack
Mozilla Public License 2.0

Parsing multiple time mentions that share the same parts #16

Closed Martoon-00 closed 1 year ago

Martoon-00 commented 1 year ago

Currently, we seem to handle only one mention of time per message.

Implicit time parts problem

There is an important nuance when a message contains multiple time entries. Consider this text:

Let's go on Wednesday at 10:00 or 11:00.

Here Wednesday applies both to 10:00 and 11:00, and we should parse it accordingly.


Another example:

How about Wednesday at 10:00 / 11:00 OR 14:00 / 15:00 at Thursday

This should parse as

How about (Wednesday at 10:00 / 11:00) OR (14:00 / 15:00 at Thursday)
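A minimal Python sketch of one possible rule that yields the grouping above, not the bot's actual parser: split the message on the word "or", attach the day named in a segment to every time in that segment, and let a segment without a day inherit the day of the previous segment. The names `attach_days`, `DAY_RE`, `TIME_RE` and `OR_RE` are hypothetical, and the rule itself is an assumption that real messages will easily break.

```python
import re

DAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")
DAY_RE = re.compile(r"\b(" + "|".join(DAYS) + r")\b", re.IGNORECASE)
TIME_RE = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")
OR_RE = re.compile(r"\bor\b", re.IGNORECASE)


def attach_days(message):
    """Return (day, time) pairs; day is None when no day could be attached."""
    pairs = []
    inherited = None  # day carried over from the previous "or" segment
    for segment in OR_RE.split(message):
        days = [m.group(1).capitalize() for m in DAY_RE.finditer(segment)]
        if len(days) > 1:
            # Assumption: refuse to guess rather than pick one of the days.
            raise ValueError("several days in one segment: " + segment.strip())
        day = days[0] if days else inherited
        for m in TIME_RE.finditer(segment):
            pairs.append((day, m.group(0)))
        inherited = day
    return pairs


first = attach_days("Let's go on Wednesday at 10:00 or 11:00.")
second = attach_days("How about Wednesday at 10:00 / 11:00 OR 14:00 / 15:00 at Thursday")
```

Under this rule both examples from the issue resolve as described: `first` pairs Wednesday with 10:00 and 11:00, and `second` pairs Wednesday with 10:00 and 11:00 and Thursday with 14:00 and 15:00. A segment that names two days raises an error instead of guessing, which matches the idea that a failing parser is acceptable.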

On the one hand, formalizing such rules sounds like a pretty tricky task. On the other, such messages can sometimes be ambiguous for humans too, and ambiguity leads to miscommunication and wasted time, so maybe enforcing some rules would not be that bad.


Btw, a common example of ambiguity: 11:00 written by someone who is used to the 24h format would be ambiguous for someone who is used to the am/pm format.

I think in such a case we really should suggest that the user edit their message. So having failing parsers is not a bad thing.


Finally, another bunch of examples:

1. How about 10:00 or 11:00 am today?    // Does "am" apply to 10:00?
2. How about 13:00 or 11:00 am today?    // What here? 
3. How about 13:00 or 11:00 today?       // Can we infer that 11:00 is in 24h format?
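To make the three questions concrete, here is a small Python sketch that only classifies each time mention and does not try to resolve anything. It assumes a deliberately conservative policy that is not decided in this issue: an explicit am/pm suffix applies only to the time it directly follows, an hour of 0 or 13 and above implies the 24h format, and everything else is reported as ambiguous. `classify_times` and `TIME_RE` are hypothetical names.

```python
import re

# A time like "11:00", optionally followed by an am/pm suffix.
TIME_RE = re.compile(r"\b(\d{1,2}):([0-5]\d)(?:\s*(am|pm))?\b", re.IGNORECASE)


def classify_times(message):
    """Return (text, verdict) pairs for every time mention in the message."""
    results = []
    for m in TIME_RE.finditer(message):
        hour, suffix = int(m.group(1)), m.group(3)
        if hour > 23:
            continue  # not a valid hour, ignore the match
        if suffix:
            verdict = "12h" if 1 <= hour <= 12 else "invalid"
        elif hour == 0 or hour >= 13:
            verdict = "24h"
        else:
            verdict = "ambiguous"  # 1..12 without a suffix: could be either
        results.append((m.group(0), verdict))
    return results


ex1 = classify_times("How about 10:00 or 11:00 am today?")
ex2 = classify_times("How about 13:00 or 11:00 am today?")
ex3 = classify_times("How about 13:00 or 11:00 today?")
```

With this policy, example 1 leaves 10:00 ambiguous, example 2 contains one 24h and one 12h time, and example 3 leaves 11:00 ambiguous even though 13:00 hints at the 24h format. Whether the bot should infer the format from a neighbouring time, or ask the user to edit the message, is a decision to make on top of such a classification.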

Acceptance criteria

Martoon-00 commented 1 year ago

I suppose this is not quite how the parser was originally envisioned, and maybe we will go with something simpler instead of this issue.

For instance, if we assume this issue is done, then the suggestion in #4 to account only for the edited parts on a "Message edited" event becomes much harder to implement: an edit in one place can change how times elsewhere in the message are interpreted, so we would still have to process the entire message.

YuriRomanowski commented 1 year ago

I think in such a case we really should suggest that the user edit their message.

Our bot is going to become a cruel overseer :laughing:

But yes, it seems that analyzing natural text is a very complicated task, and we want to handle at least the most common cases. Since we don't know what they are, we could, for example, try to collect some statistics and optimize our bot for the most probable cases (but I cannot imagine how exactly to do this).

Martoon-00 commented 1 year ago

That's a good concern.

I'd suggest gathering use cases in the following stages:

  1. Create a page (in Notion?) right now where we will be free to put examples of timestamps that have to be parsed.

    All of them will be transformed into unit tests (or doctests) later.

    Extend the list of examples if some new examples come to mind during the work.

  2. At some point later brainstorm within the team. Try to get, like, 50 or 100 different examples overall.

    We should resist the temptation to postpone this step, as otherwise our fresh ideas can be lost.

  3. Grep for timestamps in our Slack history; I believe we can find quite a lot there.

    I have access to other workspaces and can grep there.

  4. Invite one or two people from outside of the team to add more examples.

    Preferably someone who knows about potential caveats related to times.
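As a rough illustration of the grepping step and of collecting statistics, the Python sketch below filters message texts that contain something time-like and counts the "shapes" of the matches. It assumes the messages are already available as a plain list of strings (how they are exported from Slack is out of scope here), and `TIME_LIKE_RE`, `collect_examples` and `shape_counts` are hypothetical names. The pattern is intentionally loose and will miss many real formats.

```python
import re
from collections import Counter

# Loose pattern: "7 pm", "7:30pm", or "19:30". An assumption, not a full grammar.
TIME_LIKE_RE = re.compile(
    r"\b\d{1,2}(?::[0-5]\d)?\s*(?:am|pm)\b|\b\d{1,2}:[0-5]\d\b",
    re.IGNORECASE,
)


def collect_examples(messages):
    """Keep only the messages that contain at least one time-like substring."""
    return [text for text in messages if TIME_LIKE_RE.search(text)]


def shape_counts(messages):
    """Count match shapes, e.g. '19:30' -> 'dd:dd' and '7 pm' -> 'd pm'."""
    counts = Counter()
    for text in messages:
        for match in TIME_LIKE_RE.findall(text):
            shape = re.sub(r"\d", "d", match.lower())
            counts[re.sub(r"\s+", " ", shape)] += 1
    return counts


sample = ["Let's meet at 10:00", "lunch?", "call at 7 pm or 19:30"]
examples = collect_examples(sample)
counts = shape_counts(sample)
```

The filtered messages can be pasted into the examples page and later turned into unit tests, while the shape counts give a first rough idea of which formats are the most common.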

YuriRomanowski commented 1 year ago

We can also ask our ML experts for some help; this problem seems like something they could help with.

Martoon-00 commented 1 year ago

That's an interesting, actually excellent, thought.

In my mind, ML is associated with "unexpected behaviour" and "works in 97% of the cases", so if we manage to find a way to cover all the use cases with strict rules, we had better go with them.

But asking would indeed be worthwhile; I personally may be underestimating the complexity of this task.

Mm, let's probably invite those people into our Slack channel for this project. If you have difficulties with finding the right people, contact me, we will try to figure this out together.