Closed by Martoon-00 1 year ago
I suppose this is not quite how the parser was originally envisioned, and maybe we will go with something simpler instead of this issue.
For instance, if we assume this issue is done, then the suggestion in #4 to account only for the edited parts in the case of a "Message edited" event becomes much harder to implement: we would still have to process the entire message.
I think in such a case we really should suggest that the user edit their message.
Our bot is going to become a cruel overseer :laughing:
But yes, it seems the task of analyzing natural text is very complicated, and we want to handle at least the most common cases. But since we don't know what they are, we could, for example, try to collect some statistics and optimize our bot to work with the most probable cases (but I cannot imagine how exactly to do this).
That's a good concern.
I'd suggest gathering use cases in the following stages:
Create a page (in Notion?) right now where we will be free to put examples of timestamps that have to be parsed.
All of them will be transformed into unit tests (or doctests) later.
Extend the list of examples if some new examples come to mind during the work.
At some point later brainstorm within the team. Try to get, like, 50 or 100 different examples overall.
We should resist the temptation to do this step later, as otherwise our fresh ideas can be lost.
Grep for timestamps in our Slack history, I believe we can find quite a lot there.
I have access to other workspaces and can grep there.
Invite one or two people from outside of the team to add more examples.
Preferably someone who knows about potential caveats related to times.
We can also ask our ML experts for some help, this problem seems like something they can help with.
That's an interesting, actually excellent, thought.
In my mind, when it comes to ML, I have "unexpected behaviour" and "works in 97% of the cases" associations, and if we manage to find a way to cover all the use cases with strict rules, better go with them.
But asking would indeed be worthwhile; I personally may be underestimating the complexity of this task.
Mm, let's probably invite those people into our Slack channel for this project. If you have difficulties with finding the right people, contact me, we will try to figure this out together.
Currently, we seem to handle only one mention of time per message.
Implicit time parts problem
There seems to be an important nuance when a message contains multiple time entries. Imagine this text:
Here Wednesday applies to both 10:00 and 11:00, and we should parse it accordingly.
Another example:
This should parse as
On the one hand, formalizing such rules sounds like a pretty tricky task. On the other, sometimes such messages can be ambiguous for humans too, and ambiguity leads to miscommunication and wasted time, so maybe forcing some rules would not be that bad.
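To make the "shared day" rule a bit more concrete, here is a minimal Python sketch, assuming a very simple rule: each time inherits the most recent weekday mentioned before it. The names `attach_days`, `TOKEN_RE`, and `WEEKDAYS` are hypothetical and the sample message in the usage note is made up; this is not how the bot's parser actually works.

```python
import re
from typing import List, Optional, Tuple

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")

# Matches either a weekday name or an HH:MM time, in order of appearance.
TOKEN_RE = re.compile(
    r"\b(?P<day>" + "|".join(WEEKDAYS) + r")\b|\b(?P<time>\d{1,2}:\d{2})\b",
    re.IGNORECASE,
)


def attach_days(text: str) -> List[Tuple[Optional[str], str]]:
    """Pair every time in `text` with the most recent weekday seen before it.

    Assumption (not from the issue): a weekday applies to all times that
    follow it until another weekday is mentioned.
    """
    current_day: Optional[str] = None
    result: List[Tuple[Optional[str], str]] = []
    for match in TOKEN_RE.finditer(text):
        if match.group("day"):
            current_day = match.group("day").capitalize()
        else:
            result.append((current_day, match.group("time")))
    return result
```

For a made-up message such as "Let's meet on Wednesday at 10:00 or 11:00", both times end up paired with Wednesday. A weekday that comes after the time ("10:00 on Wednesday") is not handled by this rule, which is exactly the kind of case the collected examples should reveal.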
Btw, a common example of ambiguity: 11:00 used by people who live with the 24h format would be ambiguous for someone who is used to the am/pm format.
I think in such a case we really should suggest that the user edit their message. So having failing parsers is not a bad thing.
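A "failing parser" for this case could look roughly like the Python sketch below. It assumes that a time is ambiguous when its hour is between 1 and 12 and no am/pm marker follows it; that rule, and the names `ambiguous_times` and `TIME_RE`, are illustrative assumptions rather than an agreed design.

```python
import re
from typing import List

# HH:MM optionally followed by an am/pm marker, e.g. "11:00" or "11:00 pm".
TIME_RE = re.compile(
    r"\b(?P<hour>\d{1,2}):(?P<minute>\d{2})(?:\s*(?P<meridiem>am|pm)\b)?",
    re.IGNORECASE,
)


def ambiguous_times(text: str) -> List[str]:
    """Return the times in `text` that could be read as either am or pm.

    Assumption: hours 1..12 without an explicit am/pm marker are ambiguous;
    hours 0 and 13..23 can only be 24h format, so they are accepted as-is.
    """
    found: List[str] = []
    for match in TIME_RE.finditer(text):
        hour = int(match.group("hour"))
        if match.group("meridiem") is None and 1 <= hour <= 12:
            found.append(match.group(0))
    return found
```

If the returned list is non-empty, the bot could reply with a suggestion to edit the message instead of guessing which reading was meant.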
Finally, another bunch of examples:
Acceptance criteria