Closed stIncMale closed 1 year ago
There were conversations in Slack related to this PR, but no consensus was achieved. @baileympearson, we need to move forward here somehow.
I am fine with either excluding from duration
the time to synchronously deliver ConnectionCheckOutStartedEvent
/ConnectionCreatedEvent
, which is the current behavior, or including it. But I recognize that duration
and waitQueueTimeoutMS
are related: if we don't start counting time towards them at the same instant, we expose ourselves to behaviors that are surprising to users. Following are two examples with large synchronous delivery time spans to make the surprising behaviors apparent.
1.a. EXCLUDE the time to synchronously deliver ConnectionCheckOutStartedEvent
from duration
. Start counting time towards waitQueueTimeoutMS
BEFORE emitting ConnectionCheckOutStartedEvent
.
waitQueueTimeoutMS
, which is 5 ms.ConnectionCheckOutStartedEvent
, which takes 100 ms.ConnectionCheckOutFailedEvent.duration
.duration
since step 3, which is 1 ms. Emit ConnectionCheckOutFailedEvent
with duration
1 ms.waitQueueTimeoutMS
(5 ms), but ConnectionCheckOutFailedEvent.duration
(1ms) tells a user that there were plenty of time left to wait for an available connection (5 ms - 1 ms = 4 ms).2.a. INCLUDE the time to synchronously deliver ConnectionCheckOutStartedEvent
to duration
. Start counting time towards waitQueueTimeoutMS
AFTER emitting ConnectionCheckOutStartedEvent
.
ConnectionCheckedOutEvent.duration
.ConnectionCheckOutStartedEvent
, which takes 100 ms.waitQueueTimeoutMS
, which is 5 ms.duration
since step 3, which is 1 ms. Emit ConnectionCheckedOutEvent
with duration
100 ms + 1 ms = 101 ms.duration
101 ms despite waitQueueTimeoutMS
being 5ms, and no new connections having been established. From this observation the user makes a conclusion that a driver did not respect waitQueueTimeoutMS
.While it is true that a driver does not provide hard real-time guarantees, the driver is still supposed to do its best to try and meet the specified timeouts. In these examples the driver failed to do its best, because there are ways to implement the functionality such that the user is spared from observing confusing data in executions as specified above.
The below options neither have the issues as in the examples above, nor do they have any new issues compared to those examples. Obviously, a driver still does not provide hard real-time guarantees, and a noticeable delay may happen at any point in execution, not only in user code when we synchronously deliver events. But a driver still must do its best, and the options below are better than the options chosen in the examples above:
1.b. EXCLUDE the time to synchronously deliver ConnectionCheckOutStartedEvent
from duration
. Start counting time towards waitQueueTimeoutMS
AFTER emitting ConnectionCheckOutStartedEvent
(this is the current state of this PR).
2.b. INCLUDE the time to synchronously deliver ConnectionCheckOutStartedEvent
to duration
. Start counting time towards waitQueueTimeoutMS
BEFORE emitting ConnectionCheckOutStartedEvent
.
Both of these options allow a driver to start counting time towards duration
and waitQueueTimeoutMS
at the same instant, and the spec will require this behavior if we choose either option 1.b or 2.b.
@stIncMale
As a side note - I don't think it's correct that duration and waitQueueTimeoutMS are related and that they must be aligned. It's already the case that no matter what we do with event delivery, waitQueueTimeoutMS
and duration
can have very different values. waitQueueTimeoutMS
should not govern connection establishment, and consequently in drivers that establish connections synchronously when performing connection checkout ConnectionCheckoutSucceeded.duration
could be significantly greater than waitQueueTimeoutMS
. This is irrespective of synchronous / asynchronous event processing.
To answer your question from slack (which is in reference to my suggestion that duration
be the difference between construction times of ConnectionCheckoutStartedEvent
s and ConnectionCheckoutSucceeded/failedEvent
s):
The above means that the duration of synchronous delivery of ConnectionCheckoutStarted is counted towards ConnectionCheckoutSucceeded.duration (option 1). Could you elaborate how the above avoids the discussion around waitQueueTimeoutMS?
I interpret the intention behind DRIVERS-2469 as: users have requested a way to determine how long connection checkout takes so they can optimize their application's consumption of driver connection pools. In drivers where events are emitted synchronously, this can impact pool performance. So, any synchronous event process definitely should be included in the new measurement.
Standardizing and mandating how drivers handle synchronous emissions of events is bad because:
waitQueueTimeoutMS
and ConnectionCheckoutStartedEvent
s in line with whatever this PR defines as the "correct" orderingThe simplest way to measure that duration for all drivers is to take the difference between the when drivers construct ConnectionCheckoutStartedEvent
and when drivers construct ConnectionCheckoutSucceededEvent
.
This doesn't clarify in the spec whether or not the durations of synchronous event emissions should be included or not. However, it does
This also closely matches my intuition of "matching" corresponding checkout started and checkout succeeded events and subtracting timestamps, which I think is how many users might view the new "duration" property as well. And specifically for Node, we actually have a timestamp on every pool event, so a user might actually be confused as to why the duration
property is significantly different from ConnectionCheckoutSucceeded - ConnectionCheckoutStarted
.
Given the options outlined in proceeding comment, I strongly lean towards option 1.b. If we do commit to behavior here, we should test it (probably with an optional prose test for drivers that emit events synchronously).
1.b. EXCLUDE the time to synchronously deliver ConnectionCheckOutStartedEvent from duration. Start counting time towards waitQueueTimeoutMS AFTER emitting ConnectionCheckOutStartedEvent (this is the current state of this PR).
DRIVERS-2707
@baileympearson, @matthewdale I added you as reviewers because you approved the PR where I introduced the wording that this PR changes.
Please complete the following before merging: