Background
We've encountered several instances where the logUserID and viewID found in delivery RPCs don't match up with any logged client events. This is more common for viewID. This could be caused by the mobile client failing to send some events, but we have no way of knowing if this is the case in prod right now.
Currently, we batch events and deliver them every 10 seconds. However, if there's an interval of 10 seconds or longer in which no events are logged, then we don't send anything to the server. Because of this, we don't know whether any network disruptions have occurred.
Proposal
Send a heartbeat message every 10 seconds, even if nothing was logged.
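A minimal sketch of what the flush step could look like with this change, assuming a single function that runs once per 10-second interval. The `Flusher` class, the `send` callback, and the payload shape are all hypothetical names for illustration, not the client's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Flusher:
    """Hypothetical client-side flusher; all names are illustrative."""
    send: Callable[[dict], None]  # assumed transport callback
    pending: List[dict] = field(default_factory=list)

    def log(self, event: dict) -> None:
        # Queue an event for the next flush.
        self.pending.append(event)

    def tick(self) -> None:
        """Intended to be called once per 10-second interval."""
        if self.pending:
            payload = {"type": "batch", "events": self.pending}
            self.pending = []
        else:
            # Nothing was logged: still send something, so the server
            # can tell an idle client apart from an unreachable one.
            payload = {"type": "heartbeat"}
        self.send(payload)
```

With this shape, every interval produces exactly one message, so a missing interval on the server side points to a delivery problem rather than an idle client.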
Redundant, or insufficient incremental value over #130
As noted in the "Heartbeat" section of #130, that proposal can function as a heartbeat in that we can detect gaps in the serial number. However, a gap only becomes visible once a later batch arrives, so #130 won't catch the case where the latest batch fails to send. The heartbeat proposed here would let us infer cases where the latest batch fails, but it doesn't offer any other incremental value over #130.
Is there enough value in doing this?
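To make the trade-off concrete, here is a rough sketch of server-side gap detection on serial numbers. It assumes heartbeats draw from the same serial counter as batches, which neither this proposal nor #130 states. `missing_serials` is a hypothetical helper, not existing code.

```python
from typing import List


def missing_serials(received: List[int]) -> List[int]:
    """Return serials absent between the lowest and highest received.

    Assumption: batches and heartbeats share one increasing counter.
    """
    if not received:
        return []
    seen = set(received)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


# Batches 1 and 2 arrive, batch 3 is lost, and no more events are logged.
without_heartbeat = missing_serials([1, 2])   # no later message, no gap visible
with_heartbeat = missing_serials([1, 2, 4])   # heartbeat 4 exposes the lost batch 3
```

Under that assumption, the only case the heartbeat adds is the trailing one: without a later message, the loss of serial 3 never shows up as a gap.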
Sign off
Work begins when sign-off is received from all of the following: