Problem Statement: While doing an initial analysis of errors occurring on the testnet, I found a couple of areas where our logging can be improved fairly easily.
Harden up the errors we're reporting into clearly defined categories.
I'm not sure of the best way to slice these, but a more consistent taxonomy would be good. For example, these are the errors I encountered, bucketed by category:
Network/Timeout Issues:
error opening stream: failed to open stream: timed out: context deadline exceeded
error reading response length: stream reset
error reading response length: timeout: no recent network activity
operation failed after 3 attempts
work execution timed out
Rate Limiting Issues:
all accounts are rate-limited
response status 429 Too Many Requests: Rate limit exceeded
response status 429 Too Many Requests: {"code":88,"message":"Rate limit exceeded."}
Authentication Issues:
Twitter authentication failed for [user]
Protocol Negotiation Issues:
error opening stream: failed to negotiate protocol: protocols not supported: [/masa/worker_protocol…]
error opening stream: protocol /masa/worker_protocol failed
error opening stream: failed to negotiate protocol: context deadline exceeded
error opening stream: failed to negotiate protocol: stream reset
Worker Issues:
no remote workers available
all remote workers failed
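One possible way to make these buckets mechanical is an ordered list of patterns, where the first match wins. The sketch below is only an illustration: the category names and the `categorize` and `summarize` helpers are hypothetical, and the patterns are taken directly from the messages listed above. Protocol negotiation is checked first because some of those messages also contain timeout wording.

```python
import re
from collections import Counter

# Ordered rules: the first matching category wins. Protocol negotiation comes
# before timeouts because some negotiation errors also contain
# "context deadline exceeded" or "stream reset".
# Category names are hypothetical, not an agreed taxonomy.
RULES = [
    ("protocol_negotiation", re.compile(
        r"failed to negotiate protocol|protocols not supported|protocol \S+ failed")),
    ("rate_limit", re.compile(
        r"rate-limited|rate limit exceeded|429 too many requests")),
    ("authentication", re.compile(r"authentication failed")),
    ("worker", re.compile(
        r"no remote workers available|all remote workers failed")),
    ("network_timeout", re.compile(
        r"timed out|timeout|context deadline exceeded|stream reset"
        r"|operation failed after \d+ attempts")),
]


def categorize(message: str) -> str:
    """Return the first category whose pattern matches the message."""
    text = message.lower()
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return "uncategorized"


def summarize(messages):
    """Collapse raw messages into per-category counts (a simple form of dedup)."""
    return Counter(categorize(m) for m in messages)
```

Anything that falls through to `uncategorized` is a candidate for a new rule or a new bucket, which keeps the taxonomy honest as new errors show up.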
Collect Unique Identifier
The protocol already has a unique identifier for work threads. We should be storing this so it's easier to trace a request's lifecycle.
Questions we want answered:
from whom did it originate?
to whom was the request sent? (and further, how many times was it rerouted?)
what does the topology of our network look like?
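As a rough illustration of how a stored work identifier could answer these questions, the sketch below groups events by work ID. The `WorkEvent` fields (`work_id`, `sender`, `receiver`, `timestamp`) are assumptions for the example, not the actual event schema, and it assumes one event is recorded per send attempt.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkEvent:
    # All field names are hypothetical stand-ins for the real event schema.
    work_id: str      # the protocol's unique work-thread identifier
    sender: str       # peer that sent the request
    receiver: str     # peer the request was sent to
    timestamp: float  # when the send attempt was recorded


def trace(events, work_id):
    """Reconstruct one request's lifecycle from its send attempts."""
    attempts = sorted(
        (e for e in events if e.work_id == work_id),
        key=lambda e: e.timestamp,
    )
    if not attempts:
        return None
    return {
        "origin": attempts[0].sender,
        "receivers": [e.receiver for e in attempts],
        # Every attempt after the first is counted as a reroute.
        "reroutes": len(attempts) - 1,
    }


def topology(events):
    """Count how often each (sender, receiver) edge appears across all work."""
    return Counter((e.sender, e.receiver) for e in events)
```

With events shaped like this, `trace` covers the origin and rerouting questions for a single request, and `topology` gives a weighted edge list that could feed a graph view of the network.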
Acceptance Criteria
[ ] errors from the protocol are categorized and deduplicated.
[ ] our events table is modified to accommodate a field for a unique identifier for each work thread.
[ ] unique work ids are sent with the events coming into the events table.
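A minimal sketch of the table change, using an in-memory SQLite database as a stand-in for the real store. The table name `events`, its existing columns, and the new `work_id` column name are all assumptions for illustration; the actual schema and database engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the existing events table (columns are hypothetical).
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, peer_id TEXT, error TEXT)"
)

# Add a nullable column for the work-thread identifier, so existing rows
# stay valid, and index it so lookups by work ID are cheap.
conn.execute("ALTER TABLE events ADD COLUMN work_id TEXT")
conn.execute("CREATE INDEX idx_events_work_id ON events (work_id)")

# New events carry the work ID alongside the existing fields.
conn.execute(
    "INSERT INTO events (peer_id, error, work_id) VALUES (?, ?, ?)",
    ("peer-a", "work execution timed out", "work-123"),
)

# All events for one work thread can then be pulled with a single query.
rows = conn.execute(
    "SELECT peer_id, error FROM events WHERE work_id = ?", ("work-123",)
).fetchall()
```

Making the column nullable avoids a backfill for events recorded before the change.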