Open data-sync-user opened 1 month ago
➤ JR Conlin commented:
Going to use this ticket to collect various comments and actions around the creation of the Reliability Data store system. (cc: Eric Maydeck Rachael Crook Philip Jenvey Taddes Korris )
https://mozilla-hub.atlassian.net/browse/SYNC-4325?focusedCommentId=946578 contains the discussion of the schema and system design.
Expected Load
We are only tracking a subset of messages: those with a known Public Key (FxA tab operations). It’s worth noting that the overall percentage of total messages that meet this criterion is unknown, but it is less than 100%.
Operations
Each tracked message will create a Redis pipeline command set containing the following:
In addition, the app will record to Bigtable, at a row identified by the reliability_id, a cell with the qualifier of the milestone and a value of the timestamp, as well as a cell marked error containing any message failure (after the message is accepted) indicating loss.
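As a rough illustration of the per-message command set, here is a minimal sketch modeled as plain tuples (no live Redis; the key names `pending:<milestone>` and `counts` are hypothetical, not the final schema from SYNC-4325):

```python
import time

def tracking_commands(reliability_id: str, milestone: str, expiry_ts: int):
    """Build the Redis pipeline command set for one tracked message.

    Returned as plain tuples to show the shape of the pipeline:
      - ZADD an entry into a per-milestone pending set, scored by the
        message's expiration time, so a reaper can later find expired
        entries with ZRANGEBYSCORE.
      - HINCRBY the per-milestone counter.
    Key names here are illustrative placeholders.
    """
    return [
        ("ZADD", f"pending:{milestone}", expiry_ts, reliability_id),
        ("HINCRBY", "counts", milestone, 1),
    ]

cmds = tracking_commands("DEADBEEF", "stored", int(time.time()) + 3600)
```

Scoring the ZADD entry by expiration time is what lets a reaper find dead entries with a single range query rather than scanning everything.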
An example of a JSON-formatted Bigtable entry for a message that expired while in storage might look like:
{"DEADBEEF...":{"received":17285143030001,"stored":17285143030010,"expired":1728589102,"error":"expired"}}
where a successfully transmitted message may look like:
{"DEADBEEF...":{"received":17285143030001,"transmitted":17285143030010,"accepted":1728589102}}
(Question: I am debating whether we should add cells for the message TTL as well as the total time a message spent in transit. I’m not sure of the general utility of those, though: a message with a “too long” TTL isn’t particularly interesting, a message with a “too short” TTL would just expire in transit, and the total time a message spent in the system can be determined roughly by looking at the timestamps.)
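To the last point above, the total in-system time can indeed be derived from the milestone cells already being recorded, so a separate duration cell may be redundant. A minimal sketch (field names follow the examples above; the timestamps are illustrative and assumed to share one unit):

```python
# Terminal milestone names, matching the example entries above; "errored"
# is an assumed name for the failure case.
TERMINAL = ("accepted", "expired", "errored")

def in_system_time(entry: dict):
    """Return (terminal-milestone timestamp - received timestamp),
    or None if the message is still in flight."""
    received = entry.get("received")
    if received is None:
        return None
    for milestone in TERMINAL:
        if milestone in entry:
            return entry[milestone] - received
    return None

# A message received at t=100 and accepted at t=160 spent 60 units in system.
elapsed = in_system_time({"received": 100, "stored": 110, "accepted": 160})
```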
Because Redis does not have a way to automatically decrement counters when entries expire, we need a “reaper” process that scans the scores recorded by the ZADD for any that are less than the current timestamp, decrements the counts for that $milestone, and records the fate of the message in Bigtable’s log.
Reaper
While it’s possible to create reaper processes within the Autopush applications that use a complex set of lock deciders, I believe it’s simpler to have the reaper be a single, reasonably simple, external application which regularly checks the Redis storage for expired records, adjusts the counts, and logs the fate of the messages to Bigtable. (No cleanup is required for Bigtable, since records will automatically age out.)
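One reaper pass might look like the following sketch, written against an in-memory stand-in for the Redis data (a real implementation would use ZRANGEBYSCORE / ZREM / HINCRBY in a pipeline, plus a Bigtable append for the message’s fate; all names here are illustrative):

```python
def reap(pending: dict, counts: dict, log: list, now: int) -> None:
    """One reaper pass over the in-memory model.

    For each milestone's pending set, remove entries whose expiry score
    is <= now, decrement that milestone's count, and record the fate
    (stand-in for the Bigtable log write).
    """
    for milestone, entries in pending.items():
        expired = [rid for rid, expiry in entries.items() if expiry <= now]
        for rid in expired:
            del entries[rid]
            counts[milestone] = counts.get(milestone, 0) - 1
            log.append((rid, f"expired at {milestone}"))

# Two messages pending at the "stored" milestone; one expires at t=50.
pending = {"stored": {"DEADBEEF": 50, "CAFEF00D": 500}}
counts = {"stored": 2}
log = []
reap(pending, counts, log, now=100)
```

Because the pending entries are scored by expiry time, a live implementation can pull just the expired slice with one ZRANGEBYSCORE per milestone set rather than iterating every member.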
Generate the Message Tracking Event database.
Per https://mozilla-hub.atlassian.net/browse/SYNC-4325?focusedCommentId=946578 there are two data storage systems in play: a Redis-like storage system (probably GCP Memorystore), and a modification to the existing Bigtable schema to include a new “reliability” column family (set to maxage=60d or maxversions=1). This would contain the milestone log messages. The expiration of 60 days is twice the max message age, which should be more than long enough for any sort of large observation.
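For reference, assuming the `cbt` CLI and reading the policy above as a union of the two rules, the new column family could be configured roughly like this (project, instance, and table names are placeholders, not the real deployment values):

```shell
# Create the new "reliability" column family on the existing table and set
# its GC policy: cells are eligible for collection once they are older than
# 60 days or once more than one version exists. Names are illustrative.
cbt -project my-project -instance my-instance \
    createfamily my-autopush-table reliability
cbt -project my-project -instance my-instance \
    setgcpolicy my-autopush-table reliability "maxage=60d or maxversions=1"
```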