nyh opened 5 years ago
Recently we had a user who wanted to have DynamoDB's ReplicationLatency and PendingReplicationCount metrics for global tables. Both are available with the old global tables API (see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_monitoring.html), while with the new API only the former is available (see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/V2globaltables_monitoring.html).
The ReplicationLatency metric measures the amount of time from when an item became available in one DC until the time it became available in a remote DC (the measurement is per pair of DCs). It is not specified what exactly this measures. Does it measure the amount of time from when LOCAL_QUORUM was achieved in one DC until the first copy was written in the remote DC? Until LOCAL_QUORUM was achieved in the remote DC? Until ALL was achieved in the remote DC?
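To make the ambiguity concrete, here is a minimal sketch (not Alternator code) that computes a per-DC-pair latency under each of the three candidate definitions. It assumes we know when LOCAL_QUORUM was reached in the source DC and when each replica in the remote DC wrote the item, with all timestamps in milliseconds on a clock shared by both DCs (a strong assumption, since real DCs have clock skew). The names `RemoteWrite` and `replication_latency` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RemoteWrite:
    # Hypothetical record: when one replica in the remote DC persisted the item.
    replica: str
    written_at_ms: int  # assumes a clock shared with the source DC


def replication_latency(local_quorum_at_ms: int,
                        remote_writes: List[RemoteWrite],
                        remote_rf: int,
                        definition: str) -> Optional[int]:
    """Latency for one (source DC, remote DC) pair under a chosen definition.

    definition is one of:
      "first"        - until the first copy is written in the remote DC
      "local_quorum" - until a quorum of the remote DC's replicas has it
      "all"          - until every remote replica has it
    Returns None if the chosen point was never reached.
    """
    needed = {
        "first": 1,
        "local_quorum": remote_rf // 2 + 1,
        "all": remote_rf,
    }[definition]
    times = sorted(w.written_at_ms for w in remote_writes)
    if len(times) < needed:
        return None
    # The needed-th earliest remote write is the moment the definition is met.
    return times[needed - 1] - local_quorum_at_ms
```

With LOCAL_QUORUM reached at 1000 ms and three remote replicas writing at 1050, 1020 and 1300 ms, the three definitions give 20, 50 and 300 ms respectively, so the choice of definition matters a lot when one remote replica is slow.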
We could also have (even though it's not entirely DynamoDB-compatible) a "replication latency" measurement for all tables, not just global tables, with a slightly different definition: the amount of time between when the request's consistency level (CL) was achieved and when all copies were written, if all writes succeeded (not counting things like hints and repair in this statistic).
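A minimal sketch of that definition, assuming the coordinator records the time of each replica's acknowledgment (in milliseconds) and marks failed or timed-out replicas as `None`. The moment the CL was achieved is taken to be the k-th earliest ack, where k is the number of acks the CL requires. The function name `cl_to_all_latency` is hypothetical, not an existing Scylla function.

```python
from typing import Optional, Sequence


def cl_to_all_latency(acks_ms: Sequence[Optional[int]],
                      cl_required: int) -> Optional[int]:
    """Time from reaching the CL until the last replica acknowledged the write.

    acks_ms has one entry per replica: its ack time in ms, or None if that
    replica's write failed or timed out. Such writes would later be handled
    by hints or repair, which this statistic deliberately excludes, so the
    whole sample is dropped (None is returned).
    """
    if not acks_ms or any(t is None for t in acks_ms):
        return None
    times = sorted(acks_ms)
    if cl_required < 1 or cl_required > len(times):
        return None
    cl_reached_ms = times[cl_required - 1]  # k-th ack satisfies the CL
    return times[-1] - cl_reached_ms        # last ack means all copies written
```

For example, with RF=3, CL=QUORUM (two acks) and acks at 3, 5 and 12 ms, the CL is reached at 5 ms and the sample is 7 ms; with CL=ALL the sample is always 0.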
Another related metric we could add (and which I think DynamoDB also doesn't have) is GSI latency: the amount of time between returning success for the base write and the time the view wrote all or some of the updates. But this won't be easy, because the view replicas (which know when the view writes complete) don't know when the base writes completed.
CC @amnonh @hopugop
It turns out that DynamoDB also has a metrics feature. The user can view graphs in CloudWatch (https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#cw:dashboard=DynamoDB), and those have a similar feel to our Grafana graphs, though they are definitely not as diverse or powerful as Scylla's.
The metrics available from DynamoDB are described in https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/metrics-dimensions.html
Even if we don't plan to accurately emulate AWS CloudWatch, we should make an effort to at least support the same metrics that DynamoDB users may have come to expect. Those are listed in the above link, and we should probably implement them. We should also document somewhere (alternator.md) the list of DynamoDB metrics and the corresponding Alternator metric name, if implemented.
Note that almost all of DynamoDB's metrics can be queried per table. Scylla supports per-table statistics, but they are turned off by default. We should consider enabling them for Alternator use cases: per-table statistics were considered problematic when there are many tables, but DynamoDB encourages having only a few tables because of its requirement to provision each table individually. We can either change the default when an Alternator port is open, or explain in alternator.md how the user should enable these metrics explicitly.
An interesting article on DynamoDB's metrics and how they can be useful to DynamoDB users: https://www.datadoghq.com/blog/top-dynamodb-performance-metrics/