Open coxley opened 11 months ago
@coxley sounds like a reasonable request. One question: Would it be sufficient to return a copy of the Ops
instead of handing over a reference? I'd worry about handing over an internal implementation detail to users to manipulate --> foot gun.
Feel free to send a PR for @bhshkh to review with this feature request, I'm just triaging open issues :)
@noahdietz That's a very fair and valid compromise. Will still allocate extra stuff, but avoid needing to reinvent types. (and only on invocation)
I'll try to take another look at this after vacation. :)
Hi,
I'm not sure if this is the best path forward. Cloud Bigtable has 2 features that are designed specifically for this usecase. Have you explored Audit Logs and Change Stream?
In general its not possible for clients to audit the changes accurately (ie using operations like CheckAndMutate and ReadModifyWrite can have non deterministic results if multiple clients are modifying the same row). Change streams is similar to WAL logs of SQL and designed for the explicit purpose of getting a very accurate view of whats stored in Bigtable. Audit logs is meant more for regulatory purposes, but contains the entire payload of the RPC
The primary reason I would like to avoid exposing the underlying protos is that by keeping our protocol distinct from our logical api we can evolve our protocol without breaking end users. If the 2 features I mentioned dont quite fit, I think having something like a generic grpc reply proxy would be a better path forward
@igorbernstein2: Thanks for taking a look!
Both "Audit Logs" and "Change Stream" aren't as local or hitless for the use-case we have.
For example, we want to take local recording action on processing a "request", and only part of that process involves interacting with BigTable. I'm not sure it's worth the added overhead to both our cluster and code complexity to use those methods. One way we use this data is to find regressions when modifying logic, to see if requests are applying the mutations we expect or not.
This is closer to recording a series of SQL skeletons that relates to some changes vs. the actual materialized changes.
Your explanation makes sense though — if this would make evolving the internals harder, I'm fine to drop the issue. If a future reader need something similar, we went with something akin to this while waiting for the PR:
type Action string
var (
Set Action = "set"
Delete Action = "delete"
Skip Action = "skip"
)
// Operation is one of potentially many actions taken on a mutation
type Operation struct {
Action Action
Family string
Column string
Annotation string
Start, End bigtable.Timestamp
}
// Mutation embeds bigtable.Mutation to record operations as they happen, for
// offline comparison.
type Mutation struct {
*bigtable.Mutation
// Key of the row that this mutation will be applied to
//
// Note: only used for comparison - it's up to the callers to apply
// mutations to the correct row
Key string
// We have logic to right-size this slice at creation time to avoid unneeded allocs
Ops []Operation
// Omitting business context specific fields
}
// Likewise for the other four methods
func (m *Mutation) Set(family, column string, ts bigtable.Timestamp, value []byte) {
m.Mutation.Set(family, column, ts, value)
m.Ops = append(m.Ops, Operation{
Action: Set,
Family: family,
Column: column,
})
}
Ref: https://github.com/googleapis/google-cloud-go/blob/main/bigtable/bigtable.go#L878
We have a handful of systems that make liberal use of BigTable. To make replaying, diagnosing, or dry-running previous events possible, I want to record the database decisions made for a given input.
With traditional databases, I'd record the rendered SQL template(s). I don't exactly have that with BigTable — but internally this library has it. Are the operations unexported explicitly, or just because this use-case hasn't come up?
To get around this, I'm going to embed
*bigtable.Mutation
and do my own recording. But our stuff can be high volume, and I'd love to not need to allocate extra things forever.It would be helpful if
mutation.Ops()
or similar returned the underlying value instead.