standard-ai / ya-gcp

Apache License 2.0
7 stars 8 forks source link

First cut at bigtable support #15

Closed jneem closed 1 year ago

jneem commented 1 year ago

I've been poking at this for a while, and I think it's gotten to the point where it's worth asking whether I'm on the right track. Apart from wrapping the bigtable api, it does just enough of the bigtable table admin api to be able to run examples on the bigtable emulator.

The main missing things I know of are

jneem commented 1 year ago

Thanks for the comments and sorry for the slow response! I've had a busy week, but I think I can get another version (without the breaking version bumps) ready soon after the Thanksgiving break.

jneem commented 1 year ago

Ok, so the updated PR reverts the various version bumps, using disable_comments to work around the failing doctests.

Regarding changes to the connection abstraction, it would be cool to decouple auth/load-balancing/etc. From the point of view of authoring a new module I don't think it makes much difference either way: all I did was copy a few where clauses from another module and everything else just worked. I guess that means it should be easy to adapt to whatever changes are made to the connection abstraction.

jneem commented 1 year ago

Ok, the testing isn't as comprehensive as it should be (especially regarding retries), but at least there's something there so I'll mark is as non-draft.

If you know of any easy way of injecting (retryable) errors into the gRPC stream, I'd very much like to know :)

k1nkreet commented 1 year ago

Hey @rnarubin, @jneem. I wonder if I can help somehow to move this PR forward?

I see there are two CI failures, one is probably could be fixed with adding bigtable component installation alongside with pubsub-emulator in .github/workflows/ci.yaml. The other one is because it uses Vec::retain_mut available since 1.61.0 whereas CI configuration specifically checks tests passing with 1.59.0. I wonder if it means that this part should be rewritten without retain_mut or the minimal Rust version on CI could be updated?

rnarubin commented 1 year ago

Hi @k1nkreet. Yes fixing these tests is a great step.

  1. Fixing emulators in CI would be a big help. Even the pubsub emulator doesn't work (not sure why). In the workflow there are skipped tests because they use pubsub but there was something broken about interacting with it.
  2. Upgrading to 1.61 should be fine, it's several releases old by now
jneem commented 1 year ago

Thanks for the ping! I made a separate PR for the MSRV bump. I'm less sure what to do about the emulator thing. I guess I'll try just updating the CI to include it, but if that doesn't work I might need some help...

jneem commented 1 year ago

Thanks for the comments! I agree that the flushing is too permissive. If the bigtable server returns responses that violate the invariants, I guess the least bad option is to treat it as an unrecoverable error, right?

I will try to deal with the low-hanging fruit, and then I think maybe @k1nkreet will have some time to work on this? In any case I'll give him write access to the PR since I'll be off on Monday.

xlambein commented 1 year ago

Using this PR yesterday, I was in a situation where I had to call BigtableClient::mutate_rows directly, rather than go through e.g. set_rows_data. Doing this, I got a bigtable error saying the table name didn't exist, which I eventually found out was because methods like set_rows_data prepend a required prefix to the name:

https://github.com/standard-ai/ya-gcp/blob/b2e3ea7b0661c173fa20d521ec9f096ea939fb3d/src/bigtable/mod.rs#L554

The same is also true for the read_rows method.

IMO either this should be documented in mutate_row(s) and read_rows, or these lower-level methods should also prepend the prefix. In the case of the former, BigtableClient should have a method to construct a fully-qualified table name, because currently the prefix is private.

matthew-healy commented 1 year ago

@xlambein - Perhaps @jneem can correct me here, but it seems like the design intention was for mutate_rows to be 1:1 with the MutateRows rpc (which you can find documented here). As such it directly uses the proto-generated MutateRowsRequest type as its argument. That type has a table_name field which is expected to include the fully-qualified table name.

My initial feeling is that it'd be incongruous to still take that type as an argument but then opaquely change the table_name. Equally, the current state of affairs will clearly lead to mistakes.

I think the best way forward is to change the mutate_rows and mutate_row methods to no longer directly take the proto types as arguments, and then build the proto requests privately so that any call to the BigtableClient is always building the table_name from the prefix & supplied name.

That said, I'm interested to hear what @jneem and @rnarubin think of this.

(Edit: it's also likely this could be done in a follow-up PR so as not to keep this review open significantly longer.)

matthew-healy commented 1 year ago

Also @rnarubin - I think I've resolved all of the blocking comments, so have re-requested you review.

rnarubin commented 1 year ago

Re: mutate_rows vs set_row*

I agree with having a low-level function that matches the rpc API as closely as possible. Because there's a lot of fine-grained behavior defined in the message request, replicating that control in-library and then constructing the request internally would be difficult to maintain (probably lots of boilerplate, subtle behaviors). So i would advise keeping the low-level one with just the generated proto message request as an argument, for users who need that control.

To address @xlambein 's problem, adding a method to construct the fully specified table name is a good idea. It's a shame that the bigtable schema doesn't document the name format on the field in MutateRowsRequest, they do in the MutateRowRequest (note singular)

/// Required. The unique name of the table to which the mutations should be applied.

vs

/// Required. The unique name of the table to which the mutation should be applied.
/// Values are of the form
/// `projects/<project>/instances/<instance>/tables/<table>`.
jneem commented 1 year ago

I realized that I was wrong about the truncation of open intervals, so I pushed a fix.

matthew-healy commented 1 year ago

@rnarubin It's definitely valuable to have this merged, but as it's not currently a blocker for us I'm not sure it's necessary to rush it. I'm going to try and implement the conformance tests this week, and then I'll take a look at setting up a stubbed Bigtable server for the other test cases.

If, for whatever reason, this becomes a blocker, then we can reconsider.

matthew-healy commented 1 year ago

@rnarubin Hey! So I spent a bit of time last week working on the conformance tests. It took a little longer than I was hoping, but I managed to get them running. Unfortunately - as you predicted - they don't pass. In fact, the very first one fails because we seem to ignore the commitRow field in the CellChunk message. (I'm also having a bit of a hard time getting everything to implement UnwindSafe so I can catch the panics, so I don't yet have a good picture of how many of the tests fail, but I do know it's at least not all of them :sweat_smile: ).

That being said, I think the issues we have are mainly around poorly-behaved servers, as we're successfully using the branch without issues right now. Given the investment required to get this past alpha-maturity, I think it might be best if we do what you suggested & clean up what we have so we can merge this PR, and I'll open tickets for the remainder of the testing work.

I'll get started on the tasks you mentioned now and ping you for another review when they're done. I'll continue working on the conformance tests in-between other tasks and hopefully follow up with a PR in a few weeks.

matthew-healy commented 1 year ago

I've added Bigtable support (& its alpha status) to the README, and fixed the retry issue identified above. I also created #18 and #19 to cover the requested test cases.

I'm going to tidy up my (very rough) WIP branch & open a draft PR with the conformance tests to increase the bus factor here a little too. :bus: