GlenDC closed this issue 2 years ago.
I've got a TODO that's basically samples for doing all_the_things using the managedwriter. I expect to get to that soon.
There are two issues I can see in your report:
1) You're using proto3, which unfortunately does not currently work as expected with the Write API; the API expects proto2 semantics for distinguishing null values from default values. This isn't the source of the error, but it is a significant gotcha.
2) You're using an un-normalized descriptor for the schema. The schema for the stream must be a self-contained descriptor, but you've got two descriptors: TemporaryDataProto3 and TemporaryDataParameterProto3. Fortunately, there's a normalization function that will rewrite and consolidate the messages; it lives in the adapt subpackage (see the sketch below): https://pkg.go.dev/cloud.google.com/go/bigquery/storage/managedwriter/adapt#NormalizeDescriptor
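To make that concrete, here's a rough sketch of how the normalized descriptor typically gets wired into a stream; the TemporaryDataProto3 type, the table path handling, and the choice of the default stream are illustrative placeholders, not taken from your code:

```go
import (
	"context"

	"cloud.google.com/go/bigquery/storage/managedwriter"
	"cloud.google.com/go/bigquery/storage/managedwriter/adapt"
)

func newStream(ctx context.Context, client *managedwriter.Client, table string) (*managedwriter.ManagedStream, error) {
	// NormalizeDescriptor rewrites the message descriptor so that nested and
	// imported message types are consolidated into a single, self-contained
	// DescriptorProto suitable for use as the stream schema.
	m := &TemporaryDataProto3{} // placeholder: your generated proto message
	descriptor, err := adapt.NormalizeDescriptor(m.ProtoReflect().Descriptor())
	if err != nil {
		return nil, err
	}

	// Attach the normalized descriptor when creating the managed stream.
	return client.NewManagedStream(ctx,
		managedwriter.WithDestinationTable(table),
		managedwriter.WithType(managedwriter.DefaultStream),
		managedwriter.WithSchemaDescriptor(descriptor),
	)
}
```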
Perfect, those two points do indeed seem to be the main issues I was facing; it works a lot better now. What I don't know, however, is what I should use for known types such as TIMESTAMP. In proto3 I was using import "google/protobuf/timestamp.proto";, but that gives me the following error when using a proto2-based model:
ready append resulted in error: rpc error: code = InvalidArgument desc = The proto field mismatched with BigQuery field at benchmark_TemporaryDataProto2.timestamp, the proto field type message, BigQuery field type TIMESTAMP Entity: projects/oi-bigquery/datasets/benchmarks_bqwriter/tables/tmp/_default
Having this documented could be nice as well, I imagine.
The proto well-known types aren't yet properly supported, and Timestamp is among them. The public docs have a section on wire format conversions: https://cloud.google.com/bigquery/docs/write-api#data_type_conversions. Short answer: send an int64 with epoch-micros.
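For illustration, a minimal sketch of that mapping, assuming a proto2 message whose TIMESTAMP column is declared as an int64 (TemporaryDataProto2 here stands in for your generated type):

```go
import (
	"time"

	"google.golang.org/protobuf/proto"
)

// In the .proto file (proto2 syntax), the TIMESTAMP column is declared as a
// plain int64 rather than google.protobuf.Timestamp:
//
//	optional int64 timestamp = 1;

func newRow() *TemporaryDataProto2 { // placeholder: your generated proto2 type
	return &TemporaryDataProto2{
		// BigQuery TIMESTAMP expects epoch microseconds on the wire.
		Timestamp: proto.Int64(time.Now().UnixMicro()),
	}
}
```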
Alright, perfect. Making that clearer in the documentation would be a nice bonus. Either way, this answers my question, and with that my last remaining question has been resolved.
I did find a lot of issues with type definitions in general. Another tricky one: when using the batch client with an auto-detected BigQuery schema, the casing of field names somehow has to match exactly, even though field names are supposedly case-insensitive; if it doesn't, you get duplicate-field errors.
A lot of implicit knowledge is required. Perhaps that's unavoidable, but I do feel that a high-level client such as the managed writer, which already abstracts much of that implicit knowledge away, would benefit from documenting quite extensively how types are expected to be defined and how they are not. Certainly not a deal breaker, but it would increase its value a lot, IMHO.
Either way, as far as I'm concerned the issue can be closed as resolved. Thanks!
As a last note, please do know that I'm willing to contribute to the BQ packages, so feel free to reach out to me, or to mark issues with enough detail that others can pick them up. I'd be happy to help out as a token of gratitude.
It's great to get feedback on sharp edges for this API and the managedwriter abstraction, so thanks for being so clear and detailed. I just wanted to note I'll have limited availability for the next few weeks due to the holidays, so responses may be delayed.
Might be worth taking a look at https://github.com/googleapis/google-cloud-go/pull/5102; it proposes some further evolution of the AppendRows() method.
Closing this one for now. https://pkg.go.dev/cloud.google.com/go/bigquery/storage/managedwriter has an augmented godoc that covers the sharp edges.
@shollyman Thanks for your answer on this thread. Do you know if there is a plan to support the Timestamp proto type in the future by the backend team?
Client
bigquery/storage/managedwriter.Client
Environment
Go Environment
Code
Expected behavior
I would expect this to work and send the data to BQ
Actual behavior
Get errors:
I tried to fix it with a hack such as:
That doesn't work either, and I cannot imagine that the intent is ever for us to have to create such a lengthy and cryptic type descriptor ourselves.
I only got this far, by the way, by following examples such as https://github.com/googleapis/google-cloud-go/blob/master/bigquery/storage/managedwriter/integration_test.go#L280, as the documentation was all Greek to me when it came to knowing how to pass my data along. It might just be me and my lack of experience with this entire stack, but nevertheless I believe the documentation could use a lot of work here.
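For reference, this is roughly the shape of the append path I eventually pieced together from that integration test (a sketch only: TemporaryDataProto2 and the stream setup are placeholders, and error handling is trimmed):

```go
import (
	"context"
	"log"

	"cloud.google.com/go/bigquery/storage/managedwriter"
	"google.golang.org/protobuf/proto"
)

func appendRows(ctx context.Context, stream *managedwriter.ManagedStream, rows []*TemporaryDataProto2) {
	// Serialize each row with the proto2-generated type.
	data := make([][]byte, 0, len(rows))
	for _, r := range rows {
		b, err := proto.Marshal(r)
		if err != nil {
			log.Fatal(err)
		}
		data = append(data, b)
	}

	// AppendRows returns an AppendResult future; GetResult blocks until the
	// server acknowledges the append (or reports an error).
	result, err := stream.AppendRows(ctx, data)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := result.GetResult(ctx); err != nil {
		log.Fatal(err)
	}
}
```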