googleapis / nodejs-bigquery-storage

BigQuery Storage Node.js client

Write API example? #222

Closed · imathews closed this issue 2 years ago

imathews commented 2 years ago

I saw that this library was recently updated to support the new Storage Write API — would it be possible to publish a simple example for streaming rows into a table, similar to what we have for the Read API?

Thanks!

mbyrne00 commented 2 years ago

Yes. I tried to reverse-engineer the latest commit to add the functionality, but it's not clear how to specify the write stream type (e.g. PENDING), write some rows, and batch commit. The underlying API docs show some Java examples.

The committed code and tests all seem very isolated, using mocks/stubs, so it's very hard to see how to piece everything together.

Would be great to see a simple example (roughly sketched after this list) that:

  1. Creates a stream
  2. Writes some records
  3. Batch commits
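
For reference, the flow we're after looks roughly like the following. This is an untested sketch pieced together from the generated v1 client; the table path and the two-column row schema are made-up placeholders:

```js
const {v1} = require('@google-cloud/bigquery-storage');
const protobuf = require('protobufjs');

async function writePendingAndCommit() {
  const client = new v1.BigQueryWriteClient();
  // Hypothetical table; replace with your own project/dataset/table.
  const parent = 'projects/my-project/datasets/my_dataset/tables/my_table';

  // 1. Create a write stream of type PENDING.
  const [writeStream] = await client.createWriteStream({
    parent,
    writeStream: {type: 'PENDING'},
  });

  // Rows are sent as serialized protobuf messages. Build a message type
  // with protobufjs; its fields must match the protoDescriptor below.
  const RowType = new protobuf.Type('RowMessage')
    .add(new protobuf.Field('id', 1, 'int64'))
    .add(new protobuf.Field('name', 2, 'string'));
  new protobuf.Root().add(RowType);

  const rows = [{id: 1, name: 'homer'}, {id: 2, name: 'marge'}];
  const serializedRows = rows.map(row =>
    RowType.encode(RowType.create(row)).finish()
  );

  // 2. Append the rows over the bidirectional appendRows stream.
  const stream = client.appendRows();
  const firstResponse = new Promise((resolve, reject) => {
    stream.on('data', resolve);
    stream.on('error', reject);
  });
  stream.write({
    writeStream: writeStream.name,
    protoRows: {
      writerSchema: {
        protoDescriptor: {
          name: 'RowMessage',
          field: [
            {name: 'id', number: 1, type: 'TYPE_INT64'},
            {name: 'name', number: 2, type: 'TYPE_STRING'},
          ],
        },
      },
      rows: {serializedRows},
    },
  });
  stream.end();
  await firstResponse;

  // 3. Finalize the stream, then batch-commit to make the rows visible.
  await client.finalizeWriteStream({name: writeStream.name});
  const [commit] = await client.batchCommitWriteStreams({
    parent,
    writeStreams: [writeStream.name],
  });
  console.log(commit);
}

writePendingAndCommit().catch(console.error);
```
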
mbyrne00 commented 2 years ago

Also asked here, just in case: https://stackoverflow.com/questions/69793756/write-rows-to-bigquery-via-nodejs-bigquery-storage-write-api?noredirect=1#comment123374327_69793756

rayeeskm commented 2 years ago

Have a look at the example I gave for this question: https://stackoverflow.com/questions/69793756/write-rows-to-bigquery-via-nodejs-bigquery-storage-write-api

mbyrne00 commented 2 years ago

Hey @rayeeskm, thanks very much for the example; this answers our question.

On the API itself, I just had some feedback (happy to raise another issue if that's better):

  1. It would be good if the write stream types (e.g. PENDING) were exposed as enums by the library rather than supplied as raw values.
  2. Defining the proto schema by hand feels complex; a built-in way to derive it would help.

Thanks again for the example in the meantime; it certainly unblocks us to give this a go.

rayeeskm commented 2 years ago

Hey @mbyrne00, yes, those enums are available in the library itself. I've updated the example code to show how to get them from the library.

On the second point, reducing the complexity of defining the proto schema: I have yet to figure out whether this library has any built-in mechanism for that.
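
For example, assuming the standard protos export on the package, the write stream type can come from the generated protos instead of a hard-coded string or number:

```js
const {protos} = require('@google-cloud/bigquery-storage');

// PENDING / COMMITTED / BUFFERED are all available on this enum.
const {WriteStream} = protos.google.cloud.bigquery.storage.v1;
const pendingType = WriteStream.Type.PENDING;
```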

gal1419 commented 2 years ago

@rayeeskm @mbyrne00 I haven't checked it myself yet, but in case it saves you some time: AppendRowsRequest seems to have a static fromObject method that converts a plain JS object into an AppendRowsRequest.

I'll do some testing and will post back.


gal1419 commented 2 years ago

Update: the above does not work.

s1moe2 commented 2 years ago

Examples would be awesome! I'm having a hard time understanding why I keep getting a permissions error, even when following the Stack Overflow example mentioned above:

Permission 'TABLES_UPDATE_DATA' denied on resource 'projects/XXX/datasets/university/tables/students' (or it may not exist).

I'm executing this locally with a service account key which has the right permissions, according to this guide.

I've actually activated the service account key in gcloud and inserted via the CLI:

```
> cat st.json
[{ "id": 1, "name": "homer", "age": 50 }]

> gcloud alpha bq tables insert students --dataset=university --data=st.json
kind: bigquery#tableDataInsertAllResponse
```


gal1419 commented 2 years ago

@s1moe2 this is the exact error I'm also getting. I verified that my service account is configured properly.

rayeeskm commented 2 years ago

https://cloud.google.com/docs/authentication/production#linux-or-macos

@s1moe2 @gal1419

s1moe2 commented 2 years ago

@rayeeskm This doesn't seem to be the problem. I'm passing the credentials via environment variable, which should work as well.

When specifying projectId/keyPath, I still get error code 7, but the error itself is now different (it seems more nested, maybe):

Error: 7 PERMISSION_DENIED: Permission denied on resource project [object Object]

(note that [object Object] is already part of the error string message...)
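
For clarity, these are the two setups I've tried (the key path and project ID are placeholders; both are standard google-gax client options):

```js
const {v1} = require('@google-cloud/bigquery-storage');

// Setup 1: Application Default Credentials via the environment, e.g.
//   export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
const client = new v1.BigQueryWriteClient();

// Setup 2: explicit options on the client constructor.
const explicitClient = new v1.BigQueryWriteClient({
  projectId: 'my-project',
  keyFilename: '/path/to/key.json',
});
```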

gal1419 commented 2 years ago

I use the keyFilename option, which is legit; I also use it with @google-cloud/bigquery and it works as expected. The authentication phase seems to pass, because if I specify wrong values I get authentication errors. The error occurs on stream.write, and I get:

Error: 7 PERMISSION_DENIED: Permission 'TABLES_UPDATE_DATA' denied on resource 'projects/project_id/datasets/dataset_id/tables/ds_connections' (or it may not exist).

abcde090 commented 2 years ago

I am also getting Error: 7 PERMISSION_DENIED: Permission 'TABLES_UPDATE_DATA' denied on resource. Has anyone solved this issue already?

gal1419 commented 2 years ago

@abcde090 I didn't manage to solve it. I'm starting to look at alternatives. It seems like a bug to me. @rayeeskm @mbyrne00 - can you please confirm that this is actually working for you?

@steffnay your comment will be much appreciated, thank you in advance.

s1moe2 commented 2 years ago

I didn't get this to work either. I'll gladly go back to this and open a bug issue so we can compile all the necessary information. I didn't do it before because it seemed like some people had this working, so I've been tracking this issue in case I was doing something dumb.

rayeeskm commented 2 years ago

@gal1419 Yes, it works fine for me.

mbyrne00 commented 2 years ago

I didn't get this working, sorry. I had to proceed with ingestion via GCS.

s1moe2 commented 2 years ago

Edit: moved this to bug issue https://github.com/googleapis/nodejs-bigquery-storage/issues/227

steffnay commented 2 years ago

@gal1419 @mbyrne00 @abcde090 I'm looking into this. Can you all let me know whether you are encountering this on the free tier or with a project that has billing?

gal1419 commented 2 years ago

@steffnay Answered in #227, but for the record: my project has billing configured. Thanks for your help!

abcde090 commented 2 years ago

I have solved the Error: 7 PERMISSION_DENIED: Permission 'TABLES_UPDATE_DATA' denied on resource issue. I'm not sure why it happened, but I created a new dataset in a different location and then it worked. My old dataset is located in australia-southeast1 and my new dataset is in US; it still throws errors on the old australia-southeast1 dataset. @gal1419 @steffnay @s1moe2

steffnay commented 2 years ago

@abcde090 Ah, thank you! I was able to reproduce last night when I changed location and wanted to inquire about which location you all were using, so your update is very helpful.

s1moe2 commented 2 years ago

@steffnay did you see my last comment on #227 regarding this?

tswast commented 2 years ago

For non-default locations, you need to populate the x-goog-request-params header with the value write_stream=your_stream_name.

See: https://github.com/googleapis/python-bigquery-storage/blob/ce63994e08091be06b75d4010f2eefb424dbb356/google/cloud/bigquery_storage_v1/writer.py#L165-L172 in Python
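
In Node, this should translate to passing the header through the gax call options when opening the appendRows stream, along these lines (a sketch; writeStream here is the stream created earlier via createWriteStream):

```js
// Route the request to the stream's (non-default) location by setting
// the x-goog-request-params header on the bidirectional call.
const options = {
  otherArgs: {
    headers: {
      'x-goog-request-params': `write_stream=${writeStream.name}`,
    },
  },
};
const stream = client.appendRows(options);
```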