justinjm / googleCloudAutoMLTablesR

R package for interacting with Google Cloud AutoML Tables API
https://code.justinmarciszewski.me/googleCloudAutoMLTablesR

API error in create dataset function "gcat_create_dataset()" #1

Open justinjm opened 5 years ago

justinjm commented 5 years ago

Summary

My goal is to create a function that creates a dataset in Google Cloud AutoML Tables. This function exists in the AutoML Tables Python client library and is used in a GCP tutorial, and I'd like to emulate that functionality in an R package using googleAuthR as the framework for authentication and functions. Any hints or help from anyone would be much appreciated :)

Thank you in advance! Justin

Hypothesis

The error is likely caused by the API request body being improperly formatted before it is passed into gar_api_generator(). Throughout my trial and error (check out the git history for more), I've gotten different "Invalid value" errors for the same field, dataset.dataset_metadata. It seems like I need to send an empty or null value in the create.dataset POST request.

Documentation links

What goes wrong

Creating a dataset via gcat_create_dataset() fails with a 400 error:

2019-07-08 14:38:43> Request Status Code: 400
2019-07-08 14:38:43> API returned error: List of found errors:  1.Field: dataset.dataset_metadata; Message: Required field not set.

Steps to reproduce the problem

  1. download and install googleCloudAutoMLTablesR from github
  2. run vignettes/quick_start.Rmd

Expected output

Referencing the CURL example from GCP documentation, here is the desired response

{
  "name": "projects/1234/locations/us-central1/datasets/TBL6543",
  "displayName": "sample_dataset",
  "createTime": "2018-12-14T19:07:57.141240Z",
  "etag": "AB3BwFq6UvGVkX64fx7z2Y4T4z-0jUQLKgFvvtD1RcZ2oikA=",
  "tablesDatasetMetadata": {
    "areStatsFresh": true
  }
}

Actual output

gcat_create_dataset() fails with 400 error

2019-07-08 14:38:43> Request Status Code: 400
2019-07-08 14:38:43> API returned error: List of found errors:  1.Field: dataset.dataset_metadata; Message: Required field not set.

'API Data failed to parse' diagnostics

> gcat_create_dataset(projectId = projectId,
+                     location = gcat_location,
+                     displayName = "test_02")
2019-07-10 14:08:45> No trailing slash in URL, adding it.
2019-07-10 14:08:45> Token exists.
2019-07-10 14:08:45> Valid local token
2019-07-10 14:08:45> Request: https://automl.googleapis.com/v1beta1/projects/XXXXXXXXXXXXXXXXXX/locations/us-central1/datasets/
2019-07-10 14:08:45> Body JSON parsed to: {"displayName":"test_02","tablesDatasetMetadata":{}}
2019-07-10 14:08:45> Written url, request_type and body_json to file 'request_debug.rds'.
                Use readRDS('request_debug.rds') to see it. 
-> POST /v1beta1/projects/XXXXXXXXXXXXXXXXXX/locations/us-central1/datasets/ HTTP/1.1
-> Host: automl.googleapis.com
-> User-Agent: googleAuthR/0.7.0 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer 1234567890123456
-> Content-Length: 25
-> 
>> {"displayName":"test_02"}

<- HTTP/1.1 400 Bad Request
<- Vary: Origin
<- Vary: X-Origin
<- Vary: Referer
<- Content-Type: application/json; charset=UTF-8
<- Content-Encoding: gzip
<- Date: Wed, 10 Jul 2019 18:08:45 GMT
<- Server: ESF
<- Cache-Control: private
<- X-XSS-Protection: 0
<- X-Frame-Options: SAMEORIGIN
<- X-Content-Type-Options: nosniff
<- Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"
<- Transfer-Encoding: chunked
<- 
Request failed [400]. Retrying in 1.7 seconds...
-> POST /v1beta1/projects/XXXXXXXXXXXXXXXXXX/locations/us-central1/datasets/ HTTP/1.1
-> Host: automl.googleapis.com
-> User-Agent: googleAuthR/0.7.0 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer 1234567890123456
-> Content-Length: 25
-> 
>> {"displayName":"test_02"}

<- HTTP/1.1 400 Bad Request
<- Vary: Origin
<- Vary: X-Origin
<- Vary: Referer
<- Content-Type: application/json; charset=UTF-8
<- Content-Encoding: gzip
<- Date: Wed, 10 Jul 2019 18:08:47 GMT
<- Server: ESF
<- Cache-Control: private
<- X-XSS-Protection: 0
<- X-Frame-Options: SAMEORIGIN
<- X-Content-Type-Options: nosniff
<- Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"
<- Transfer-Encoding: chunked
<- 
Request failed [400]. Retrying in 1 seconds...
-> POST /v1beta1/projects/XXXXXX/locations/us-central1/datasets/ HTTP/1.1
-> Host: automl.googleapis.com
-> User-Agent: googleAuthR/0.7.0 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer 1234567890123456
-> Content-Length: 25
-> 
>> {"displayName":"test_02"}

<- HTTP/1.1 400 Bad Request
<- Vary: Origin
<- Vary: X-Origin
<- Vary: Referer
<- Content-Type: application/json; charset=UTF-8
<- Content-Encoding: gzip
<- Date: Wed, 10 Jul 2019 18:08:48 GMT
<- Server: ESF
<- Cache-Control: private
<- X-XSS-Protection: 0
<- X-Frame-Options: SAMEORIGIN
<- X-Content-Type-Options: nosniff
<- Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"
<- Transfer-Encoding: chunked
<- 
2019-07-10 14:08:48> Request Status Code: 400
2019-07-10 14:08:48> API returned error: List of found errors:  1.Field: dataset.dataset_metadata; Message: Required field not set. 
2019-07-10 14:08:48> No retry attempted: List of found errors:  1.Field: dataset.dataset_metadata; Message: Required field not set. 
Scopes: https://www.googleapis.com/auth/cloud-platform
App key: 1234567890123456.apps.googleusercontent.com
Method: filepath
Error: API returned: List of found errors:  1.Field: dataset.dataset_metadata; Message: Required field not set. 

Debug info

> readRDS("request_debug.rds")
$url
[1] "https://automl.googleapis.com/v1beta1/projects/XXXXXXXXXXXXXXXXXX/locations/us-central1/datasets/"

$request_type
[1] "POST"

$body_json
{"displayName":"test_02","tablesDatasetMetadata":{}} 

Session Info

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] googleCloudAutoMLTablesR_0.0.0.9000 googleAuthR_0.7.0                  

loaded via a namespace (and not attached):
 [1] httr_1.4.0       compiler_3.5.3   R6_2.4.0         assertthat_0.2.1 tools_3.5.3     
 [6] curl_3.3         memoise_1.1.0    knitr_1.22       jsonlite_1.6     digest_0.6.18   
[11] xfun_0.6         packrat_0.5.0    openssl_1.3      askpass_1.1   

What I've Tried

I've tried the following to format the API request body properly here: googleCloudAutoMLTablesR/datasets.R

I also tried copying and pasting the cURL request body from the GCP documentation to see if I was missing something.

CURL example: Creating and managing datasets  |  AutoML Tables Documentation  |  Google Cloud

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "dataset-display-name",
    "tablesDatasetMetadata": { },
  }' \
  https://automl.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/datasets

> j <- jsonlite::fromJSON('{
+     "displayName": "dataset-display-name",
+     "tablesDatasetMetadata": { }
+   }')
> class(j)
[1] "list"
> str(j)
List of 2
 $ displayName          : chr "dataset-display-name"
 $ tablesDatasetMetadata: Named list()
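
Going the other direction, here is a minimal sketch of how jsonlite serializes the two kinds of empty list. It only illustrates jsonlite's own behavior, not what googleAuthR does to the body before sending it, and the variable names are made up for illustration.

```r
library(jsonlite)

# Unnamed empty list: jsonlite writes it as an empty JSON array.
body_unnamed <- list(
  displayName = "dataset-display-name",
  tablesDatasetMetadata = list()
)

# Named empty list (what fromJSON() returned above): an empty JSON object.
body_named <- list(
  displayName = "dataset-display-name",
  tablesDatasetMetadata = setNames(list(), character(0))
)

json_unnamed <- as.character(toJSON(body_unnamed, auto_unbox = TRUE))
json_named   <- as.character(toJSON(body_named, auto_unbox = TRUE))

cat(json_unnamed, "\n")  # ... "tablesDatasetMetadata":[]}
cat(json_named, "\n")    # ... "tablesDatasetMetadata":{}}
```

So if the body is built by hand, the empty metadata field probably needs to be a named empty list for jsonlite to emit `{}` rather than `[]`.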

MarkEdmondson1234 commented 5 years ago

It can be really fussy - the only difference I see is the request URL:

https://automl.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/us-central1/datasets

vs in your logs:

2019-07-10 14:08:45> Request: https://automl.googleapis.com/v1beta1/projects/XXXXXXXXXXXXXXXXXX/locations/us-central1/datasets/
2019-07-10 14:08:45> Body JSON parsed to: {"displayName":"test_02","tablesDatasetMetadata":{}}

e.g. the trailing slash? You can turn that check off by passing checkTrailingSlash = FALSE to gar_api_generator()
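
A minimal sketch of that suggestion, assuming the request is built with gar_api_generator(). The project ID, the variable names, and the identity parse function are placeholders for illustration, not the package's actual code.

```r
library(googleAuthR)

# Placeholder values (assumptions, not taken from the package source).
project_id <- "my-project"
location   <- "us-central1"

# Build the URL without a trailing slash, as in the cURL example.
dataset_url <- sprintf(
  "https://automl.googleapis.com/v1beta1/projects/%s/locations/%s/datasets",
  project_id, location
)

# checkTrailingSlash = FALSE asks googleAuthR not to append "/" to the URL.
create_dataset <- gar_api_generator(
  dataset_url,
  http_header = "POST",
  data_parse_function = function(x) x,
  checkTrailingSlash = FALSE
)
```

Calling `create_dataset(the_body = body)` would then send the request. That step needs valid authentication, so it is left out here.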

justinjm commented 5 years ago

Thank you for the guidance @MarkEdmondson1234, much appreciated and good catch :)

Initial attempt with your suggestion unfortunately yields the same error but will keep at it...

2019-07-10 23:53:52> Request Status Code: 400
2019-07-10 23:53:52> API returned error: List of found errors:  1.Field: dataset.dataset_metadata; Message: Required field not set. 
2019-07-10 23:53:52> No retry attempted: List of found errors:  1.Field: dataset.dataset_metadata; Message: Required field not set. 
Scopes: https://www.googleapis.com/auth/cloud-platform
App key: XXXXXXX.apps.googleusercontent.com
Method: filepath
Error: API returned: List of found errors:  1.Field: dataset.dataset_metadata; Message: Required field not set. 
> readRDS("request_debug.rds")
$url
[1] "https://automl.googleapis.com/v1beta1/projects/XXXXXXX/locations/us-central1/datasets"

$request_type
[1] "POST"

$body_json
{"displayName":"test_02","tablesDatasetMetadata":{}}

MarkEdmondson1234 commented 5 years ago

Ah, the actual request is dropping tablesDatasetMetadata:

-> POST /v1beta1/projects/XXXXXXXXXXXXXXXXXX/locations/us-central1/datasets/ HTTP/1.1
-> Host: automl.googleapis.com
-> User-Agent: googleAuthR/0.7.0 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer 1234567890123456
-> Content-Length: 25
-> 
>> {"displayName":"test_02"}

It's actually really weird that they ask you to send an empty field to be valid. Can you try sending a space instead? e.g. " "

justinjm commented 5 years ago

@MarkEdmondson1234 Thank you for the continued help, really appreciate it!

I tried adding a space as suggested (2 different ways to be sure) and now get a new error, which is perhaps encouraging since the API now recognizes the tablesDatasetMetadata field:

Error

Error: API returned: Invalid JSON payload received. Unknown name "tablesDatasetMetadata" at 'dataset': Proto field is not repeating, cannot start list

Details

-> POST /v1beta1/projects/xxxxx/locations/us-central1/datasets HTTP/1.1
-> Host: automl.googleapis.com
-> User-Agent: googleAuthR/0.7.0 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer xxxxx
-> Content-Length: 55
-> 
>> {"displayName":"test_02","tablesDatasetMetadata":[" "]}

<- HTTP/1.1 400 Bad Request
<- Vary: Origin
<- Vary: X-Origin
<- Vary: Referer
<- Content-Type: application/json; charset=UTF-8
<- Content-Encoding: gzip
<- Date: Thu, 11 Jul 2019 12:06:41 GMT
<- Server: ESF
<- Cache-Control: private
<- X-XSS-Protection: 0
<- X-Frame-Options: SAMEORIGIN
<- X-Content-Type-Options: nosniff
<- Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"
<- Transfer-Encoding: chunked
<- 
2019-07-11 08:06:41> Request Status Code: 400
2019-07-11 08:06:41> API returned error: Invalid JSON payload received. Unknown name "tablesDatasetMetadata" at 'dataset': Proto field is not repeating, cannot start list.
2019-07-11 08:06:41> No retry attempted: Invalid JSON payload received. Unknown name "tablesDatasetMetadata" at 'dataset': Proto field is not repeating, cannot start list.
Scopes: https://www.googleapis.com/auth/cloud-platform
App key: xxxxxxxx.apps.googleusercontent.com
Method: filepath
Error: API returned: Invalid JSON payload received. Unknown name "tablesDatasetMetadata" at 'dataset': Proto field is not repeating, cannot start list.

So it seems the "tablesDatasetMetadata" field isn't in the form the API expects since it's "boxed in"? That's strange to me, since my understanding is that googleAuthR handles the unboxing via jsonlite (here: googleAuthR/R/generator). Or am I missing something?

Or perhaps I am also misunderstanding how jsonlite handles empty objects? Although after finding this httr issue: , I was stumped on any alternative ways to create and pass the JSON object through googleAuthR!

MarkEdmondson1234 commented 5 years ago

Yes, progress. You need jsonlite::unbox() to stop the entry from being turned into a list.

MarkEdmondson1234 commented 5 years ago

There is an example in googleLanguageR doing this: you apply unbox to the elements of the list object that is being turned into the body, e.g.

jubox <- function(x) jsonlite::unbox(x)

body <- list(
  document = list(
    type = jubox(type),
    language = jubox(language)
  ),
  encodingType = encodingType
)
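
To make the effect of unbox() concrete, here is a small sketch using only jsonlite. The field name is borrowed from this issue purely as an example value.

```r
library(jsonlite)

# Without unbox(), a length-one character vector becomes a JSON array.
boxed <- as.character(toJSON(list(displayName = "test_02")))

# With unbox(), the same value is written as a JSON scalar.
unboxed <- as.character(toJSON(list(displayName = unbox("test_02"))))

cat(boxed, "\n")    # {"displayName":["test_02"]}
cat(unboxed, "\n")  # {"displayName":"test_02"}
```

Note that unbox() is documented for length-one atomic vectors (or one-row data frames), so on its own it turns a value into a scalar; it does not produce an empty JSON object.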

justinjm commented 5 years ago

Thank you for sharing the function from googleLanguageR! I was looking for something like it in the googleAuthRVerse and I missed that.

But AutoML still doesn't want to play nice after 2 more attempts :) Latest error:

-> POST /v1beta1/projects/xxxx/locations/us-central1/datasets HTTP/1.1
-> Host: automl.googleapis.com
-> User-Agent: googleAuthR/0.7.0 (gzip)
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Accept-Encoding: gzip
-> Authorization: Bearer xxxx
-> Content-Length: 55
-> 
>> {"displayName":"test_02","tablesDatasetMetadata":"{ }"}

<- HTTP/1.1 400 Bad Request
<- Vary: Origin
<- Vary: X-Origin
<- Vary: Referer
<- Content-Type: application/json; charset=UTF-8
<- Content-Encoding: gzip
<- Date: Thu, 11 Jul 2019 19:38:26 GMT
<- Server: ESF
<- Cache-Control: private
<- X-XSS-Protection: 0
<- X-Frame-Options: SAMEORIGIN
<- X-Content-Type-Options: nosniff
<- Alt-Svc: quic=":443"; ma=2592000; v="46,43,39"
<- Transfer-Encoding: chunked
<- 
2019-07-11 15:38:26> Request Status Code: 400
2019-07-11 15:38:26> API returned error: Invalid value at 'dataset.tables_dataset_metadata' (type.googleapis.com/google.cloud.automl.v1beta1.TablesDatasetMetadata), "{ }"
2019-07-11 15:38:26> No retry attempted: Invalid value at 'dataset.tables_dataset_metadata' (type.googleapis.com/google.cloud.automl.v1beta1.TablesDatasetMetadata), "{ }"
Scopes: https://www.googleapis.com/auth/cloud-platform
App key: xxxx.apps.googleusercontent.com
Method: filepath
Error: API returned: Invalid value at 'dataset.tables_dataset_metadata' (type.googleapis.com/google.cloud.automl.v1beta1.TablesDatasetMetadata), "{ }"
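
As a rough summary of the attempts so far, this sketch parses the three request bodies that appear in the logs above and reports which JSON type tablesDatasetMetadata has in each. json_kind() is a hypothetical helper written for this illustration, not part of jsonlite or googleAuthR.

```r
library(jsonlite)

# The three request bodies seen in this thread's logs.
payloads <- c(
  array  = '{"displayName":"test_02","tablesDatasetMetadata":[" "]}',
  string = '{"displayName":"test_02","tablesDatasetMetadata":"{ }"}',
  object = '{"displayName":"test_02","tablesDatasetMetadata":{}}'
)

# Hypothetical helper: name the JSON type of a value parsed
# with simplifyVector = FALSE.
json_kind <- function(x) {
  if (is.list(x) && !is.null(names(x))) {
    "object"
  } else if (is.list(x)) {
    "array"
  } else if (is.character(x)) {
    "string"
  } else {
    "other"
  }
}

kinds <- vapply(
  payloads,
  function(p) json_kind(fromJSON(p, simplifyVector = FALSE)$tablesDatasetMetadata),
  character(1)
)
print(kinds)
```

Only the third form is a JSON object, which matches the cURL example. The first two line up with the "cannot start list" and "Invalid value" errors respectively.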

justinjm commented 4 years ago

Leaving this open, will focus on building out other functions. Non-programmatic and easy temporary solution: manually create a dataset in UI :)

justinjm commented 4 years ago

Looks like the API is moving out of beta; v1 was released recently. Will revisit this at some point to see if the new API version is more welcoming :)

https://cloud.google.com/automl/docs/reference/rest/v1/projects.locations.datasets/create

justinjm commented 4 years ago

Got a hack working! Next steps detailed in commit message: https://github.com/justinjm/googleCloudAutoMLTablesR/commit/7607075684ac3d0e88b340bf7de35358949c3a53