I would like to fix the logic that generates datasetId in BigQueryClient#stagingTable.
val datasetId = prefix + location.toLowerCase
Would the following modification be acceptable? (Written in Java notation.)
String datasetId = prefix + location.toLowerCase().replaceAll("[^a-z0-9]+", "");
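To make the proposal concrete, here is a minimal, self-contained sketch of the sanitizing step in Java. The class and method names (StagingDatasetId, of) are hypothetical and are not part of the connector; the prefix value is taken from the log output in this issue, and Locale.ROOT is my own addition so that lower-casing does not depend on the JVM's default locale.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not the connector's actual code.
final class StagingDatasetId {
    // Prefix as it appears in the log output of this issue.
    static final String PREFIX = "spark_bigquery_staging_";

    static String of(String prefix, String location) {
        // Lower-case the location, then drop every character outside a-z and 0-9.
        return prefix + location.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "");
    }

    public static void main(String[] args) {
        System.out.println(of(PREFIX, "asia-northeast1")); // prints spark_bigquery_staging_asianortheast1
        System.out.println(of(PREFIX, "US"));              // prints spark_bigquery_staging_us
    }
}
```

Only the location part is sanitized, so the underscores in the prefix are kept. Locations without special characters, such as "US", produce the same dataset ID as before.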
Here is the background.
I am considering using this connector to transfer data from BigQuery to an application running on Dataproc.
The data location we use is "asia-northeast1".
This string contains a hyphen ("-").
As a result, creating the temporary table appears to fail because the generated staging dataset ID is rejected, as the following log shows.
{"loglevel": "INFO", "time": "2019-06-13 11:21:46.591", "appname": "job-executor", "function": "com.spotify.spark.bigquery.BigQueryClient.stagingDataset:148", "message": "Creating staging dataset repx-dev-jp-fiot-mgr:spark_bigquery_staging_asia-northeast1"}
java.util.concurrent.ExecutionException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code": 400,
"errors": [{
"domain": "global",
"message": "Invalid dataset ID \"spark_bigquery_staging_asia-northeast1\". Dataset IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long.",
"reason": "invalid"
}],
"message": "Invalid dataset ID \"spark_bigquery_staging_asia-northeast1\". Dataset IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long.",
"status": "INVALID_ARGUMENT"
}
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:459)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:142)
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2373)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2337)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2295)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2208)
at com.google.common.cache.LocalCache.get(LocalCache.java:4053)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4057)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4986)
at com.spotify.spark.bigquery.BigQueryClient.query(BigQueryClient.scala:105)
at com.spotify.spark.bigquery.BigQuerySQLContext.bigQuerySelect(BigQuerySQLContext.scala:93)
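The rule quoted in the error message (alphanumeric plus underscores, at most 1024 characters) can be checked locally before the request is sent. The following is only a sketch under that assumption: the class name DatasetIdCheck is hypothetical, and I am reading "alphanumeric" as ASCII letters and digits, which may differ from what BigQuery actually accepts.

```java
import java.util.regex.Pattern;

// Hypothetical local check based only on the wording of the error message above.
final class DatasetIdCheck {
    // Assumption: ASCII letters, digits and underscores; 1 to 1024 characters.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_]{1,1024}");

    static boolean isValid(String datasetId) {
        // matches() requires the whole string to fit the pattern.
        return VALID.matcher(datasetId).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("spark_bigquery_staging_asia-northeast1")); // prints false
        System.out.println(isValid("spark_bigquery_staging_asianortheast1"));  // prints true
    }
}
```

With this check, the ID from the log is rejected because of the hyphen, while the ID produced by the proposed replaceAll call passes.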