samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0

DML queries to drop and create a table take a long time #70

Closed: yogesh-0586 closed this issue 5 years ago

yogesh-0586 commented 5 years ago

I tried to drop and create a table using runDMLQuery(), but dropping and creating the table takes more than 2 minutes. Please see the following log:

19/02/11 12:33:03 INFO com.samelamin.spark.bigquery.BigQueryClient: Executing DML Statement DROP TABLE IF EXISTS `projectId.dataset.input`
19/02/11 12:33:03 INFO com.samelamin.spark.bigquery.BigQueryClient: Using legacy Sql: false
19/02/11 12:35:15 INFO com.samelamin.spark.bigquery.BigQueryClient: Executing DML Statement CREATE TABLE IF NOT EXISTS `projectId.dataset.input` (id STRING,evnt STRING)
19/02/11 12:35:15 INFO com.samelamin.spark.bigquery.BigQueryClient: Using legacy Sql: false
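To quantify the delay in the log above, you can compute the gap between consecutive `Executing DML Statement` lines. Below is a minimal Python sketch. It assumes the `yy/MM/dd HH:mm:ss` timestamp prefix seen in these log lines, and the helper name `dml_gaps` is hypothetical. The sketch measures from the start of one statement to the start of the next. At one-second log resolution, that is an upper bound on how long the earlier statement took, not a precise duration.

```python
from datetime import datetime

# The four log lines from this issue, verbatim.
LOG = """\
19/02/11 12:33:03 INFO com.samelamin.spark.bigquery.BigQueryClient: Executing DML Statement DROP TABLE IF EXISTS `projectId.dataset.input`
19/02/11 12:33:03 INFO com.samelamin.spark.bigquery.BigQueryClient: Using legacy Sql: false
19/02/11 12:35:15 INFO com.samelamin.spark.bigquery.BigQueryClient: Executing DML Statement CREATE TABLE IF NOT EXISTS `projectId.dataset.input` (id STRING,evnt STRING)
19/02/11 12:35:15 INFO com.samelamin.spark.bigquery.BigQueryClient: Using legacy Sql: false
"""

MARKER = "Executing DML Statement "


def dml_gaps(log_text):
    """Return (statement, seconds until the next DML statement starts) pairs.

    Hypothetical helper: assumes each line starts with a 17-character
    'yy/MM/dd HH:mm:ss' timestamp, as in the log above.
    """
    starts = []
    for line in log_text.splitlines():
        if MARKER in line:
            ts = datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")
            starts.append((ts, line.split(MARKER, 1)[1]))
    # Pair each statement with the one that follows it; the last has no end marker.
    return [
        (stmt, (next_ts - ts).total_seconds())
        for (ts, stmt), (next_ts, _) in zip(starts, starts[1:])
    ]


for stmt, secs in dml_gaps(LOG):
    print(f"{secs:.0f}s  {stmt}")
```

For this log, the sketch reports 132 seconds for the `DROP TABLE` statement. No line follows the `CREATE TABLE` statement, so its own duration cannot be read from this excerpt.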

Is it possible to make the drop and create statements execute faster?

yogesh-0586 commented 5 years ago

In spark-shell the DML queries run fast, but with spark-submit on YARN in cluster mode a single query takes almost 2-3 minutes.

samelamin commented 5 years ago

This isn't a bug but a performance issue. Behaviour can differ a lot between spark-shell and spark-submit, depending on how you are using spark-submit.

It would be best to add some telemetry to this ticket to confirm whether the overhead comes from the connector rather than from YARN or Spark.
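One simple form of telemetry is to time each DML call separately from the job as a whole. If most of the wall-clock time falls inside the calls, the connector or BigQuery is the likely bottleneck. If it falls elsewhere, Spark or YARN is the more likely cause. Below is a minimal Python sketch of the pattern. The names `timed` and `run_dml` are hypothetical, and `run_dml` only sleeps in place of the real connector call. In a Scala job, the same pattern would wrap each `runDMLQuery()` call, for example with `System.nanoTime`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, records):
    """Append (label, elapsed seconds) to records when the block exits, even on error."""
    start = time.perf_counter()
    try:
        yield
    finally:
        records.append((label, time.perf_counter() - start))


def run_dml(statement):
    """Hypothetical stand-in for the connector's DML call; sleeps instead of calling BigQuery."""
    time.sleep(0.05)


records = []
with timed("whole job", records):
    for stmt in [
        "DROP TABLE IF EXISTS `projectId.dataset.input`",
        "CREATE TABLE IF NOT EXISTS `projectId.dataset.input` (id STRING,evnt STRING)",
    ]:
        with timed(stmt, records):
            run_dml(stmt)

# Time spent inside the DML calls versus everywhere else in the job.
inside_calls = sum(secs for label, secs in records if label != "whole job")
total = dict(records)["whole job"]
print(f"inside DML calls: {inside_calls:.2f}s, elsewhere: {total - inside_calls:.2f}s")
```

Comparing these two numbers from a spark-shell run and a spark-submit run would show whether the extra minutes appear inside the calls or around them.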

yogesh-0586 commented 5 years ago

@samelamin Thanks for your reply. I found where the issue was: it's on the Spark side.