spotify / spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames
Apache License 2.0

Performance tune df.saveAsBigQueryTable #33

Open abhineet13 opened 7 years ago

abhineet13 commented 7 years ago

Hi,

I am trying to load a zipped CSV file from Google Cloud into BigQuery. The file is 100 GB and the load is taking a long time.
Is there a way to tune df.saveAsBigQueryTable to speed up the load?

val rowData = input.map(x => Row(x(0), x(1), x(2), x(3).toLong, x(4), x(5), x(6), x(7), x(8), x(9), x(10), x(11), x(12), x(13), x(14), x(15), x(16), x(17)))
val df = sqlContext.createDataFrame(rowData, schemaTraffic)
df.saveAsBigQueryTable(bqTrafficTable + partitionDate)

nevillelyh commented 7 years ago

I'm not sure there's much we can do here, since we rely solely on spark-avro to write the DataFrame as Avro files on GCS and then on BigQuery's load feature to import those files. What's your bottleneck? Have you made sure the job has enough parallelism?
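
To make the parallelism question concrete, here is a minimal sketch of checking the partition count and repartitioning before the save. It assumes the slowdown comes from too few partitions, which would be the case if the input is a single gzip-compressed file, since gzip is not splittable and is read by one task. The object name, the targetPartitions helper, the tasksPerCore default, and the executor counts are all hypothetical, not part of spark-bigquery. Note that repartitioning only spreads the work after the read, so the decompression itself would still be single-threaded.

```scala
import org.apache.spark.sql.DataFrame
import com.spotify.spark.bigquery._

object RepartitionBeforeSave {
  // Hypothetical helper: choose a partition count as a small multiple
  // of the total number of cores, never less than 1.
  def targetPartitions(executors: Int, coresPerExecutor: Int, tasksPerCore: Int = 3): Int =
    math.max(1, executors * coresPerExecutor * tasksPerCore)

  // Print the current partition count, then shuffle the rows into more
  // partitions so the Avro write to GCS runs as many parallel tasks.
  def save(df: DataFrame, table: String, executors: Int, coresPerExecutor: Int): Unit = {
    println(s"partitions before repartition: ${df.rdd.partitions.length}")
    val spread = df.repartition(targetPartitions(executors, coresPerExecutor))
    spread.saveAsBigQueryTable(table)
  }
}
```

With the code from the question, this could be called as RepartitionBeforeSave.save(df, bqTrafficTable + partitionDate, executors, coresPerExecutor), with the last two arguments set to match the cluster. If the printed partition count is already high, the bottleneck is probably elsewhere, for example in the BigQuery load job itself.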