spotify / spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames
Apache License 2.0

How does Spotify transfer data from an RDB to BigQuery now? #64

Closed yu-iskw closed 6 years ago

yu-iskw commented 6 years ago

@nevillelyh

I'm just curious: how does Spotify transfer data from an RDB such as MySQL to BigQuery now? I know spark-bigquery is in maintenance mode, so you may have found a better way to do that.

We centralize all of our data in BigQuery, and we still transfer data from MySQL to BigQuery with scheduled jobs. As you know, transferring a huge table can be very tough without a distributed processing framework such as Spark. If you have a better way to transfer data, would you please tell me about it?

I would really appreciate it if you could answer my question, as far as you are able to on GitHub.
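
For readers wondering why a huge table is hard to move without a distributed framework: a single job reads it through one connection, whereas distributed readers usually split the table on a numeric key so that each worker queries its own slice (Spark's JDBC source, for example, exposes `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` for this). The sketch below shows only that splitting step in plain Python. The function name `split_key_range` is hypothetical and is not part of spark-bigquery or any library mentioned in this thread.

```python
from typing import List, Tuple


def split_key_range(lower: int, upper: int, num_partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [lower, upper] into contiguous
    half-open (start, end) slices of near-equal size.

    Hypothetical helper for illustration only.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper < lower:
        return []  # empty table or inverted bounds: nothing to read

    total = upper - lower + 1
    # Never create more slices than there are keys.
    parts = min(num_partitions, total)
    base, extra = divmod(total, parts)

    ranges = []
    start = lower
    for i in range(parts):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each `(start, end)` pair would map to a query of the form `WHERE id >= start AND id < end`, assuming an integer key column named `id`. Slices are equal in key span, not in row count, so sparse or skewed keys still produce uneven work.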

nevillelyh commented 6 years ago

You can probably write a Scio job with the BQ and JDBC connectors? There are also other tools for dumping SQL databases, like Sqoop, which IIRC can dump MySQL tables as Avro files that you can then load into BigQuery.
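
A minimal sketch of the dump-then-load idea, under several assumptions. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and it writes newline-delimited JSON instead of Avro, because Avro needs a third-party library; BigQuery load jobs accept both formats. The function name `dump_table_to_ndjson` and the file naming scheme are hypothetical. Rows are read in key order, one chunk at a time (keyset pagination), so the whole table is never held in memory.

```python
import json
import sqlite3
from pathlib import Path
from typing import List


def dump_table_to_ndjson(conn: sqlite3.Connection, table: str, key_column: str,
                         out_dir: Path, chunk_size: int = 1000) -> List[Path]:
    """Dump `table` into newline-delimited JSON files of up to `chunk_size` rows.

    Assumes `key_column` is unique and sortable (for example a primary key).
    `table` and `key_column` are interpolated into the SQL text, so they must
    come from trusted configuration, never from user input.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    files: List[Path] = []
    last_key = None

    while True:
        # "?" is sqlite3's placeholder style; MySQL drivers typically use "%s".
        if last_key is None:
            sql = f"SELECT * FROM {table} ORDER BY {key_column} LIMIT ?"
            params = (chunk_size,)
        else:
            sql = (f"SELECT * FROM {table} WHERE {key_column} > ? "
                   f"ORDER BY {key_column} LIMIT ?")
            params = (last_key, chunk_size)

        cursor = conn.execute(sql, params)
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
        if not rows:
            break

        path = out_dir / f"{table}-{len(files):05d}.json"
        with path.open("w", encoding="utf-8") as fh:
            for row in rows:
                # One JSON object per line, keyed by column name.
                fh.write(json.dumps(dict(zip(columns, row))) + "\n")
        files.append(path)

        # Resume after the largest key seen in this chunk.
        last_key = rows[-1][columns.index(key_column)]

    return files
```

The resulting files could then be loaded with something like `bq load --source_format=NEWLINE_DELIMITED_JSON`, after staging them in Cloud Storage if they are large. Column types that JSON cannot represent directly (binary blobs, for instance) would need explicit conversion, which this sketch does not handle.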

yu-iskw commented 6 years ago

Thank you for the advice. That is what I was thinking. Apache Beam or Scio with the BQ and JDBC connectors would be another great way, and dumping the tables as Avro would be good as well. I am really glad to know that what I am thinking is in the right direction.