Closed: yu-iskw closed this issue 6 years ago
You could give it a shot. But I'm not sure the timestamp support alone justifies the cost.
Thank you for the comment. You make good points. We shouldn't give up the other advantages of Avro just for the timestamp type benefit. We could also modify spark-avro itself; I think that would be better for our case.
Hi @nevillelyh
We use the Avro format to save a DataFrame to GCS before loading the Avro files into BigQuery. One of the biggest advantages of the Avro format is that BigQuery can read the schema from the Avro metadata. However, Avro doesn't support the timestamp type, so we need a workaround to store a DataFrame that includes timestamp columns.
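As a rough sketch of one such workaround (not the project's actual code), timestamp values can be converted to epoch microseconds before writing, since Avro's `timestamp-micros` logical type is stored as a `long` underneath. In plain Python the conversion looks like:

```python
from datetime import datetime, timezone

def to_epoch_micros(ts: datetime) -> int:
    """Convert a timezone-aware datetime to epoch microseconds,
    matching the long representation behind Avro's
    timestamp-micros logical type."""
    return int(ts.timestamp() * 1_000_000)

# Example: midnight UTC on 2018-01-01
ts = datetime(2018, 1, 1, tzinfo=timezone.utc)
print(to_epoch_micros(ts))  # 1514764800000000
```

In Spark the same idea would be applied per column (e.g. casting a timestamp column to `long`) before handing the DataFrame to spark-avro, and BigQuery would then see the column as an integer unless told otherwise.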
I believe Parquet supports the timestamp type. Moreover, if BigQuery can load Parquet files from GCS without an explicit schema, it would be better to use the Parquet format. What do you think?
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet
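According to that page, BigQuery infers the schema from the Parquet files themselves, so no explicit schema is needed. A minimal sketch of such a load with the `bq` CLI, assuming a hypothetical dataset `mydataset`, table `mytable`, and bucket `mybucket`:

```shell
# Load Parquet files from GCS into BigQuery; the schema is
# read from the Parquet metadata, so none is passed here.
bq load \
  --source_format=PARQUET \
  mydataset.mytable \
  gs://mybucket/path/to/*.parquet
```

This is only an illustration of the linked docs; it requires a configured gcloud/bq environment and real dataset and bucket names.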