spotify / spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames
Apache License 2.0
155 stars 52 forks source link

Partition tables by day #25

Closed samelamin closed 7 years ago

samelamin commented 7 years ago

This PR was inspired by this issue

At the moment there is no way of partitioning by day via the job configuration, see here. Google engineers are ofcourse working on it

In the meantime us mortals will have to settle for using the client

ravwojdyla commented 7 years ago

Comments + should there be tests for this change?

samelamin commented 7 years ago

Thanks for the comments, ill make those changes then get back to your questions :)

samelamin commented 7 years ago

Updated with comments, might need your help on others though

samelamin commented 7 years ago

Also I cant see any tests in the codebase, am I missing something completely obvious?

I think adding the testing framework would be beyond the scope of this PR but happy to add it given some guidance on how you want things tested. Classic TDD vs London School

samelamin commented 7 years ago

@nevillelyh @ravwojdyla whats happening with this PR?

Would you like me to make some changes?

nevillelyh commented 7 years ago

I'd prefer if we can handle partition type (e.g. "DAY") & expiration time a little better, rather than changing again later.

nevillelyh commented 7 years ago

Thinking about this again. All we're doing is creating the table with native BQ client before loading data into it. What's stopping the user from doing so themselves outside of spark-bigquery? We can expose the BigQueryClient and some helper methods to make it easier. May be that's better than complicating the semantics of DataFrame methods?

samelamin commented 7 years ago

hahaha your too quick for me Neville! I was just replying about it being a bit cluttered in the interface

Yeah exposing the client would be a cleaner approach I agree

Exposing the client would be a big change, well bigger than whats currently here wouldnt it? Id be a bit uncomfortable making such a drastic change to the behaviour when there arent any tests

What do you think?

samelamin commented 7 years ago

Maybe pairing on it might be better?