spring-projects / spring-batch-extensions

Spring Batch Extensions

Introduce Spring Batch Google BigQuery writer #55

Closed dgray16 closed 3 years ago

dgray16 commented 4 years ago

Hi. I would like to propose an implementation of a BigQuery ItemWriter. https://cloud.google.com/bigquery

On a project I worked on, I actively used Spring Batch with BigQuery and managed to solve some non-trivial issues with that combination, so I decided to share the result with the community. I am not part of the Google team and I am not an expert in the Google BigQuery Java library, so the implementation is fully community-driven.

This approach is based on BigQuery load jobs. The implementation tries to be as flexible as possible, so if you know of cases that are not covered, feel free to write to me.
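
To illustrate the load-job idea for readers, here is a minimal sketch and not the code from this PR: each chunk handed to the writer is serialized into one payload and submitted as a single load job. The names `LoadJobWriterSketch`, `CsvChunkWriter`, and `LoadJobSubmitter` are hypothetical, CSV is chosen only for illustration, and the submitter interface stands in for the real call to the Google BigQuery Java library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LoadJobWriterSketch {

    /** Hypothetical stand-in for the call that submits one BigQuery load job. */
    public interface LoadJobSubmitter {
        void submit(byte[] csvPayload);
    }

    /** Serializes each chunk to CSV and submits it as one load job (assumed design). */
    public static class CsvChunkWriter<T> {
        private final Function<T, List<String>> rowMapper;
        private final LoadJobSubmitter submitter;

        public CsvChunkWriter(Function<T, List<String>> rowMapper, LoadJobSubmitter submitter) {
            this.rowMapper = rowMapper;
            this.submitter = submitter;
        }

        /** Mirrors the shape of a chunk-oriented write: one call per chunk. */
        public void write(List<? extends T> items) {
            if (items.isEmpty()) {
                return; // nothing to load, so no job is submitted
            }
            StringBuilder csv = new StringBuilder();
            for (T item : items) {
                List<String> escaped = new ArrayList<>();
                for (String field : rowMapper.apply(item)) {
                    escaped.add(escape(field));
                }
                csv.append(String.join(",", escaped)).append('\n');
            }
            submitter.submit(csv.toString().getBytes(StandardCharsets.UTF_8));
        }

        /** Quotes a field if it contains a comma, a double quote, or a newline. */
        private static String escape(String field) {
            if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
                return "\"" + field.replace("\"", "\"\"") + "\"";
            }
            return field;
        }
    }
}
```

The point of the design is that the number of load jobs follows the chunk size rather than the item count, which matters because load jobs are comparatively heavyweight operations.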

Supported formats:

Not supported formats:

One more thing about ORC and Parquet: I have an idea for making them work with Spring Batch. Instead of manipulating the data, we can manipulate files, so batch processing would operate on files rather than on the data inside them. I left a method in the code that works this way, to find out whether anybody needs it.
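
The file-as-item idea could look roughly like the sketch below. It is an assumption about the shape of such a writer, not the method left in the PR: every item is a `Path` to a complete file, and the writer passes the file's bytes through without parsing them. `FileAsItemWriterSketch` and `FileLoadSubmitter` are hypothetical names, and the submitter again stands in for the real load-job call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FileAsItemWriterSketch {

    /** Hypothetical stand-in for submitting one file's content as a load job. */
    public interface FileLoadSubmitter {
        void submit(String fileName, byte[] content);
    }

    private final FileLoadSubmitter submitter;

    public FileAsItemWriterSketch(FileLoadSubmitter submitter) {
        this.submitter = submitter;
    }

    /** Each item is a whole file; its content is never decoded or inspected here. */
    public void write(List<? extends Path> files) throws IOException {
        for (Path file : files) {
            submitter.submit(file.getFileName().toString(), Files.readAllBytes(file));
        }
    }
}
```

Because the writer never looks inside the file, the same code path would serve any binary format the target system can load.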

dgray16 commented 4 years ago

By the way, how do you expect this code to be used by somebody else? I have not found any jar from this repository in Maven central.

dgray16 commented 4 years ago

Copy of https://github.com/spring-projects/spring-batch/pull/3676

fmbenhassine commented 3 years ago

Thank you for your PR. This is interesting. I see this BigQueryItemWriter coupled with a BigQueryItemReader as good candidates for a new module called spring-batch-bigquery to support Google's BigQuery.

By the way, how do you expect this code to be used by somebody else? I have not found any jar from this repository in Maven central.

We are planning to start releasing community-driven modules to Maven Central and we are looking for a lead for each module. We do not intend to merge PRs for code that won't be maintained by a community lead. Are you interested in taking the lead on this spring-batch-bigquery module? You will be responsible for all issues/PRs related to that module and the Spring Batch team will take care of the releases.

If you are interested, please reach out to me or @mminella and we will take it from there.

dgray16 commented 3 years ago

If you are interested, please reach out to me or @mminella and we will take it from there.

It is an honour for me to try.

fmbenhassine commented 3 years ago

@dgray16

It is an honour for me to try.

That's great news! Thank you! As a starting point, could you please:

Once this is done, please add a comment here and I will get in touch with you by email to explain the process of taking the lead on this BigQuery module.

dgray16 commented 3 years ago

I have signed and agree to the terms of the SpringSource Individual Contributor License Agreement.

dgray16 commented 3 years ago

@benas Done

fmbenhassine commented 3 years ago

@dgray16 I've added a couple of minor remarks. I can take care of these changes when merging your PR, but I just wanted to have your feedback on the Java version (is 15 required?) and the usage of @Deprecated on a private method. Other than that, the PR LGTM 👍

I have signed and agree to the terms of the SpringSource Individual Contributor License Agreement.

Could you please re-sign the CLA? We had an issue with the bot that automatically checks the signing. Sorry for the inconvenience.

dgray16 commented 3 years ago

@benas I have signed and agree to the terms of the SpringSource Individual Contributor License Agreement.

fmbenhassine commented 3 years ago

LGTM now 👍 Thank you for all these updates!

The Maven groupId of the module should be org.springframework.batch.extensions and the base package should be org.springframework.batch.extensions.bigquery. I will update those on merge.