spring-attic / spring-cloud-gcp

Integration for Google Cloud Platform APIs with Spring
Apache License 2.0

BigQueryTemplate missing ability to set a custom schema #2576

Closed (tnguyenpham-clgx closed this issue 3 years ago)

tnguyenpham-clgx commented 3 years ago

BigQueryTemplate provides an abstraction for inserting new data into a BigQuery table, with schema autodetection enabled by default. This works well in most cases, but sometimes we want to supply a specific schema for the write, because autodetection occasionally gets it wrong. For example, a column in a BigQuery table was created as a String, but when a new row is inserted in CSV format, that field may contain a string of digits. Schema autodetection then treats it as a number and assigns the wrong field type to the column, which results in an error when writing to the BigQuery table.
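The failure mode can be illustrated with a toy example. The sketch below is not BigQuery's detection algorithm. It is a deliberately naive, hypothetical inference (`inferType`) that looks only at sample values, and the postal-code column is an assumed example, but it is enough to show how a column declared as STRING can be detected as a number when every sampled value happens to consist of digits.

```java
import java.util.List;

public class AutodetectPitfall {

    // Hypothetical, naive inference (NOT BigQuery's real logic): report INTEGER
    // when every sampled value parses as a long, otherwise STRING.
    public static String inferType(List<String> samples) {
        if (samples.isEmpty()) {
            return "STRING";
        }
        for (String value : samples) {
            try {
                Long.parseLong(value.trim());
            } catch (NumberFormatException e) {
                return "STRING";
            }
        }
        return "INTEGER";
    }

    public static void main(String[] args) {
        // Assumed example: the existing table declares this column as STRING.
        String declaredType = "STRING";
        List<String> csvColumn = List.of("02134", "90210", "10001");

        String detected = inferType(csvColumn);
        System.out.println("declared=" + declaredType + " detected=" + detected);
        System.out.println("mismatch=" + !declaredType.equals(detected));
    }
}
```

Any detector that sees only the data, and not the intent behind the table definition, can run into this kind of mismatch, which is why an explicit schema is useful.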

I would like to be able to set a schema when calling BigQueryTemplate.writeDataToTable() or BigQueryTemplate.setAutoDetectSchema().

Note that the documentation for BigQueryTemplate.setAutoDetectSchema() says that a schema must be defined when autoDetectSchema is set to false, yet there is no option to set one.

dzou commented 3 years ago

@tnguyenpham-clgx - Thanks for the report; I will look at this shortly. My current plan is to add an additional setting to writeDataToTable that lets you specify the schema manually.

dzou commented 3 years ago

Hey, I'm working on this here: https://github.com/GoogleCloudPlatform/spring-cloud-gcp/pull/108

Right now the PR is in the new repo but I plan on backporting it here.

dzou commented 3 years ago

For now, if you want to work around this, I would suggest creating the table manually first via an autowired BigQuery object. You can then use the template to load data as normal with autoDetectSchema = false.

Our BigQueryAutoconfiguration allows you to inject an instance of BigQuery into your code: https://docs.spring.io/spring-cloud-gcp/docs/1.2.5.RELEASE/reference/html/#bigquery-client-object

Once injected, you can create a table like this: https://cloud.google.com/bigquery/docs/tables#creating_an_empty_table_with_a_schema_definition
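As a rough sketch of that workaround, the table can be created up front with an explicit schema through the google-cloud-bigquery client. The class name `ExplicitSchemaTable`, the column names, and the dataset and table arguments are hypothetical and chosen only for illustration. The code assumes the `BigQuery` instance is the one injected by the autoconfiguration.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

public class ExplicitSchemaTable {

    // Hypothetical columns: a STRING column that may hold digit-only values,
    // plus a genuine integer column.
    public static Schema schema() {
        return Schema.of(
                Field.of("postal_code", StandardSQLTypeName.STRING),
                Field.of("visits", StandardSQLTypeName.INT64));
    }

    // Creates an empty table with the schema above. `bigQuery` is assumed to be
    // the autowired client; `dataset` must name an existing dataset.
    public static Table createTable(BigQuery bigQuery, String dataset, String table) {
        TableId tableId = TableId.of(dataset, table);
        TableInfo tableInfo = TableInfo.of(tableId, StandardTableDefinition.of(schema()));
        return bigQuery.create(tableInfo);
    }
}
```

With the table in place, the load goes through BigQueryTemplate as usual after calling setAutoDetectSchema(false), so the existing column types are used instead of detected ones. The dataset passed here should match the one the template is configured to write to.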

dzou commented 3 years ago

You should now be able to depend on this via the snapshot version of spring-cloud-gcp, 1.2.7.BUILD-SNAPSHOT.