This branch implements the basic logic for this workflow: https://github.com/acmiyaguchi/jsonschema-transpiler/tree/validation-roundtrip

In particular, the script generates Avro files, uploads them to GCS, loads them into BigQuery, and then fetches the resulting schemas (a sketch of the core commands follows the output):
```
$ ./scripts/generate-bq-schema.sh
4/4 succeeded
Copying file://avro-data/test-array.avro [Content-Type=application/octet-stream]...
Copying file://avro-data/test-long.avro [Content-Type=application/octet-stream]...
Copying file://avro-data/test-map.avro [Content-Type=application/octet-stream]...
Copying file://avro-data/test-string.avro [Content-Type=application/octet-stream]...
\ [4 files][ 420.0 B/ 420.0 B]
Operation completed over 4 objects/420.0 B.
Dataset 'amiyaguchi-unmap-json-bq-sink:test_bq_schema' successfully created.
Waiting on bqjob_r1874d3f3ac121bff_0000016a75410c76_1 ... (3s) Current status: DONE
original avro schema:
{
  "items": {
    "type": "long"
  },
  "type": "array"
}
bigquery schema:
[
  {
    "type": "INTEGER",
    "name": "root",
    "mode": "REPEATED"
  }
]
Waiting on bqjob_r7aea09933351cacd_0000016a75412b0a_1 ... (2s) Current status: DONE
original avro schema:
{
  "type": "string"
}
bigquery schema:
[
  {
    "type": "STRING",
    "name": "root",
    "mode": "REQUIRED"
  }
]
Waiting on bqjob_r2626f2a8102a662b_0000016a75414269_1 ... (2s) Current status: DONE
original avro schema:
{
  "type": "long"
}
bigquery schema:
[
  {
    "type": "INTEGER",
    "name": "root",
    "mode": "REQUIRED"
  }
]
Waiting on bqjob_r740102a4f0834588_0000016a754158bd_1 ... (3s) Current status: DONE
original avro schema:
{
  "type": "map",
  "values": {
    "type": "long"
  }
}
bigquery schema:
[
  {
    "fields": [
      {
        "type": "STRING",
        "name": "key",
        "mode": "REQUIRED"
      },
      {
        "type": "INTEGER",
        "name": "value",
        "mode": "REQUIRED"
      }
    ],
    "type": "RECORD",
    "name": "root",
    "mode": "REPEATED"
  }
]
Removing gs://amiyaguchi-unmap-json-bq-sink/test_bq_schema/test-array.avro#1556745355194021...
Removing gs://amiyaguchi-unmap-json-bq-sink/test_bq_schema/test-long.avro#1556745355610474...
Removing gs://amiyaguchi-unmap-json-bq-sink/test_bq_schema/test-map.avro#1556745355937243...
Removing gs://amiyaguchi-unmap-json-bq-sink/test_bq_schema/test-string.avro#1556745356199443...
/ [4 objects]
Operation completed over 4 objects.
```
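For reference, the core of the script is roughly the following. This is a minimal sketch assuming the Cloud SDK's `gsutil` and `bq` CLIs plus `avro-tools` and `jq` are installed; the bucket and dataset names mirror the output above, but the loop itself is a simplification:

```bash
#!/bin/bash
# Roundtrip sketch: upload Avro files, load each one into BigQuery, then
# print the BigQuery schema next to the original Avro schema.
set -e

BUCKET="gs://amiyaguchi-unmap-json-bq-sink"
DATASET="test_bq_schema"

bq mk "$DATASET"
gsutil cp avro-data/*.avro "$BUCKET/$DATASET/"

for path in avro-data/*.avro; do
    name=$(basename "$path" .avro)
    table="$DATASET.${name//-/_}"   # BigQuery table names cannot contain dashes
    bq load --source_format=AVRO "$table" "$BUCKET/$DATASET/$name.avro"
    echo "original avro schema:"
    avro-tools getschema "$path"    # prints the writer schema embedded in the file
    echo "bigquery schema:"
    bq show --format=prettyjson "$table" | jq .schema.fields
done

gsutil rm "$BUCKET/$DATASET/*.avro"
```

Note from the output that BigQuery represents an Avro map as a REPEATED RECORD of key/value pairs; this is the shape the transpiler needs to reproduce for map types.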
It would be nice to generate data sets for tests/resources so that the resulting BigQuery schemas can be validated; for example, each test case could pair an Avro schema with its expected BigQuery schema, as sketched below.
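A hypothetical layout for such a test case, pairing the map example above with the schema BigQuery produced (the file structure and field names here are illustrative, not an existing format in the repository):

```json
{
  "description": "hypothetical test-case layout: avro input and expected BigQuery schema",
  "name": "test-map",
  "avro": {
    "type": "map",
    "values": {
      "type": "long"
    }
  },
  "bigquery": [
    {
      "fields": [
        {"type": "STRING", "name": "key", "mode": "REQUIRED"},
        {"type": "INTEGER", "name": "value", "mode": "REQUIRED"}
      ],
      "type": "RECORD",
      "name": "root",
      "mode": "REPEATED"
    }
  ]
}
```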
With support for data in integration tests in #62, we can start to validate BigQuery schemas against BigQuery itself. It should be done in the following way:

- `bq load`
- `bq show --format=prettyjson`
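Concretely, the two steps might look like the following; this is a sketch, and the dataset, table, bucket, and expected-schema paths are placeholders:

```bash
# Load a generated Avro file and let BigQuery derive the table schema.
bq load --source_format=AVRO test_bq_schema.test_map \
    gs://amiyaguchi-unmap-json-bq-sink/test_bq_schema/test-map.avro

# Fetch the schema BigQuery actually created, then compare it against the
# transpiler's output (expected/test-map.bq.json is a placeholder path).
bq show --format=prettyjson test_bq_schema.test_map \
    | jq --sort-keys .schema.fields > actual-schema.json
diff actual-schema.json expected/test-map.bq.json
```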