mozilla / jsonschema-transpiler

Compile JSON Schema into Avro and BigQuery schemas
Mozilla Public License 2.0
43 stars 9 forks source link

Consider supporting legacy BigQuery types #74

Open fbertsch opened 5 years ago

fbertsch commented 5 years ago

For example, we use BOOL but the canonical form is BOOLEAN. This causes some headache for testing if there has been a change in the file.

whd commented 5 years ago

An example follows. Google's version: "[{\"mode\":\"REPEATED\",\"name\":\"accessibilityServices\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"additional_properties\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"arch\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"bug_1501329_affected\",\"type\":\"BOOLEAN\"},{\"mode\":\"NULLABLE\",\"name\":\"campaign\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"clientId\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"created\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"defaultBrowser\",\"type\":\"BOOLEAN\"},{\"mode\":\"NULLABLE\",\"name\":\"defaultSearch\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"device\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"displayVersion\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"distributionId\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"document_id\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"durations\",\"type\":\"INTEGER\"},{\"mode\":\"REPEATED\",\"name\":\"experiments\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"flashUsage\",\"type\":\"INTEGER\"},{\"mode\":\"REQUIRED\",\"name\":\"locale\",\"type\":\"STRING\"},{\"fields\":[{\"fields\":[{\"mode\":\"NULLABLE\",\"name\":\"city\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"country\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"db_version\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"subdivision1\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"subdivision2\",\"type\":\"STRING\"}],\"mode\":\"NULLABLE\",\"name\":\"geo\",\"type\":\"RECORD\"},{\"fields\":[{\"mode\":\"NULLABLE\",\"name\":\"date\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"dnt\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"x_debug_id\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"x_pingsender_version\",\"type\":\"STRING\"}],\"mode\":\"NULLABLE\",\"name\":\"header\",\"type\":\"RECORD\"},{\"fields\":[{\"mode\":\"NULLABLE\",\"name\":\"app_build_id\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"app_name\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"app_update_channel\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"app_version\",\"type\":\"STRING\"}],\"mode\":\"NULLABLE\",\"name\":\"uri\",\"type\":\"RECORD\"},{\"fields\":[{\"mode\":\"NULLABLE\",\"name\":\"browser\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"os\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"version\",\"type\":\"STRING\"}],\"mode\":\"NULLABLE\",\"name\":\"user_agent\",\"type\":\"RECORD\"}],\"mode\":\"REQUIRED\",\"name\":\"metadata\",\"type\":\"RECORD\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_app_name\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_channel\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_country_code\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_os\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_os_version\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"os\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"osversion\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"profileDate\",\"type\":\"INTEGER\"},{\"mode\":\"NULLABLE\",\"name\":\"sample_id\",\"type\":\"INTEGER\"},{\"fields\":[{\"mode\":\"REQUIRED\",\"name\":\"key\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"value\",\"type\":\"INTEGER\"}],\"mode\":\"REPEATED\",\"name\":\"searches\",\"type\":\"RECORD\"},{\"mode\":\"REQUIRED\",\"name\":\"seq\",\"type\":\"INTEGER\"},{\"mode\":\"NULLABLE\",\"name\":\"sessions\",\"type\":\"INTEGER\"},{\"mode\":\"NULLABLE\",\"name\":\"showTrackerStatsShare\",\"type\":\"BOOLEAN\"},{\"mode\":\"REQUIRED\",\"name\":\"submission_timestamp\",\"type\":\"TIMESTAMP\"},{\"mode\":\"NULLABLE\",\"name\":\"tz\",\"type\":\"INTEGER\"},{\"mode\":\"REQUIRED\",\"name\":\"v\",\"type\":\"INTEGER\"}]"

Our version: "[{\"mode\":\"REPEATED\",\"name\":\"accessibilityServices\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"additional_properties\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"arch\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"bug_1501329_affected\",\"type\":\"BOOL\"},{\"mode\":\"NULLABLE\",\"name\":\"campaign\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"clientId\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"created\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"defaultBrowser\",\"type\":\"BOOL\"},{\"mode\":\"NULLABLE\",\"name\":\"defaultSearch\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"device\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"displayVersion\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"distributionId\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"document_id\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"durations\",\"type\":\"INT64\"},{\"mode\":\"REPEATED\",\"name\":\"experiments\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"flashUsage\",\"type\":\"INT64\"},{\"mode\":\"REQUIRED\",\"name\":\"locale\",\"type\":\"STRING\"},{\"fields\":[{\"fields\":[{\"mode\":\"NULLABLE\",\"name\":\"city\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"country\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"db_version\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"subdivision1\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"subdivision2\",\"type\":\"STRING\"}],\"mode\":\"NULLABLE\",\"name\":\"geo\",\"type\":\"RECORD\"},{\"fields\":[{\"mode\":\"NULLABLE\",\"name\":\"date\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"dnt\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"x_debug_id\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"x_pingsender_version\",\"type\":\"STRING\"}],\"mode\":\"NULLABLE\",\"name\":\"header\",\"type\":\"RECORD\"},{\"fields\":[{\"mode\":\"NULLABLE\",\"name\":\"app_build_id\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"app_name\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"app_update_channel\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"app_version\",\"type\":\"STRING\"}],\"mode\":\"NULLABLE\",\"name\":\"uri\",\"type\":\"RECORD\"},{\"fields\":[{\"mode\":\"NULLABLE\",\"name\":\"browser\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"os\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"version\",\"type\":\"STRING\"}],\"mode\":\"NULLABLE\",\"name\":\"user_agent\",\"type\":\"RECORD\"}],\"mode\":\"REQUIRED\",\"name\":\"metadata\",\"type\":\"RECORD\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_app_name\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_channel\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_country_code\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_os\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"normalized_os_version\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"os\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"osversion\",\"type\":\"STRING\"},{\"mode\":\"NULLABLE\",\"name\":\"profileDate\",\"type\":\"INT64\"},{\"mode\":\"NULLABLE\",\"name\":\"sample_id\",\"type\":\"INT64\"},{\"fields\":[{\"mode\":\"REQUIRED\",\"name\":\"key\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"value\",\"type\":\"INT64\"}],\"mode\":\"REPEATED\",\"name\":\"searches\",\"type\":\"RECORD\"},{\"mode\":\"REQUIRED\",\"name\":\"seq\",\"type\":\"INT64\"},{\"mode\":\"NULLABLE\",\"name\":\"sessions\",\"type\":\"INT64\"},{\"mode\":\"NULLABLE\",\"name\":\"showTrackerStatsShare\",\"type\":\"BOOL\"},{\"mode\":\"REQUIRED\",\"name\":\"submission_timestamp\",\"type\":\"TIMESTAMP\"},{\"mode\":\"NULLABLE\",\"name\":\"tz\",\"type\":\"INT64\"},{\"mode\":\"REQUIRED\",\"name\":\"v\",\"type\":\"INT64\"}]"

The differences appear to be INTEGER/INT64 and BOOLEAN/BOOL.

acmiyaguchi commented 5 years ago

I believe there is also a difference between FLOAT/FLOAT64 that might cause some differences. I pulled the types from the document on Standard SQL Data Types. However, it seems like the types here are from the Legacy SQL data types.

I think the approach that makes the most sense is to add an --use-legacy flag when generating BigQuery schemas. This would probably entail:

Somehow, the options from the command-line would need to be propagated into this main transformation method. Unfortunately, the into trait is fixed. This would be the largest code change, but also a good time to add in error-handling (#34) and to support other options (#55).

whd commented 5 years ago

@acmiyaguchi found that you can just disable legacy sql types in API output, which I have done via https://www.terraform.io/docs/providers/google/r/bigquery_table.html#use_legacy_sql, so there's no need to do anything here.

whd commented 5 years ago

We should keep this around, as our tooling (terraform) doesn't support diffing via the canonical types (except with views for whatever reason). We can fix that either by updating terraform or here, but neither is high priority.