mozilla / jsonschema-transpiler

Compile JSON Schema into Avro and BigQuery schemas
Mozilla Public License 2.0

Support bytes as a data type #81

Closed: acmiyaguchi closed this issue 5 years ago

acmiyaguchi commented 5 years ago

There's some use of the bytes SQL type for storing arbitrary binary data. JSON has no native way to represent raw bytes, but both Avro and BigQuery support a bytes type. A low-impact solution is to create a custom bytes format under the string type, as follows:

{
  "type": "string",
  "format": "bytes"
}

This schema isn't used to validate the payload, because documents containing binary data may also contain control characters that invalidate the JSON documents. Instead, the schema is descriptive and is used to generate Avro/BigQuery schemas.
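
As a rough illustration of the descriptive use described above, here is a minimal Python sketch of how a string schema with the custom bytes format might map onto the target types. The function names `to_avro_type` and `to_bigquery_type` are hypothetical and are not the transpiler's actual API; the only facts assumed are that Avro has `bytes` and `string` primitives and that BigQuery has `BYTES` and `STRING` types.

```python
# Hypothetical helpers for illustration only; not the transpiler's real API.

def _is_bytes_format(schema: dict) -> bool:
    """Return True when a string schema carries the custom "bytes" format."""
    return schema.get("type") == "string" and schema.get("format") == "bytes"


def to_avro_type(schema: dict) -> str:
    """Map a JSON Schema string node to an Avro primitive type name."""
    if schema.get("type") != "string":
        raise ValueError("this sketch only handles string schemas")
    return "bytes" if _is_bytes_format(schema) else "string"


def to_bigquery_type(schema: dict) -> str:
    """Map a JSON Schema string node to a BigQuery column type name."""
    if schema.get("type") != "string":
        raise ValueError("this sketch only handles string schemas")
    return "BYTES" if _is_bytes_format(schema) else "STRING"


bytes_schema = {"type": "string", "format": "bytes"}
plain_schema = {"type": "string"}

avro_bytes = to_avro_type(bytes_schema)
bigquery_bytes = to_bigquery_type(bytes_schema)
```

A string schema without the format keeps mapping to the ordinary string types, so existing schemas would be unaffected under this assumption.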

See:
https://json-schema.org/understanding-json-schema/reference/string.html#format
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#bytes-type

relud commented 5 years ago

> This schema isn't used to validate the payload because documents containing binary data may also contain control characters that invalidate the JSON documents

in JSON it should be encoded as a base64 string.

that said, we don't actually validate anything using this format, for now.
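
For reference, the base64 approach mentioned above can be sketched in Python with the standard library. This is only an illustration: the field name `data` and the helper names `encode_payload` and `decode_payload` are made up for the example and do not come from any real payload schema.

```python
import base64
import json


def encode_payload(raw: bytes) -> str:
    """Serialize raw bytes into a JSON document as a base64 string.

    The "data" field name is a hypothetical placeholder.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"data": encoded})


def decode_payload(document: str) -> bytes:
    """Recover the original bytes from the JSON document."""
    return base64.b64decode(json.loads(document)["data"])


# Bytes that include a control character and a non-UTF-8 value.
raw = b"\x00\x01\xff"
document = encode_payload(raw)
restored = decode_payload(document)
```

Because the base64 alphabet is plain ASCII, the encoded document stays valid JSON even when the original bytes contain control characters.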