Apache Avro schema tools for Tarantool, implemented from scratch in Lua.
To install the module, use:
tarantoolctl rocks install avro-schema
Then load it:
avro_schema = require('avro_schema')
ok, schema = avro_schema.create {
type = "record",
name = "Frob",
fields = {
{ name = "foo", type = "int", default = 42 },
{ name = "bar", type = "string" }
}
}
Creates a schema object (ok == true). If there was a syntax error, returns false and the error message.
ok, normalized_data_copy = avro_schema.validate(schema, { bar = "Hello, world!" })
Returns true if the data was valid. Otherwise, returns false and the error message.
The avro_schema.validate() function creates a normalized copy of the data. Normalization means filling in default values for missing fields. For example, because the "foo" field has a default value of 42, the result from the above example will be { foo = 42, bar = "Hello, world!" }.
To facilitate data evolution, Avro defines certain schema mapping rules. If schemas A and B are compatible, then one can convert data from A to B.
ok = avro_schema.are_compatible(schema1, schema2)
ok = avro_schema.are_compatible(schema2, schema1, "downgrade")
Allowed modifications include:
- renaming types and record fields (provided that aliases are correctly set);
- type promotions (for example, int is compatible with long, but not vice versa).
Let's assume:
- B is newer than A;
- A defines Apple (a record type);
- B renames it to Banana.
Upgrading data from A to B works, since Banana is marked as an alias of Apple.
However, downgrading data from B to A does not work, since in A the record type
Apple has no aliases. To make it work we implement a downgrade mode.
In downgrade mode, name mapping rules take into account the aliases in the source schema,
and ignore the aliases in the target schema.
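As an illustration, the Apple/Banana scenario above might look like this (the record definitions are hypothetical sketches, not taken from this document):

```lua
avro_schema = require('avro_schema')
-- Schema A defines the record "Apple".
ok, schema_a = avro_schema.create {
    type = "record", name = "Apple",
    fields = { { name = "seeds", type = "int" } }
}
-- Schema B renames it to "Banana", keeping "Apple" as an alias.
ok, schema_b = avro_schema.create {
    type = "record", name = "Banana", aliases = { "Apple" },
    fields = { { name = "seeds", type = "int" } }
}
-- Upgrade (A -> B): the alias in the target schema makes this compatible.
ok = avro_schema.are_compatible(schema_a, schema_b)
-- Downgrade (B -> A): works only in "downgrade" mode, which reads the
-- aliases from the source schema instead of the target.
ok = avro_schema.are_compatible(schema_b, schema_a, "downgrade")
```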
avro_schema.is(object)
avro_schema.get_names(schema [, service-fields])
avro_schema.get_types(schema [, service-fields])
The first argument must be a schema object, such as the one created in the Creating a schema example above. The optional second argument is a table with names of types, such as {'string', 'int'}. The result will be a Lua table of field names (for the get_names method) or a Lua table of field types (for the get_types method). The order will match the field order in the flat representation.
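As a sketch, using the "Frob" schema from the Creating a schema example, the calls might look like:

```lua
avro_schema = require('avro_schema')
ok, schema = avro_schema.create {
    type = "record", name = "Frob",
    fields = {
        { name = "foo", type = "int", default = 42 },
        { name = "bar", type = "string" }
    }
}
-- Field names and field types, in flat (tuple) order:
avro_schema.get_names(schema)   -- {'foo', 'bar'}
avro_schema.get_types(schema)   -- {'int', 'string'}
```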
Compiling a schema creates optimized data conversion routines (runtime code generation).
ok, methods = avro_schema.compile(schema)
ok, methods = avro_schema.compile({schema1, schema2})
If two schemas are provided, then the generated routines consume data in schema1 and produce results in schema2.
What if the schema1 source and the schema2 destination are not adjacent revisions, i.e. there were some revisions in between? While going from source to destination directly is fast, it sometimes alters the results. Performing the conversion step by step, using all the in-between revisions, always yields correct results, but it is slow. There is a third option: let compile generate routines that are fast yet produce the correct results.
A few options affecting compilation are recognized.
Enabling downgrade mode (see avro_schema.are_compatible for details):
ok, methods = avro_schema.compile({schema1, schema2, downgrade = true})
Dumping generated code for inspection:
ok, methods = avro_schema.compile({schema1, schema2, dump_src = "output.lua"})
Troubleshooting code generation issues:
ok, methods = avro_schema.compile({schema1, schema2, debug = true, dump_il = "output.il"})
Add service fields (which are part of a tuple, but are not part of an object):
ok, methods = avro_schema.compile({schema, service_fields = {'string', 'int'}})
Compile produces the following routines (returned in a Lua table):
flatten
unflatten
xflatten
flatten_msgpack
unflatten_msgpack
xflatten_msgpack
get_types
get_names
Here is an example which uses the avro schema that we described in
the section Creating a schema, a Tarantool database space,
and the methods that compile
produces. This is a script that you
can paste into a client of a Tarantool server; the comments explain
what the results look like and what they mean.
-- Create a Tarantool database, an index, and a tuple
box.schema.space.create('T')
box.space.T:create_index('I')
box.space.T:insert{1, 'string-value'}
-- Let tuple_1 = a tuple from the database space
tuple_1 = box.space.T:get(1)
-- Load the module
avro_schema = require('avro_schema')
-- Load avro_schema and create a schema as described earlier
ok, schema = avro_schema.create {
type = "record",
name = "Frob",
fields = {
{ name = "foo", type = "int", default = 42 },
{ name = "bar", type = "string" }
}
}
-- Compile, so that "methods" will have the generated routines
ok, methods = avro_schema.compile(schema)
-- Invoke unflatten(). The result will look like this:
-- - {'foo': 1, 'bar': 'string-value'}
-- That is: unflattening can turn tuples into avro-schema objects.
ok, result = methods.unflatten(tuple_1)
result
-- Invoke flatten(). The result can be inserted into the database.
-- The value of the newly inserted tuple will look like this:
-- - [1, 'string-value']
-- That is, flattening can turn avro-schema objects into tuples.
ok, tuple_2 = methods.flatten(result)
box.space.T:truncate()
box.space.T:insert(tuple_2)
-- Make an avro_schema object; validate() fills in foo's default,
-- so the result is {foo=42, bar='Hello, world!'}
ok, normalized_data_copy = avro_schema.validate(schema, { bar = "Hello, world!" })
-- Invoke xflatten(). The result will look like this:
-- - [['=', 1, 42], ['=', 2, 'Hello, world!']]
ok, result = methods.xflatten(normalized_data_copy)
result
-- That is, the format of an xflatten() result is exactly
-- what a Tarantool "update" request looks like.
-- Therefore let's put it in an update request ...
box.space.T:update({42},result)
-- And the result looks like:
-- - [1, 'Hello, world!']
So: with flatten() for inserting, xflatten() for updating, and unflatten() for getting, we have ways to use avro_schema objects as tuples in Tarantool databases.
With the other three methods that work with transformations of avro_schema objects -- flatten_msgpack(), xflatten_msgpack(), and unflatten_msgpack() -- we have similar functionality, except that the transformations are to and from MsgPack objects. (The ..._msgpack() methods are usually faster because they do not need to encode or decode internally.)
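A sketch of the MsgPack variants, assuming the "Frob" schema and the methods table from the compile example above, and assuming the ..._msgpack() routines take and return msgpack-encoded strings:

```lua
msgpack = require('msgpack')
-- Encode an avro-schema object as msgpack, then flatten it.
mp_object = msgpack.encode({ foo = 1, bar = 'string-value' })
ok, mp_tuple = methods.flatten_msgpack(mp_object)
-- mp_tuple should be a msgpack string encoding the flat tuple;
-- decode it to inspect the result:
msgpack.decode(mp_tuple)
```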
The final two methods -- get_types() and get_names() -- have almost the same effect as the get_types() and get_names() described in the earlier section Querying a schema's field names or field types. (The main difference is that the optional "service_fields" argument is unnecessary if methods is the result of a compile done with the service_fields option.) For example:
tarantool> methods.get_names()
---
- - foo
- bar
...
tarantool> methods.get_types()
---
- - int
- string
...
Named types are ones that have mandatory name fields in their definitions: record, fixed, and enum.
Named types can be referenced after the first definition (in depth-first, left-to-right traversal).
Example:
{
name = 'user',
type = 'record',
fields = {
{name = 'uid', type = 'long'},
{
name = 'nested',
type = {
type = 'record',
name = 'nested_record',
fields = {
{name = 'x', type = 'long'},
{name = 'y', type = 'long'}
}
}
},
{
name = 'another_nested',
type = 'nested_record'
}
}
}
The problem: in database management systems NULL is a value, not a type. So it should be possible, for example, to have a "long integer" type that can contain both NULL and integers.
One can try to handle this with a union such as {'null', 'long'}, which can have both null and {long = 42}. What really is necessary, though, is that a single field, whose name determines the type, can contain both null and 42 as valid values (see the JSON Encoding section of the avro-schema standard). This problem -- expressing a single type that accepts both null and 42 -- is the problem that the nullability extension solves.
A type can be marked as nullable by adding an asterisk ("*") at the end of the type name:
{
name = 'user',
type = 'record',
fields = {
{name = 'uid', type = 'long'},
{name = 'first_name', type = 'string'},
{name = 'middle_name', type = 'string*'},
{name = 'last_name', type = 'string'}
}
}
The following types can be marked as nullable:
Note: one can still use a union such as {'null', ...} without an asterisk to make a nullable union type.
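As a sketch of the difference (the field name below is illustrative, not from this document):

```lua
-- 1. A classic Avro union: valid values are null and {long = 42}.
{ name = 'maybe_uid', type = { 'null', 'long' } }
-- 2. The nullability extension: valid values are null and 42 directly.
{ name = 'maybe_uid', type = 'long*' }
```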
Default values are substituted in two cases:
Example:
local schema = {
type = "record", name = "Frob", fields = {
{ name = "foo", default = {f1=1, f2={f2_1=2}}, type =
{ type = "record", name = "default_1", fields = {
{name = "f1", type = "int"},
{name = "f2", default = {f2_1=21}, type =
{type = "record", name = "default_2", fields = {
{name = "f2_1", type = "int"}}
}}
}}},
{ name = "bar", type = "int"}
}
}
ok, handle = avro_schema.create(schema)
ok, methods = avro_schema.compile(handle)
ok, flat = methods.flatten({bar = 11})
-- returns {1, 2, 11}
ok, flat = methods.flatten({foo = {f1 = 3}, bar = 11})
-- returns {3, 21, 11}