gpb is a compiler for Google protocol buffer definition files for Erlang.
Let's say we have a protobuf file, x.proto:
message Person {
required string name = 1;
required int32 id = 2;
optional string email = 3;
}
We can generate code for this definition in a number of different ways. Here we use the command line tool. For info on integration with rebar, see further down.
# .../gpb/bin/protoc-erl -I. x.proto
Now we've got x.erl and x.hrl. First we compile it and then we can try it out in the Erlang shell:
# erlc -I.../gpb/include x.erl
# erl
Erlang/OTP 19 [erts-8.0.3] [source] [64-bit] [smp:12:12] [async-threads:10] [kernel-poll:false]
Eshell V8.0.3 (abort with ^G)
1> rr("x.hrl").
['Person']
2> x:encode_msg(#'Person'{name="abc def", id=345, email="a@example.com"}).
<<10,7,97,98,99,32,100,101,102,16,217,2,26,13,97,64,101,
120,97,109,112,108,101,46,99,111,109>>
3> Bin = v(-1).
<<10,7,97,98,99,32,100,101,102,16,217,2,26,13,97,64,101,
120,97,109,112,108,101,46,99,111,109>>
4> x:decode_msg(Bin, 'Person').
#'Person'{name = "abc def",id = 345,email = "a@example.com"}
In the Erlang shell, rr("x.hrl") reads record definitions, and v(-1) references the value one step earlier in the history.
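The bytes returned by encode_msg above follow the protobuf wire format. As a rough illustration, here is a hand-written sketch that reproduces them for this particular Person message. It is not the code gpb generates; the module name person_wire_sketch and its functions are made up for this example, and it assumes a non-negative id.

```erlang
-module(person_wire_sketch).
-export([varint/1, key/2, encode_person/3]).

%% Base-128 varint: 7 payload bits per byte, least significant group first,
%% with the high bit set on every byte except the last.
varint(N) when is_integer(N), N >= 0, N < 128 ->
    <<N>>;
varint(N) when is_integer(N), N >= 128 ->
    <<(128 bor (N band 127)), (varint(N bsr 7))/binary>>.

%% Field key: the field number shifted left three bits, or'ed with the
%% wire type (0 = varint, 2 = length-delimited).
key(FieldNum, WireType) ->
    varint((FieldNum bsl 3) bor WireType).

%% Encode name (field 1, string), id (field 2, int32) and
%% email (field 3, string). Assumption: Id is non-negative.
encode_person(Name, Id, Email) ->
    NameBin = unicode:characters_to_binary(Name),
    EmailBin = unicode:characters_to_binary(Email),
    <<(key(1, 2))/binary, (varint(byte_size(NameBin)))/binary, NameBin/binary,
      (key(2, 0))/binary, (varint(Id))/binary,
      (key(3, 2))/binary, (varint(byte_size(EmailBin)))/binary, EmailBin/binary>>.
```

In the output above, 10 and 26 are the keys for the two string fields, 16 is the key for id, and 217,2 is the varint for 345.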
Protobuf type | Erlang type |
---|---|
double, float | float() \| infinity \| '-infinity' \| nan. When encoding, integers are accepted too. |
int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64 | integer() |
bool | true \| false. When encoding, the integers 1 and 0 are accepted too. |
enum | atom(). Unknown enums decode to integer(). |
message | record (thus tuple()), or map() if the maps (-maps) option is specified |
string | unicode string, thus list of integers, or binary() if the strings_as_binaries (-strbin) option is specified. When encoding, iolists are accepted too. |
bytes | binary(). When encoding, iolists are accepted too. |
oneof | {ChosenFieldName, Value}, or ChosenFieldName => Value if the {maps_oneof,flat} (-maps_oneof flat) option is specified |
map<_,_> | An unordered list of 2-tuples, [{Key,Value}], or a map() if the maps (-maps) option is specified |
Repeated fields are represented as lists.
Optional fields are represented as either the value or undefined if not set. However, for maps, if the option maps_unset_optional is set to omitted, then unset optional values are omitted from the map, instead of being set to undefined when encoding messages. When decoding messages, even with maps_unset_optional set to omitted, the default value will be set in the decoded map.
message m1 {
repeated uint32 i = 1;
required bool b = 2;
required eee e = 3;
required submsg sub = 4;
}
message submsg {
required string s = 1;
required bytes b = 2;
}
enum eee {
INACTIVE = 0;
ACTIVE = 1;
}
#m1{i = [17, 4711],
b = true,
e = 'ACTIVE',
sub = #submsg{s = "abc",
b = <<0,1,2,3,255>>}}
%% If compiled with the maps option:
#{i => [17, 4711],
b => true,
e => 'ACTIVE',
sub => #{s => "abc",
b => <<0,1,2,3,255>>}}
message m2 {
optional uint32 i1 = 1;
optional uint32 i2 = 2;
}
#m2{i1 = 17} % i2 is implicitly set to undefined
%% With the maps option
#{i1 => 17}
%% With the maps option and the maps_unset_optional set to present_undefined:
#{i1 => 17,
i2 => undefined}
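Code that consumes map-based messages may see either representation of an unset optional field, depending on maps_unset_optional. A minimal sketch of reading a field in a way that works for both follows; the module optional_sketch and its functions are hypothetical helpers, not part of gpb.

```erlang
-module(optional_sketch).
-export([get_optional/2, is_set/2]).

%% Return the field value, or undefined when the key is absent
%% (maps_unset_optional = omitted) or explicitly undefined
%% (maps_unset_optional = present_undefined).
get_optional(Field, Msg) when is_map(Msg) ->
    maps:get(Field, Msg, undefined).

%% True only when the optional field carries an actual value.
is_set(Field, Msg) when is_map(Msg) ->
    get_optional(Field, Msg) =/= undefined.
```

With this, both #{i1 => 17} and #{i1 => 17, i2 => undefined} report i2 as unset.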
The oneof construct first appeared in Google protobuf version 2.6.0.
message m3 {
oneof u {
int32 a = 1;
string b = 2;
}
}
A oneof field is always implicitly optional.
#m3{u = {a, 17}}
#m3{u = {b, "hello"}}
#m3{} % u is implicitly set to undefined
%% With the maps option
#{u => {a, 17}}
#{u => {b, "hello"}}
#{} % If maps_unset_optional = omitted (default)
#{u => undefined} % With maps_unset_optional set to present_undefined
%% With the {maps_oneof,flat} option (requires maps_unset_optional = omitted)
#{a => 17}
#{b => "hello"}
#{}
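To make the relationship between the two map representations of a oneof concrete, here is a small sketch that converts the tagged form into the flat form. It only illustrates the shapes shown above; oneof_sketch and flatten_oneof/2 are made-up names, and gpb itself generates the chosen representation directly.

```erlang
-module(oneof_sketch).
-export([flatten_oneof/2]).

%% Turn #{u => {a, 17}} into #{a => 17}. An absent or undefined
%% oneof field yields a map without any of the alternatives.
flatten_oneof(OneofName, Msg) when is_map(Msg) ->
    case maps:find(OneofName, Msg) of
        {ok, {Alternative, Value}} ->
            maps:put(Alternative, Value, maps:remove(OneofName, Msg));
        {ok, undefined} ->
            maps:remove(OneofName, Msg);
        error ->
            Msg
    end.
```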
The protobuf map<_,_> type is not to be confused with Erlang maps. This construct first appeared in Google protobuf version 3.0.0 (for both the proto2 and the proto3 syntax).
message m4 {
map<uint32,string> f = 1;
}
For records, the order of items is undefined when decoding.
#m4{f = []}
#m4{f = [{1, "a"}, {2, "b"}, {13, "hello"}]}
%% With the maps option
#{f => #{}}
#{f => #{1 => "a", 2 => "b", 13 => "hello"}}
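Since the order of items is undefined for the list representation, code should avoid matching on a particular order. A minimal sketch of two ways to handle this follows; map_field_sketch and its functions are hypothetical helpers for illustration only.

```erlang
-module(map_field_sketch).
-export([same_entries/2, to_map/1]).

%% Compare two decoded map<_,_> fields (lists of {Key, Value})
%% while ignoring the order of the items.
same_entries(A, B) when is_list(A), is_list(B) ->
    lists:sort(A) =:= lists:sort(B).

%% Convert the list form to an Erlang map for keyed lookup.
to_map(Items) when is_list(Items) ->
    maps:from_list(Items).
```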
The default option
This describes how decoding works for optional fields that are not present in the binary to decode.
The documentation for Google protobuf says these decode to the default value if specified, or else to the field's type-specific default. The code generated by Google's protobuf compiler also contains has_<field>() methods so one can examine whether a field was actually present or not.
However, in Erlang, the natural way to set and read fields is to just use the syntax for records (or maps), and this leaves no good way to both convey whether a field was present and read the defaults at the same time.
So the approach in gpb is that you have to choose one or the other. Normally, it is possible to see whether an optional field is present or not, e.g. by checking if the value is undefined. But there are options to the compiler to instead decode to defaults, in which case you lose the ability to see whether a field is present or not. The options are defaults_for_omitted_optionals and type_defaults_for_omitted_optionals, for decoding to default=<x> values, or to type-specific defaults respectively.
It works this way:
message o1 {
optional uint32 a = 1 [default=33];
optional uint32 b = 2; // the type-specific default is 0
}
Given binary data <<>>, that is, neither field a nor b is present, then the call decode_msg(Input, o1) results in:
#o1{a=undefined, b=undefined} % None of the options
#o1{a=33, b=undefined} % with option defaults_for_omitted_optionals
#o1{a=33, b=0} % with both defaults_for_omitted_optionals
% and type_defaults_for_omitted_optionals
#o1{a=0, b=0} % with only type_defaults_for_omitted_optionals
The last of the alternatives is perhaps not very useful, but still possible, and implemented for completeness.
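The four outcomes can be summarized as a small decision rule. The sketch below models only that rule for message o1 with empty input; it is not how gpb implements decoding, and the module o1_defaults_sketch, its field table, and its functions are assumptions made for the illustration.

```erlang
-module(o1_defaults_sketch).
-export([decode_empty/1]).

-record(o1, {a, b}).

%% Per-field info for message o1:
%% {FieldName, DeclaredDefault | none, TypeSpecificDefault}
fields() ->
    [{a, {default, 33}, 0},
     {b, none, 0}].

%% Model the result of decoding <<>> as o1 under the given options.
decode_empty(Opts) ->
    UseDeclared = lists:member(defaults_for_omitted_optionals, Opts),
    UseType = lists:member(type_defaults_for_omitted_optionals, Opts),
    [A, B] = [pick(Declared, TypeDefault, UseDeclared, UseType)
              || {_Name, Declared, TypeDefault} <- fields()],
    #o1{a = A, b = B}.

%% A declared default=<x> wins when its option is set; otherwise the
%% type-specific default applies if that option is set; else undefined.
pick({default, D}, _TypeDefault, true, _UseType) -> D;
pick(_Declared, TypeDefault, _UseDeclared, true) -> TypeDefault;
pick(_Declared, _TypeDefault, _UseDeclared, _UseType) -> undefined.
```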
For proto3, there is neither required nor default=<x> for fields. Instead, unless marked with optional, all scalar fields, strings and bytes are implicitly optional. On decoding, if such a field is missing in the binary to decode, it always decodes to the type-specific default value.
On encoding, such fields are only included in the resulting encoded binary if they have a value different from the type-specific default value. Even though all fields are implicitly optional, one could also say that on a conceptual level, all such fields always have a value. At decoding, it is not possible to determine whether, at encoding, a value was present (with the type-specific default value) or not.
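As an illustration of the encoding rule, the following sketch encodes a single implicitly optional proto3 uint32 field by hand and leaves it out when it holds the type-specific default, 0. This is a simplified model rather than gpb's generated code; proto3_sketch and encode_uint32/2 are made-up names.

```erlang
-module(proto3_sketch).
-export([encode_uint32/2]).

%% A value equal to the type-specific default (0) contributes no bytes.
encode_uint32(_FieldNum, 0) ->
    <<>>;
%% Otherwise emit the field key (wire type 0 = varint) and the value.
encode_uint32(FieldNum, Value) when is_integer(Value), Value > 0 ->
    <<(varint((FieldNum bsl 3) bor 0))/binary, (varint(Value))/binary>>.

%% Base-128 varint, least significant 7-bit group first.
varint(N) when N < 128 ->
    <<N>>;
varint(N) ->
    <<(128 bor (N band 127)), (varint(N bsr 7))/binary>>.
```

A decoder that sees <<>> cannot tell whether the sender set the field to 0 or did not set it at all.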
Fields marked as optional are essentially represented the same way as in proto2 syntax; in a record the field has the value undefined if it is not set, and in maps the field is not present if it is not set.
One recommendation, if you need to detect "missing" data, is to define has_<field> boolean fields and set them appropriately. Another alternative could be to use the well-known wrapper messages.
Fields that are sub-messages, and oneof fields, do not have any type-specific default. A sub-message field that was not set encodes differently from a sub-message field set to the sub-message, and it decodes differently. This holds even when the sub-message has no fields. It works similarly for oneof fields: either none of the alternative oneof fields is set, or one of them is. The encoded format is different, and on decoding it is possible to tell the difference.
Parses protocol buffer definition files and can generate:
Features of the protocol buffer definition files: gpb supports:
- packed and default options for fields
- allow_alias enum option (treated as if it is always set true)
- oneof (introduced in protobuf 2.6.0)
- map<_,_> (introduced in protobuf 3.0.0)
gpb reads but ignores:
- packed or default
gpb does not support:
Characteristics of gpb:
- bytes fields, in order to let the runtime system free the larger message binary.
- package attribute by prepending the name of the package to every contained message type (if defined), which is useful to avoid name clashes of message types across packages. See the use_packages option or the -pkgs command line option.
- #field{} record in gpb.hrl for the get_msg_defs function, but it is possible to avoid this dependency by also using the defs_as_proplists or -pldefs option.
Introspection
gpb generates some functions for examining messages, enums and services:
- get_msg_defs() (or get_proto_defs() if introspect_get_proto_defs is set), get_msg_names(), get_enum_names()
- find_msg_def(MsgName) and fetch_msg_def(MsgName)
- find_enum_def(MsgName) and fetch_enum_def(MsgName)
- enum_symbol_by_value(EnumName, Value), enum_symbol_by_value_<EnumName>(Value), enum_value_by_symbol(EnumName, Enum) and enum_value_by_symbol_<EnumName>(Enum)
- get_service_names(), get_service_def(ServiceName), get_rpc_names(ServiceName)
- find_rpc_def(ServiceName, RpcName), fetch_rpc_def(ServiceName, RpcName)
There are also some functions for translating between fully qualified names and internal names. These take any renaming options into consideration. They may be useful for instance with grpc reflection.
- fqbin_to_service_name(<<"Package.ServiceName">>) and service_name_to_fqbin('ServiceName')
- fqbins_to_service_and_rpc_name(<<"Package.ServiceName">>, <<"RpcName">>) and service_and_rpc_name_to_fqbins('ServiceName', 'RpcName')
- fqbin_to_msg_name(<<"Package.MsgName">>) and msg_name_to_fqbin('MsgName')
- fqbin_to_enum_name(<<"Package.EnumName">>) and enum_name_to_fqbin('EnumName')
There are also some functions for querying what proto a type belongs to. Each type belongs to some "name" which is a string, usually the file name, sans extension, for example "name" if the proto file was "name.proto".
get_all_proto_names() -> ["name1", ...]
get_msg_containment("name") -> ['MsgName1', ...]
get_pkg_containment("name") -> 'Package'
get_service_containment("name") -> ['Service1', ...]
get_rpc_containment("name") -> [{'Service1', 'RpcName1'}, ...]
get_proto_by_msg_name_as_fqbin(<<"Package.MsgName">>) -> "name"
get_proto_by_enum_name_as_fqbin(<<"Package.EnumName">>) -> "name"
get_protos_by_pkg_name_as_fqbin(<<"Package">>) -> ["name1", ...]
There are also some version information functions:
- gpb:version_as_string(), gpb:version_as_list() and gpb:version_source()
- GeneratedCode:version_as_string(), GeneratedCode:version_as_list() and GeneratedCode:version_source()
- ?gpb_version (in gpb_version.hrl)
- ?'GeneratedCode_gpb_version' (in GeneratedCode.hrl)
gpb can also generate a self-description of the proto file. The self-description is a description of the proto file, encoded to a binary using the descriptor.proto that comes with the Google protocol buffers library. Note that such encoded self-descriptions won't be byte-by-byte identical to what the Google protocol buffers compiler generates for the same proto, but should be roughly equivalent.
Erroneously encoded protobuf messages and fields will generally cause the decoder to crash. Examples of such erroneous encodings are:
Maps
Gpb can generate encoders/decoders for maps.
The option maps_unset_optional can be used to specify behavior for non-present optional fields: whether they are omitted from maps, or whether they are present, but have the value undefined like for records.
Reporting of errors in .proto files
Gpb is not very good at error reporting, especially referencing errors, such as references to messages that are not defined. You might want to first verify with protoc that the .proto files are valid before feeding them to gpb.
For info on how to use gpb with rebar3, see https://rebar3.org/docs/configuration/plugins/#protocol-buffers
Rebar has supported gpb since version 2.6.0. See the proto compiler section of the rebar.config.sample file at https://github.com/rebar/rebar/blob/master/rebar.config.sample
For older versions of rebar (prior to 2.6.0), the text below outlines how to proceed:
Place the .proto files for instance in a proto/ subdirectory. Any subdirectory, other than src/, is fine, since rebar will try to use another protobuf compiler for any .proto it finds in the src/ subdirectory. Here are some lines for the rebar.config file:
%% -*- erlang -*-
{pre_hooks,
 [{compile, "mkdir -p include"}, %% ensure the include dir exists
  {compile,
   %% adjacent string literals are concatenated, hence the leading space
   "/path/to/gpb/bin/protoc-erl -I`pwd`/proto"
   " -o-erl src -o-hrl include `pwd`/proto/*.proto"
  }]}.
{post_hooks,
 [{clean,
   "bash -c 'for f in proto/*.proto; "
   "do "
   "  rm -f src/$(basename $f .proto).erl; "
   "  rm -f include/$(basename $f .proto).hrl; "
   "done'"}
 ]}.
{erl_opts, [{i, "/path/to/gpb/include"}]}.
The gpb version number is fetched from the latest git tag matching N.M where N and M are integers. This version is inserted into the gpb.app file as well as into include/gpb_version.hrl. The version is the result of the command
git describe --always --tags --match '[0-9]*.[0-9]*'
Thus, to create a new version of gpb, the single source from where this version is fetched, is the git tag. (If you are importing gpb into another version control system than git, or using another build tool than rebar, you might have to adapt rebar.config and src/gpb.app.src accordingly. See also the section below about building outside of a git work tree for info on exporting gpb from git.)
The version number from the git describe command above will look like
- <x>.<y>.<z> (on master on Github)
- <x>.<y>.<z>-<n>-g<sha> (on branches or between releases)
The version number on the master branch of the gpb on Github is intended to always be only integers with dots, in order to be compatible with reltool. In other words, each push to Github's master branch is considered a release, and the version number is bumped.
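To tell the two version formats apart programmatically, something along these lines could be used. This is only a sketch; vsn_sketch and classify/1 are hypothetical and not part of gpb, and the regular expressions assume the two shapes described above.

```erlang
-module(vsn_sketch).
-export([classify/1]).

%% Classify a version string produced by git describe:
%% integers with dots only (a release), or the same followed by
%% -<n>-g<sha> (n commits after the tag, at abbreviated commit sha).
classify(Vsn) ->
    case re:run(Vsn, "^[0-9]+(\\.[0-9]+)*$", [{capture, none}]) of
        match ->
            release;
        nomatch ->
            Pattern = "^[0-9]+(\\.[0-9]+)*-([0-9]+)-g([0-9a-f]+)$",
            case re:run(Vsn, Pattern, [{capture, [2, 3], list}]) of
                {match, [N, Sha]} ->
                    {between_releases, list_to_integer(N), Sha};
                nomatch ->
                    unknown
            end
    end.
```

For example, classify("4.0.0") gives release, while a string with a -<n>-g<sha> suffix gives a between_releases tuple.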
To ensure this, there is a pre-push git hook and two scripts, install-git-hooks and tag-next-minor-vsn, in the helpers subdirectory. The ChangeLog file will not necessarily reflect all minor version bumps, only important updates.
Places to update when making a new version:
The gpb build process expects a (non-shallow) git work tree, with tags, to get the version numbering right, as described in the Version numbering section, but it is also possible to build outside of git. To do that, you have two options:
- Create a file, gpb.vsn, with the version on the first line
- Use the helpers/mk-versioned-archive script, then unpack the archive and build inside it.
If you create the versioned archive in a git work tree, the version will be set automatically, otherwise you will need to specify it manually. Run mk-versioned-archive --help for info on what options to use.
When downloading from Github, the gpb-<x.y.z>.tar.gz archives have been created using the mk-versioned-archive script, so it is possible to just unpack and build directly.
If you use Github's automatic Source code zip or tar.gz archives, you will need to either create a gpb.vsn file as described above, or re-create a versioned archive using the mk-versioned-archive script and the --override-version=<x> option (or possibly the --override-version-from-cwd-path option if the directory name contains a proper version).
Contributions are welcome, preferably as pull requests or git patches or git fetch requests. Here are some guidelines:
rebar clean; rebar eunit && rebar doc
See the ChangeLog for details.
The default value for the maps_unset_optional option has changed to omitted, from present_undefined. This concerns only code generated with the maps (-maps) option. Projects that already set this option explicitly are not impacted. Projects that relied on the default to be present_undefined will need to set the option explicitly in order to upgrade to 4.0.0.
For type specs, the default has changed to generate them when possible. The option {type_specs,false} (-no_type) can be used to avoid generating type specs.