Closed (pallix closed this issue 6 years ago)
Hi, could you describe a bit what you want to do, why do you want to post-process it?
Thanks for your quick answer.
I am working on updating exprotobuf to support protobuf custom options.
As you can see in this commit it works when explicitly importing the descriptor.proto file.
This is necessary to do so since import is not supported any more by exprotobuf. However I would like to speed up the solution and not reparse every time the descriptor.proto but reuse the already existing definitions provided by gpb. The idea would be to pass the already provided definitions of descriptor.proto together with the definitions of the user's proto file to the post processing.
The example provided in my first comment is an example of what would be necessary to achieve this.
What do you think?
Hmm, that's a use case that is not really supported: gpb_parse:post_process_one_file, which is called internally for each file, returns a structure that needs further processing to be complete. gpb_parse:post_process_all_files takes this input, resolves references and finalizes the structure. This finalized structure goes into the generated code (available via the generated get_msg_defs() function). Feeding this finalized structure once again into gpb_parse:post_process_one_file is bound to fail, since it expects the internal format.
I understand your concern about speeding up, using already-parsed stuff, but in gpb, there is currently no such possibility. I think the closest you can get is to append together the output from two generated get_msg_defs() functions, then feed this into gpb_compile:proto_defs/2,3, but I'm unsure if there might be any catches.
Have you measured how much time you would save by not reparsing descriptor.proto?
I did not measure the parsing time. I was also concerned about making it easier for the user and doing the minimal number of changes to exprotobuf, that's why I had the idea of reusing the gpb parsed protobuf.
I am not sure how I could combine two generated defs, since post processing the first one without the descriptor defs gives this error:
[extend_ref_to_undefined_msg: [:., :google, :., :protobuf, :., :FieldOptions]]
Hi again, when I re-read your answers, there are several aspects that come to mind. Don't know if you've already progressed some other way, and this is no longer important to you(?) or if this is still an issue?
However I would like to speed up the solution and not reparse every time the descriptor.proto but reuse the already existing definitions provided by gpb.
Could you use the generated :gpb_descriptor.get_msg_defs() function? (guessing Elixir syntax based on your examples, I'm rather rusty on Elixir, unfortunately)
This is necessary to do so since import is not supported any more by exprotobuf.
Could you describe a bit the parsing process of exprotobuf, especially also what happens when import is seen, and when there are references to other would-be-imported messages or enums? Does it use gpb in some way or another for doing parsing?
gpb provides some plumbing tools that might perhaps be of use:
:gpb_compile.file("xyz.proto", [..., to_proto_defs]) -> {ok, Defs}
However, this will also read and process imported files.
:gpb_compile.proto_defs/2,3
To compile the output from the function above.
I think in theory it might be possible to combine different Defs, but I'm not sure. What I mean is something along these lines (using Erlang syntax this time):
{ok, Defs1} = gpb_compile:file("a.proto", [..., to_proto_defs]),
{ok, Defs2} = gpb_compile:file("b.proto", [..., to_proto_defs]),
gpb_compile:proto_defs(name_of_combined_module, Defs1++Defs2, [...])
or possibly with Defs1 instead being the return value from :gpb_descriptor.get_msg_defs(). But I've never done this, so I'm unsure if this works in practice. Probably there are some gotchas.
In any case, :gpb_parse.post_process_all_files/2 was never meant to operate on the output from :gpb_compile.file(..., [to_proto_defs, ...]) nor on the output from :gpb_descriptor.get_msg_defs(). I see post_process_all_files as a gpb-internal function only; it is just one stage of the parsing process.
Thanks for your detailed answer. For the moment I have explicitly loaded the descriptor file each time.
Exprotobuf reads .proto content from the disk or from a string, then passes the content of the file (or string) to the gpb functions :gpb_parse.post_process_one_file and/or :gpb_parse.post_process_all_files. The parsing is done with :gpb_scan.string and :gpb_parse.parse.
The resulting definitions are used to generate Elixir structures with the corresponding keys and gpb defs. These structures are then used for the data model, and the defs for encoding/decoding.
:gpb_compile.file does not seem appropriate since the .proto content can be defined inline.
What is the output type of gpb_compile:proto_defs?
There are :gpb_compile.file/1,2, :gpb_compile.string/2,3 and :gpb_compile.proto_defs/2,3 for reading from a file, from a string or from proto defs, respectively.
The output depends on options. For each of the functions above, with to_proto_defs the return value will include the proto defs; with binary, they will return a binary that can be used with code:load_binary/3. Without these options, the default is to generate files.
(With "proto defs", I mean the same format as from GeneratedCode:get_msg_defs(), i.e. not the same format as returned from post_process_one_file)
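To make the option-dependent return values described above concrete, here is a minimal, untested sketch in Erlang. The module name gpb_output_sketch, the generated module name point_pb and the inline Point message are made up for illustration; the return shapes are those of the documented to_proto_defs and binary options, but please check them against the gpb version you use.

```erlang
-module(gpb_output_sketch).
-export([defs/0, load/0]).

%% A tiny inline proto definition, hypothetical and used only for illustration.
proto() ->
    "syntax = \"proto2\";\n"
    "message Point { required int32 x = 1; required int32 y = 2; }\n".

%% With to_proto_defs, the proto defs are returned and no files are written.
defs() ->
    {ok, Defs} = gpb_compile:string(point_pb, proto(), [to_proto_defs]),
    Defs.

%% With binary, compiled code is returned and can be loaded directly.
load() ->
    {ok, Mod, Code} = gpb_compile:string(point_pb, proto(), [binary]),
    {module, Mod} = code:load_binary(Mod, "point_pb.erl", Code),
    %% The loaded module exposes the same proto defs format.
    Mod:get_msg_defs().
```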
In case one uses e.g. :gpb_compile.string and there are imports, then there are a few utility functions for managing imports: see :gpb_compile.locate_import/2, :gpb_compile.read_import/2 and the import_fetcher option with its import_fetcher_fun. Combining these together, it should be possible to parse protobuf files from strings, from databases, from web pages or whatever, even when there are imports.
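As a rough illustration of the import_fetcher approach, the following untested Erlang sketch compiles a proto from a string while serving one import from memory. The module name gpb_import_sketch, the name main_pb, the file name other.proto and both message definitions are hypothetical. It assumes the fetcher fun receives the import name as written in the import statement and may return {ok, Contents}, or from_file to fall back to normal file lookup.

```erlang
-module(gpb_import_sketch).
-export([defs_with_import/0]).

defs_with_import() ->
    %% Main proto, given inline; it imports a file that is not on disk.
    Main = "syntax = \"proto2\";\n"
           "import \"other.proto\";\n"
           "message Main { optional Other o = 1; }\n",
    %% Contents served for the import (hypothetical example data).
    Other = "syntax = \"proto2\";\n"
            "message Other { optional int32 x = 1; }\n",
    %% Serve other.proto from memory; use normal lookup for anything else.
    Fetcher = fun("other.proto") -> {ok, Other};
                 (_)             -> from_file
              end,
    gpb_compile:string(main_pb, Main,
                       [to_proto_defs, {import_fetcher, Fetcher}]).
```

The same fun could just as well read from a database or another source, as long as it returns the file contents as a string.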
Parsing descriptor.proto to proto defs takes just 5-7 milliseconds on my machine. (For reference, it takes 0.4 s to generate a descriptor.erl file, and about another full second for the Erlang compiler to compile that file.)
Are you ok with these answers?
Yes, it's very useful. I don't have time to work on this issue yet, so I am just explicitly loading the descriptor.proto file for the moment.
But if/when I have more time to work on this, your detailed answer will be of great help. Thank you.
How should we go ahead with this issue? Close it for now, and if you have any further questions when you take up work on it again, you can just open a new issue or reopen this one. Does that sound reasonable?
Yes, thank you for the help!
You're welcome!
Message definitions returned by :gpb_descriptor.get_msg_defs [1] cannot be processed by gpb_parse:
Am I doing something wrong or is it a bug?
[1] I am using Elixir syntax