tomas-abrahamsson / gpb

A Google Protobuf implementation for Erlang
Other
557 stars 153 forks source link

Messages defs from gpb_descriptor cannot be processed by gpb_parse #141

Closed pallix closed 6 years ago

pallix commented 6 years ago

Messages definitions returned by :gpb_descriptor.get_msg_defs [1] cannot be processed by gpb_parse:

iex(2)> process = fn f -> :gpb_parse.post_process_one_file("<nofile>", f, []) end
#Function<6.99386804/1 in :erl_eval.expr/5>
iex(3)> {:ok, processed} = :gpb_descriptor.get_msg_defs() |> process.()         
iex(5)> :gpb_parse.post_process_all_files(processed, [])
** (FunctionClauseError) no function clause matching in :gpb_parse."-reformat_name/1-lc$^0/1-0-"/1    

    The following arguments were given to :gpb_parse."-reformat_name/1-lc$^0/1-0-"/1:

        # 1
        :"SourceCodeInfo.Location"

    (gpb) src/gpb_parse.yrl:1137: :gpb_parse."-reformat_name/1-lc$^0/1-0-"/1
    (gpb) src/gpb_parse.yrl:1137: :gpb_parse.reformat_name/1
    (gpb) src/gpb_parse.yrl:1116: anonymous fn/1 in :gpb_parse.reformat_fields/1
    (stdlib) lists.erl:1239: :lists.map/2
    (gpb) src/gpb_parse.yrl:1075: anonymous fn/1 in :gpb_parse.reformat_names/1
    (stdlib) lists.erl:1239: :lists.map/2
    (stdlib) lists.erl:1239: :lists.map/2
    (gpb) src/gpb_parse.yrl:456: :gpb_parse.post_process_all_files/2

Am I doing something wrong or is it a bug?

[1] I am using Elixir syntax

tomas-abrahamsson commented 6 years ago

Hi, could you describe a bit what you want to do, why do you want to post-process it?

pallix commented 6 years ago

Thanks for your quick answer.

I am working on updating exprotobuf to support protobuf custom options.

As you can see in this commit it works when explicitly importing the descriptor.proto file.

This is necessary to do so since import is not supported any more by exprotobuf. However I would like to speed up the solution and not reparse every time the descriptor.proto but reuse the already existing definitions provided by gpb. The idea would be to pass the already provided definitions of descriptor.proto together with the definitions of the user's proto file to the post processing.

The example provided in my first comment is an example of what would be necessary to achieve this.

What do you think?

tomas-abrahamsson commented 6 years ago

Hmm, that's a usecase that is not really supported: The gpb_parse:post_process_ one_file which is called internally for each file, returns a structure that needs further processing to be complete. The gpb_parse:post_process_all_files takes this input, resolves references and finalizes the structure. This finalized structure goes into the generated code (available via the generated get_msg_defs() function). Feeding this finalized structure once again into gpb_parse:post_process_one_file is bound to fail, since it expectes the internal format.

I understand your concern about speeding up, using already-parsed stuff, but in gpb, there is currently no such possibility. I think the closest you can get is to append together the output from two generated get_msg_defs(), then feed this into gpb_compile:proto_defs/2,3, but I'm unsure it there might be any catches.

Have you measured how much time would you save not reparsing descriptor.proto?

pallix commented 6 years ago

I did not measure the parsing time. I was also concerned about making it easier for the user and doing the minimal number of changes to exprotobuf, that's why I had the idea of reusing the gpb parsed protobuf.

I am not sure how I could combine two generated defs, since post processing the first one without the descriptor defs gives this error:

 [extend_ref_to_undefined_msg: [:., :google, :., :protobuf, :., :FieldOptions]]
tomas-abrahamsson commented 6 years ago

Hi again, when I re-read your answers, there are several aspects that come to mind. Don't know if you've already progressed some other way, and this is no longer important to you(?) or if this is still an issue?

However I would like to speed up the solution and not reparse every time the descriptor.proto but reuse the already existing definitions provided by gpb.

Could you use the generated :gpb_descriptor.get_msg_defs() function? (guessing Elixir syntax based on your examples, I'm rather rusty on Elixir, unfortunately)

This is necessary to do so since import is not supported any more by exprotobuf.

Could you describe a bit the parsing process of exprotobuf, especially also what happens when import is seen, and when there are references to other would-be-imported messages or enums? Does it use gpb in some way or another for doing parsing?

The gpb provides some plumbing tools that might perhaps be of use:

In any case, the :gpb_parse:post_process_all_files/2 was never meant to operate on the output from :gpb_compile.file(..., [to_proto_defs, ...]) nor on the output from :gpb_descriptor.get_msg_defs(). I see the post_process_all_files as a gpb-internal function only, part of the parsing process, it is only one stage of parsing.

pallix commented 6 years ago

Thanks for your detailed answer. For the moment I have explicitly loaded the descriptor file each time.

Exprotobuf reads a .proto content from the disk or from a string, then passes the content of the file (or string) to the gpb function :gpb_parse.post_process_one_file and/or :gpb_parse.post_process_all_files. The parsing is done with :gpb_scan.string and :gpb_parse.parse.

The result definitions are used to generate Elixir structure with the corresponding keys and gpb defs. These structures are then used for the data modele and the defs for encoding / decoding.

:gpb_compile.file does not seem appropriate since the .proto content can be define inline.

What is the output type of gpb_compile:proto_defs ?

tomas-abrahamsson commented 6 years ago

There is :gpb_compile.file/1,2, :gpb_compile.string/2,3 and gpb_compile.proto_defs/2,3 for reading from a file, from a string or from proto defs, respectively.

The output depends on options. For each of the functions above, with to_proto_defs the return value will include the proto defs, with binary, they will return a binary that can be used with code:load_binary/3. Without these options, the default is to generate files.

(With "proto defs", I mean the same format as from GeneratedCode:get_msg_defs(), ie not the same format as returned from post_process_one_file)

In case one uses eg the :gpb_compile.string and there are imports, then there are a few utility functions for managing imports, see the :gpb_compile.locate_import/2, the :gpb_compile.read_import/2 and the import_fetcher option with its import_fetcher_fun. Combining these together, it should be possible to parse protobuf files from strings, from databases, from web pages or whatever, even when there are imports.

Parsing the descriptor.proto to proto defs is just 5-7 milliseconds on my machine. (For reference, it takes 0.4s to generate a descriptor.erl file, and about another full second for the erlang compiler to compile that file)

tomas-abrahamsson commented 6 years ago

Are you ok with these answers?

pallix commented 6 years ago

Yes it's very useful. I don't have time to work on this issue yet, so I am just explictely loading the descriptor.proto file for the moment.

But if/when I have more time to work on this, your detailed answer will be of great help. Thank you.

tomas-abrahamsson commented 6 years ago

How to go ahead with this issue? Close it for now, and if you will have any further questions when you take up work on it again, you can just open a new issue or reopen this, does that sound reasonable?

pallix commented 6 years ago

Yes, thank you for the help!

tomas-abrahamsson commented 6 years ago

You're welcome!