Add export to .json option

I made these libs:

rawproto - can generate SDL (proto file) and basic JSON from raw proto
rawprotoparse - can generate JSON
protoquery - lets you query specific fields (you specify the type you expect), and generates a "tree", which is a broken-down JSON representation of the structure.

One general issue with raw proto-to-JSON is that every wire type can actually be one of several scalar types, so it's impossible to guess the format reliably. There are a few assumptions you can make to get roughly the right value, but some fields will almost always come out wrong:

Treat all numbers as unsigned ints (not signed, not floats). For the fixed-width types there is no way to know (they could be floats or int/uint), but type:0 (VARINT) values are always integers. Using uint for everything gets most cases right, and you can convert to another type once you know a field isn't uint.
Try to parse type:2 (LEN, or "bytes") as a sub-message first, then as a string (if no characters fall outside the range that could be text), and finally fall back to bytes. This can give false positives: I have found some string fields that (incorrectly) parse as a sub-message, so it's important to look at the output carefully and have some way to set the type for that field. You can do this with a custom handler in rawprotoparse, and protoquery takes the type in the query, so it works there too. Also, "packed repeated" fields are encoded as LEN and must be parsed differently from sub-messages. They are easy to parse if you know a field is packed, but there is no real way to detect that automatically.
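
For illustration only, here is a minimal Python sketch of the LEN heuristic above (sub-message, then text, then bytes). It is not the code from rawproto, rawprotoparse, or protoquery; the function names are hypothetical, numbers are all guessed as uint, and groups (wire types 3/4) are not handled.

```python
# Minimal sketch of schema-less protobuf decoding (hypothetical helper names,
# not the API of rawproto/rawprotoparse/protoquery). Assumes Python 3.9+.
# Wire types: 0 = VARINT, 1 = I64, 2 = LEN, 5 = I32 (3/4 = deprecated groups).


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 70:  # more than 10 bytes cannot be a valid varint
            raise ValueError("varint too long")


def parse_message(buf: bytes) -> list:
    """Parse buf as a message of (field_number, value) pairs, or raise ValueError."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire = key >> 3, key & 0x7
        if number == 0:
            raise ValueError("field number 0 is invalid")
        if wire == 0:
            value, pos = read_varint(buf, pos)  # guessed as uint
        elif wire in (1, 5):
            size = 8 if wire == 1 else 4
            if pos + size > len(buf):
                raise ValueError("truncated fixed-width value")
            value = int.from_bytes(buf[pos:pos + size], "little")  # guessed as uint
            pos += size
        elif wire == 2:
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated LEN value")
            value = guess_len(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire}")
        fields.append((number, value))
    return fields


def guess_len(chunk: bytes):
    """LEN heuristic: sub-message first, then text, then raw bytes."""
    if chunk:
        try:
            return parse_message(chunk)
        except ValueError:
            pass
    try:
        text = chunk.decode("utf-8")
    except UnicodeDecodeError:
        return chunk
    if all(ch.isprintable() or ch in "\t\r\n" for ch in text):
        return text
    return chunk


def decode_packed_varints(chunk: bytes) -> list[int]:
    """Decode a LEN value you already know is a packed repeated varint field."""
    values, pos = [], 0
    while pos < len(chunk):
        value, pos = read_varint(chunk, pos)
        values.append(value)
    return values
```

With this sketch, `guess_len(b"hi")` returns `[(13, 105)]`, because those two ASCII bytes happen to form a valid field-13 varint: exactly the string-parsed-as-message false positive described above. Packed varints such as `b"\x01\x02\x03"` fall through to raw bytes unless you call `decode_packed_varints` yourself.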

Part of what is great about this tool is that it just shows all the possible types, so you can make the call by looking at them, but there is no way to automate that.
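
Showing every candidate reading instead of picking one can be sketched like this in Python. The function names are hypothetical; the type names follow the .proto scalar types, and bool/enum (also varints) are left out for brevity.

```python
import struct

# Sketch: list every scalar type a raw value could be, instead of picking one.


def varint_candidates(value: int) -> dict:
    """Possible readings of a wire type 0 (VARINT) value."""
    return {
        "uint64": value,
        # int32/int64 store negatives as 64-bit two's complement
        "int64": value - (1 << 64) if value >= (1 << 63) else value,
        # sint32/sint64 use ZigZag encoding: 0->0, 1->-1, 2->1, 3->-2, ...
        "sint64": (value >> 1) ^ -(value & 1),
    }


def fixed32_candidates(raw: bytes) -> dict:
    """Possible readings of a 4-byte wire type 5 (I32) value (little-endian)."""
    return {
        "fixed32": struct.unpack("<I", raw)[0],
        "sfixed32": struct.unpack("<i", raw)[0],
        "float": struct.unpack("<f", raw)[0],
    }


def fixed64_candidates(raw: bytes) -> dict:
    """Possible readings of an 8-byte wire type 1 (I64) value (little-endian)."""
    return {
        "fixed64": struct.unpack("<Q", raw)[0],
        "sfixed64": struct.unpack("<q", raw)[0],
        "double": struct.unpack("<d", raw)[0],
    }


print(varint_candidates(1))  # {'uint64': 1, 'int64': 1, 'sint64': -1}
print(fixed32_candidates(struct.pack("<f", 1.0)))
# {'fixed32': 1065353216, 'sfixed32': 1065353216, 'float': 1.0}
```

The second example shows why fixed-width fields are ambiguous: the same four bytes are a perfectly plausible integer and a perfectly plausible float.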

Another issue is that JSON doesn't map directly to protobuf. For example, a field number can repeat at the same level, which can't really be described as an associative array (object) in JSON, like this:

10:
  1: 100
  2: https://example.org/cat.png
10:
  1: 200
  2: https://example.org/dog.png

It cannot be written like this in JSON, because object keys should be unique (most parsers silently keep only the last "10"):

{
  "10": {
    "1": 100,
    "2": "https://example.org/cat.png"
  },
  "10": {
    "1": 200,
    "2": "https://example.org/dog.png"
  }
}
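
A quick way to see what happens to the duplicate-key version in practice, using Python's standard `json` module on a shortened copy of the example: the default decoder keeps only the last "10", while `object_pairs_hook` can preserve every pair.

```python
import json

# Shortened version of the example above: key "10" appears twice in one object.
DUPLICATE_KEYS = '{"10": {"1": 100}, "10": {"1": 200}}'

# Default decoding builds a dict, so the later "10" overwrites the earlier one.
as_dict = json.loads(DUPLICATE_KEYS)
print(as_dict)  # {'10': {'1': 200}}

# object_pairs_hook=list keeps every key/value pair in order, duplicates included.
as_pairs = json.loads(DUPLICATE_KEYS, object_pairs_hook=list)
print(as_pairs)  # [('10', [('1', 100)]), ('10', [('1', 200)])]
```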

So you have to do something like make it an array:

{
  "10": [
    {
      "1": 100,
      "2": "https://example.org/cat.png"
    },
    {
      "1": 200,
      "2": "https://example.org/dog.png"
    }
  ]
}

or use some other schema (like a flat array of objects that each carry their field number). I get around this in my libs by guessing, and using arrays when multiples are found, or by just always using arrays (so they can be traversed the same way every time).
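
The strategies mentioned above can be sketched in Python, assuming decoding has already produced `(field_number, value)` pairs in wire order, with nested messages as lists of pairs. This is a hypothetical illustration, not how my libs are implemented internally.

```python
import json
from typing import Any

# Assumption: nested messages are lists of (field_number, value) pairs.
# All names here are hypothetical.


def to_json_obj(fields: list[tuple[int, Any]], always_array: bool = False) -> dict:
    """Group pairs by field number; repeated numbers become JSON arrays."""
    grouped: dict[str, list] = {}
    for number, value in fields:
        if isinstance(value, list):  # nested message
            value = to_json_obj(value, always_array)
        grouped.setdefault(str(number), []).append(value)
    if always_array:
        return grouped  # every field is an array, so traversal is uniform
    # otherwise only unwrap fields that appeared exactly once
    return {key: vals[0] if len(vals) == 1 else vals for key, vals in grouped.items()}


def to_flat_list(fields: list[tuple[int, Any]]) -> list[dict]:
    """Alternative schema: a flat array of {"field": n, "value": ...} objects."""
    return [
        {"field": number, "value": to_flat_list(value) if isinstance(value, list) else value}
        for number, value in fields
    ]


EXAMPLE = [
    (10, [(1, 100), (2, "https://example.org/cat.png")]),
    (10, [(1, 200), (2, "https://example.org/dog.png")]),
]

print(json.dumps(to_json_obj(EXAMPLE), indent=2))  # same shape as the array example
```

The guessing mode has a catch: a repeated field that happens to occur only once in a given message comes out as a plain value instead of a one-element array, which is why always using arrays is easier to traverse.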

I think it's very doable to get "JSON from raw protobuf" in a format that is at least readable, but these issues make an automatic 1-to-1 translation impossible without the schema.

You can use my tools to generate an approximation in JSON (using rawproto or rawprotoparse), to query individual fields with protoquery (where you have to provide the type you expect), or to use a custom traversal function (in rawprotoparse). protoquery also has something I call a "tree", which is a root message that can be traversed like an AST; it's less like a plain JSON object and more like a full breakdown of the bytes.

I am currently working on protoquery, and I think I will end up merging these projects into one (so you can query, get "best guess" JSON, or generate a guess at the proto SDL).

pawitp / protobuf-decoder #81