konsumer / rawproto

Guess structure of protobuf binary from raw data
https://konsumer.js.org/rawproto/

Add toBinary method to encode decoded-rawProto object to binary #25

Closed labac-p closed 4 weeks ago

labac-p commented 1 month ago

🙇🏻

konsumer commented 1 month ago

This should be in proto.buffer, right?

const tree = new RawProto([0x1a, 0x03, 0x08, 0x96, 0x01])
console.log(tree.buffer)
// ArrayBuffer { [Uint8Contents]: <1a 03 08 96 01>, byteLength: 5 }

It should just match what you input to it, anyway. It doesn't really decode the way it might seem (the entire message at once); it's more like pointers to different parts of a message, at each level, that get queried at call-time, if that makes sense. Each message has its own buffer at each level, plus some helpers for pulling off values and such.
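To make the "pointers into a buffer" idea concrete, here's a rough standalone sketch of the protobuf wire format (not rawproto's actual internals, just an illustration): each length-delimited field is simply a slice of its parent's bytes, and nested levels get decoded only when you ask for them.

```javascript
// read one varint starting at offset, return [value, bytesConsumed]
function readVarint(bytes, offset) {
  let value = 0, shift = 0, consumed = 0
  while (true) {
    const b = bytes[offset + consumed]
    consumed++
    value |= (b & 0x7f) << shift
    if ((b & 0x80) === 0) break
    shift += 7
  }
  return [value, consumed]
}

// split a buffer into its top-level fields: { fieldNumber, wireType, ... }
function readFields(bytes) {
  const fields = []
  let i = 0
  while (i < bytes.length) {
    const [tag, n] = readVarint(bytes, i)
    i += n
    const fieldNumber = tag >> 3
    const wireType = tag & 0x07
    if (wireType === 2) { // length-delimited: sub-message, string, bytes…
      const [len, m] = readVarint(bytes, i)
      i += m
      fields.push({ fieldNumber, wireType, bytes: bytes.slice(i, i + len) })
      i += len
    } else if (wireType === 0) { // varint
      const [value, m] = readVarint(bytes, i)
      i += m
      fields.push({ fieldNumber, wireType, value })
    } else {
      throw new Error(`wire type ${wireType} not handled in this sketch`)
    }
  }
  return fields
}

const top = readFields([0x1a, 0x03, 0x08, 0x96, 0x01])
// top: [{ fieldNumber: 3, wireType: 2, bytes: [0x08, 0x96, 0x01] }]
const inner = readFields(top[0].bytes)
// inner: [{ fieldNumber: 1, wireType: 0, value: 150 }]
```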

If what you want is "JSON to protobuf binary", I also think that would be cool, but it's not implemented here, is possibly out of scope for this library, and might be trickier than it seems: different protobuf types don't cleanly map to JSON (and vice versa), so you will never get a "parse this to JSON, then back to binary" round-trip that 100% matches an unknown proto SDL (Schema Definition Language). We might be able to get close with type-hints, but then you are building something more complicated than just generating the proto and encoding your own messages with that. Since you can tweak the generated proto (name fields, pick specific types when it's unclear, etc.), you can just use it (with any protobuf JS lib) to parse binary to JSON, edit something, then encode back to binary.
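As a concrete example of that ambiguity (a standalone sketch, not rawproto code): the wire format only says "varint", not which logical type the field is, so the same bytes can legitimately decode to different values depending on the schema you assume.

```javascript
// sint32/sint64 fields store signed values zigzag-encoded on the wire
function zigzagDecode(n) {
  return (n >>> 1) ^ -(n & 1)
}

// the field-1 varint 150 from the earlier example bytes:
const varint = 150
console.log(varint)               // read as int32/uint32/enum: 150
console.log(zigzagDecode(varint)) // read as sint32: 75
// …and a length-delimited field is just as ambiguous: string, bytes,
// sub-message, and packed repeated fields all share wire type 2.
```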

I talk about this a bit here, but as I said, the typemap-JSON thing ends up as complicated as just using a proto (which you can hand-tune from toProto output), and the more mature regular protobuf tooling will work much better with that. You can actually do a sort of hybrid workflow if you really don't know the structure but have ideas for naming things: output toProto, hand-edit the result, and then you have a proto you can use with regular tools (tuned however you want, for encoding/decoding binary messages).

konsumer commented 4 weeks ago

Ok, I added something just for you, since it seems like a nice use-case anyway, and it gave me a chance to test out the toProto function, which definitely needed more testing.

You'll need to install rawproto@1.0.2 (latest) and then check out this test. Basically, I have an unknown binary, I analyze it with rawproto, create a proto SDL from that, then use protobufjs with that SDL to encode it. In an actual use case, you'd want to edit the generated proto SDL to look more like this, which has better field names/types and known enums (instead of everything being set to int32, because the right type is unclear from raw parsing alone).

So now toProto works a bit better, and you can see how to generate proto from bin, then use that to mess with binary messages (encode, or even decode.) Let me know if that helps.

One thing to keep in mind about the generated proto: I don't set the version or anything in toProto, so when I write it, I do this:

await writeFile('/tmp/tester.proto', 'syntax = "proto3";\n' + protoSDL)

which tells protobufjs that everything is optional, plus a few other little hints about the binary data. The output format is proto3 anyway, since in proto2 every field needs an optional label and I leave that out. If you need proto2, you can go add decorators like optional to the generated proto.
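If you do want proto2 output, one rough approach (this helper is hypothetical, not part of rawproto) is a simple text pass over the generated SDL that prepends the proto2 syntax line and labels any unlabeled scalar fields:

```javascript
// Sketch: convert a label-less (proto3-style) SDL string to proto2
// by adding "optional" to fields that have no label yet.
function toProto2(protoSDL) {
  return 'syntax = "proto2";\n' + protoSDL.replace(
    // match lines like "  int32 f1 = 1;" that don't already start
    // with a label or a keyword
    /^(\s*)(?!optional|required|repeated|message|enum|syntax|})(\w+\s+\w+\s*=\s*\d+;)/gm,
    '$1optional $2'
  )
}

const sdl = `message Root {
  int32 f1 = 1;
  repeated string f2 = 2;
}`
console.log(toProto2(sdl))
```

This is string munging, so it only handles the simple scalar-field lines toProto emits; anything fancier is better done by hand.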

labac-p commented 4 weeks ago

Thank you for your thoughtful reply. With your suggestion I have found the solution to my problem.