tafia / quick-protobuf

A rust implementation of protobuf parser
MIT License
446 stars 82 forks source link

Proposal for a re-worked API #246

Open thomaseizinger opened 1 year ago

thomaseizinger commented 1 year ago

We are in the process of adopting quick-protobuf in rust-libp2p and a few points came up whilst using the API.

My suggestions would be:

  1. Unify MessageInfo, MessageRead and MessageWrite into a single trait Message.
  2. The standard library already has an abstraction for writing bytes: std::io::Write. Can Message::write perhaps simply take a &mut std::io::Write? This would remove the layer of constructing a Writer with a WriterBackend. A std::fs::File already implements Write and so does std::vec::Vec.
  3. On a related thought, we might want to consider offering an AsyncMessage type that uses the futures abstraction AsyncRead and AsyncWrite.
  4. This might be controversial but I think writing a message with a length-prefix is out of scope for this library. Length-prefixing messages is a protocol for how to parties can interact. I think we are better off with focusing on reading and writing protobufs in this library. I am guessing protobufs themselves are not self-descriptive for this very reason: It is too opinionated to be really generally useful.
  5. Provide only a handful of utility functions at the module root:
  6. We will need some abstractions within the library that the generated-code can use to avoid duplication. I think those should be in a separate module like codegen or internal which clearly signals to users that they shouldn't use these directly.

Let me know what you think. I am open to working on this and sending PRs :)

snproj commented 1 year ago

Ah I was planning to remove the length prefix in serialize_into_vec() too as per my comment in #202. I guess you guys would be ok with that too? I don't think we should necessarily retain the old function with the length prefix in any form either; so far no one seems to have voiced out in favor of that.

thomaseizinger commented 1 year ago

Cool!

Yeah I am okay with it being removed. That isn't the only part of my proposal though so I am keen to hear your opinion on the remaining points :)

snproj commented 1 year ago

Re: Suggestion 1

Click to open
As far as I know, `MessageInfo` is meant to be kinda optional (as in, users can choose to generate `MessageInfo` implementations for autogenerated structs through a cmd line arg), so unifying all three might be a bit messy. So far, I don't see technical issues unifying `MessageRead` and `MessageWrite`, though. Would that make things easier?

Re: Suggestion 2

Click to open
### Rewriting BytesWriter Hmm, I assume the focus here is on improving the user interface (as opposed to making the library codebase smaller). If so, one way I can think of doing it is to rewrite `BytesWriter` to have its `buf` field be `std::io::Write` instead of `&mut [u8]`. `Message::write()` (which I'm assuming is the same as `MessageWrite::write_message()`) would then take in `std::io::Write`, and internally convert it into `BytesWriter` and then `Writer` and proceed as per normal. We might then be able to make `Writer`, `WriterBackend` and `BytesWriter` private. We still need `Writer` and `WriterBackend` to exist since they serve to encapsulate protobuf-specific writing functions (i.e. write a varint, write with a tag, etc). ### So far, I don't think this is ideal However, ultimately I don't think this is something we want to do; I'd have a few arguments in favor of the current system: 1. Although `write_message()` takes in `&mut Writer`, this is easily constructed from any `&mut [u8]` using `Writer::new(BytesWriter::new(buf))`. I feel that being able to use any `&mut [u8]` might be a bit less restrictive than having to use something that implements `std::io::Write`. 2. On that note, I'm not sure what the interface would be for users of `no_std`. 3. Using my proposal above (the rewritten `BytesWriter`) would put the responsibility of optimizing byte writing and error checking on the user's choice implementation of `std::io::Write`; I would want the library to take care of that instead, since speed optimization is a core focus of this library. In the current system, since we tend to use `Writer::new(BytesWriter::new(buf))`, where `buf` is `&mut [u8]`, our library has direct access to the user's byte array through the implementation of `BytesWriter`. 4. Users who wish to use `std::io::Write` should still be able to use the current system by rewriting `BytesWriter` as proposed above. Having said that, I'm open to counterarguments! I'm not very experienced, so do let me know if I'm straight up not making sense haha 😅

Re: Suggestion 3

Click to open
I'm not familiar with `futures`, but I'm not opposed to the idea of this as an addon. I assume that these would be additional structs that wouldn't really affect the existing users, and would provide essentially the same services as the existing ones; just that they would be used in codebases dealing with asynchronous reading and writing. It would be quite a big addition, so I probably won't have time to work on it myself; I'd welcome a PR though! If you do decide to pursue it, perhaps it would be ideal to open an issue / draft PR early on while working on it, so that other users can discuss their needs before investing large amounts of time in it. Out of curiosity, have you any thoughts on using `tokio`'s equivalents (`tokio::io::AsyncRead`, etc)?

Re: Suggestion 4

Yep, I intend to submit a PR to remove the length prefixes soon.

Re: Suggestion 5

Click to open
I'm not sure if I understand this suggestion correctly, but currently, the module root directly exposes: - `deserialize_from_slice()` - `serialize_into_slice()` - `serialize_into_vec()` If I'm not wrong, `deserialize_from_slice()` is the same as the proposed `from_slice()`. Meanwhile, `serialize_into_vec()` is almost the same as the proposed `into_vec()`, except that `serialize_into_vec()` returns `Result>`, which seems more in line with `serde`'s `into_vec()` signature than `Vec`. Since a core focus of this library is on speed optimization, I would want to keep `serialize_into_slice()` available too, since it's an option that doesn't require memory allocation.

Re: Suggestion 6

Sorry I'm not very sure what you mean by duplication; could you briefly clarify?

thomaseizinger commented 1 year ago

Sorry for the late reply, this kind of went under the radar :sweat_smile:

As far as I know, MessageInfo is meant to be kinda optional (as in, users can choose to generate MessageInfo implementations for autogenerated structs through a cmd line arg), so unifying all three might be a bit messy. So far, I don't see technical issues unifying MessageRead and MessageWrite, though. Would that make things easier?

It is really about reducing the API surface for the user. All three traits are auto-generated, right? But all three have to imported when you want to interact with the generated code.

Hmm, I assume the focus here is on improving the user interface (as opposed to making the library codebase smaller).

100%. I think the main APIs exposed to the user should be as simple as possible and serve the majority of use cases. Ideally, they only have to deal with concepts they are already familiar with, like std::io::Write.

We might then be able to make Writer, WriterBackend and BytesWriter private. We still need Writer and WriterBackend to exist since they serve to encapsulate protobuf-specific writing functions (i.e. write a varint, write with a tag, etc).

That would be ideal! :)

Not sure how ergonomic the no_std use-case needs to be served but it is likely the less popular one so as long as it is possible, it should be good enough?

I'm not sure if I understand this suggestion correctly, but currently, the module root directly exposes:

It is mostly about following similar naming patterns and avoiding unnecessary "naming". If I am using quick_protobuf, I already know that I'll be serializing / deserializing. The function name doesn't have to say that :) I am a big fan of calling top-level module functions via their crate name. This adds a lot of context to the code without having to look up imports which makes reading and understanding code much easier because it doesn't interrupt your flow. For example:

fn main() {
    let message = quick_protobuf::from_slice::<MyMessage>(&buf).unwrap();
}

Since a core focus of this library is on speed optimization, I would want to keep serialize_into_slice() available too, since it's an option that doesn't require memory allocation.

Sure! But again, we should drop the serialize_ prefix because it is redundant :)