google / open-vcdiff

An encoder/decoder for the VCDIFF (RFC3284) format
Apache License 2.0
186 stars 52 forks source link

API should use void*, not char*, for binary data #4

Open Steelskin opened 9 years ago

Steelskin commented 9 years ago

Original issue 4 created by Jens.Alfke on 2008-09-05T20:28:44.000Z:

Several API methods take parameters that are pointers to arbitrary binary data. These parameters are typed as "const char*". Because of this, clients of the API that have data stored in other types, have to static_cast their pointers to pass them to vcdiff.

Instead, vcdiff's API should use "const void" instead of "const char". This permits any const pointer to be passed in without a type-cast, and also makes it clear to the human reader that the method takes arbitrary data, not just C strings. This is the usual convention in C/C++ APIs nowadays.

As a specific example, change vcencoder.h:70 from: HashedDictionary(const char* dictionary_contents, size_t dictionary_size); to: HashedDictionary(const void* dictionary_contents, size_t dictionary_size); (There are other instances of this at other points in the API.)

Steelskin commented 9 years ago

Summary of old comments:

Thanks much for your interest in open-vcdiff, and for sharing your feedback.

I think the use of const char* versus const void* is debatable.

Passing a const char* for dictionary_contents specifies unambiguously the units in which dictionary_size is expressed: the dictionary contains dictionary_size char values (bytes.) One of my colleagues pointed out that this has the advantage that a pointer to the end of the dictionary can be found by calculating dictionary_contents + dictionary_size; such a calculation would not work with a void* and would require an explicit cast.

On the other hand, standard functions such as memcmp and memcpy take void* arguments to represent pointers to contiguous bytes. The existence of those interfaces supports your position.

As another reference point from the compression milieu, zlib uses Bytef* for its input and output, which is a typedef for unsigned char*.

Would anyone else care to offer an opinion on this issue?

Answer:

  • The type 'size_t' already unambiguously denotes a byte count.
  • Your point about offsetting from the dictionary pointer relates to the library implementation; but the design of an API should give the convenience of clients higher priority (unless it would severely penalize the implementation, of course.) So I don't think it's a good balance to require clients to cast the pointers, just to save a few casts in the implementation. [Also, FYI, gcc allows pointer arithmetic on void* for exactly this reason.]

My personal opinion is that the API should be changed to use void*

bacek commented 8 years ago

Just my $0.02: I prefer uint8_t* for byte buffers.