Open bmatsuo opened 10 years ago
I could be entirely off base here, but I suspect the most common use case is to use this package on long-lived streams of data where you wouldn't benefit as much from re-use.
So I'm not fully convinced this is worthwhile except on these synthetic benchmarks.
Thoughts?
Hrm. I'm not sure I agree. Though I'm not entirely sure what you mean by long-lived. I don't know how nsq uses (used?) this package. Now you have me thinking that it just has constantly open tcp connections to other nodes that are wrapped with a snappy encoder for their entire life... Is that more or less correct?
My use case has centered more around HTTP APIs (though not involving the browser of course) as well as file compression. I think the format is quite attractive as an alternative to gzip in these domains especially when long term storage costs are not a concern. I wouldn't really call these cases of long-lived streams; at most they are minutes but often less than a second.
I really just wanted to gauge interest. I'm pretty busy so I'll probably just back-burner the idea for a bit.
Thanks for the input.
Yes, I mean that you have some sort of client that isn't req/resp based and you want to compress the data over the connection.
Granted, that's my use case, so I'm obviously biased.
I do acknowledge that other use cases would benefit from Reset()
, so I'm not against introducing it if we can figure out a nice API. I agree that sync.Pool
should be something external and not inside the package.
Figured I would chime back in here with an update.
I still don't have a very good way of exposing a Reset API in this package. Although there is one I didn't mention before.
I've noticed there are some 'implicit' interfaces in the std library (the idea of the interface exists but there is no type declaration that defines it). A concrete example of this is the CancelRequest()
method of a http.RoundTripper. The interface is only described by doc comments for the Client struct.
So, something like that could be done in this package. Just describe the 'Resetter' interfaces and say that you can use them. It's still a little weird because godoc would hide the signature on the reader/writer types (unlike net/http where you clearly see the signature you want on the http.Transport struct). I'm not crazy about the idea. But it would at least let people opt into resets with only mild boilerplate (defining the interface somewhere and applying type assertions).
For the time being, because I have an application that benefits from resets, I put the Reset methods on a fork that has an incompatible API. Though I still intend of pushing all compatible work through this package first (like my last bug fix). I just wanted to let you know I've done it.
What about just requiring type assertions for Reader
and Writer
- i.e.:
rdr := snappystream.NewReader(r, true)
rdr.(snappystream.Reseter).Reset()
I suppose returning concrete types is really the best choice here and it wouldn't necessary break compatibility for implementations that are passing around Reader
/Writer
interfaces...
Defining interfaces is an option. Though two interfaces would need to be defined, ReadResetter
and WriteResetter
. I wasn't sure if your second comment was an acknowledgement of that.
I imagine that most usages would continue to compile if concrete types were returned. Though due to composability of io operations it's not hard to imagine breaking cases may exist. Something like
r := snappystream.NewReader(&buf, true)
r = io.TeeReader(r, md5hash)
I don't think that's too contrived. Though, as I said, I doubt it's the norm.
I do wish the Go community could come up with a way to deal with trivial incompatibilities like that example. We do have a code rewriting tool...
I think that most mature Go projects use some form of version pinned dependencies these days, so let's just move forward with returning concrete types even if it breaks a minority of edge cases.
Let's take this opportunity to make any other breaking changes that might be useful as well, if you had some in mind.
... we can then go ahead and stamp this with a v1.0
tag, etc.
That sounds pretty reasonable to me, @mreiferson. I think this is good time to do such a thing. At this point my list of desired changes is pretty static. I've had a couple months to think about it and toy around on my branch. I'm satisfied with what I have in there as a long-term API.
Here's a summary of the breaking changes I made on my branch.
writer
type that is defined in this package still exists, but it is not exported and there's no way for the importer to instantiate one directly.The last change is pretty severe and I'm not sure you would want to pull it in here. As I believe we discussed before, an uninformed upgrade would cause corrupted streams if the user did not add appropriate Flush()
or Close()
calls.
But I think the other two changes quite reasonable and would definitely like them to be considered.
Let me know what you think.
I think these are fine, let's do it.
Cool. I have a busy week but I'll try to get some pull requests up in the next day or two.
Just to be clear.. you don't want me to include that last bullet point, correct? More explicitly, your vision is that in v1.0 there will be three types: Reader, Writer, and BufferedWriter?
No, I think we can merge BufferedWriter
and Writer
to match the api of the gzip
stdlib package...
@mreiferson what is your opinion about adding Reset methods to the types in this package? Other buffered IO types like those in "bufio" and "compress/gzip" supply such a method. I have a branch, reset-methods, in which I've made such the addition and the benchmark results improve to an extreme degree in some cases (> 2x speedup). However problems come up regarding how to expose the methods and maintain compatibility given that the types in this package are not exported.
The Reset interface isn't standardized. Specifically, the Reset method on gzip.Reader returns an error but the Reset method on bufio.Reader does not (neither of the Writer types return an error on Reset). In this package, neither the writers nor the reader need to return an error on Reset.
One of the only compatible and easy-to-use designs I see would be the introduction special types that wrap sync.Pool and return io.ReaderClosers and io.WriterClosers. I'm not even sure if I think it's a good idea though. IMO it would be better if we could somehow (edit: leave) sync.Pool out of this package and leave the sharing/pooling up to package importers.