Support zero-copy (streaming).

peterbourgon / diskv

A disk-backed key-value store.

http://godoc.org/github.com/peterbourgon/diskv

MIT License

1.41k stars 102 forks source link

Support zero-copy (streaming). #2

Closed alecthomas closed 11 years ago

alecthomas commented 11 years ago

This is a requirement for supporting very large blobs.

peterbourgon commented 11 years ago

Thanks for the PR! In general I'm strong +1 on using io.{Reader,Writer} in the general case. But I feel like things can boil down even more compactly: seems like {Read,Write}Stream can take care of all the dirty work, and {Read,Write} can just be helper functions on top of those. Also, maybe{Read,Write}Compressed could probably get a better name, too. Any objections if I rework a bit of this?

alecthomas commented 11 years ago

Not at all. I took some shortcuts to get to where I want to be, so a cleaner solution would be great.

peterbourgon commented 11 years ago

Would you mind taking a look (and maybe testing) the "streamwise" branch? I think it's basically feature-complete at this point.

alecthomas commented 11 years ago

I get a nil pointer dereference on the second attempt to do a ReadStream. The first completes successfully.

2013/03/02 18:15:05 http: panic serving 127.0.0.1:56497: runtime error: invalid memory address or nil pointer dereference
goroutine 16 [running]:
net/http.func·006()
    /usr/local/Cellar/go/HEAD/src/pkg/net/http/server.go:799 +0xac
github.com/peterbourgon/diskv.(*Diskv).ReadStream(0xc200133000, 0xc200194b4f, 0xb, 0x0, 0x0, ...)
    /Users/alec/.go/src/github.com/peterbourgon/diskv/diskv.go:207 +0x147

The surrounding code is essentially the same, adjusted for the API changes.

Also, if you could replicate this changeset's logic it would be appreciated. Without this, io.Copy to HTTP ResponseWriter will flush the HTTP header out on first write, then error. This makes it impossible to alter the headers or status code on error. Quite annoying actually, but there you have it.

peterbourgon commented 11 years ago

Sorry about the nil pointer. Total brainfart. Fixed.

As for the other thing, I can easily add that code to 'read', but I'm having trouble coming up with a test case that's broken with the current implementation. I think because my API is different than yours: my ReadStream returns (io.ReadCloser, error) rather than accepting an io.Writer. So, if you attempt to ReadStream a key that resolves to a directory rather than a file, you'll get an error before you get a chance to do the io.Copy into your http.ResponseWriter.

Can you confirm my hunch that you don't need that logic replicated? If not, can you maybe describe a test case that would fail without it, given the current API?

Huge thanks for your patience and thoughts.

alecthomas commented 11 years ago

nil pointer fixed, thanks!

os.Open on a directory doesn't return an error (because it is a valid operation). It's the subsequent initial read that returns an error, but by that stage the http.ResponseWriter has already flushed the headers and you can no longer change any of the headers.

You can see that logic on line 455 of ResponseWriter here.

Here is some code, from a request handler, that reproduces it:

w.Header().Add("Content-Type", "application/octet-stream")
r, err := b.dv.ReadStream(path)
if err == nil {
  _, err = io.Copy(w, r)
}
if err != nil {
  if os.IsNotExist(err) {
    http.Error(w, http.StatusText(http.StatusNotFound), http.StatusNotFound)
  } else {
    http.Error(w, err.Error(), 500)
  }
}

I realise this is a bit out of scope, but one could argue that returning a readable stream for a directory is not a valid operation anyway I guess :)

peterbourgon commented 11 years ago

I understand now. Thanks for the explanation. Fixed, and test added; please confirm it's working OK for you?

This use case implies a pathological property of your TransformFunc that is dangerous: one valid key shouldn't Transform to a subset of another valid key. That is, you shouldn't be able to construct valid keys that resolve to directories. As a concrete example, if your TransformFunc splits on every 3 characters, then

d.Write("abcabc", val) // OK: written to <base>/abc/abc/abcabc
d.Write("abc", val) // Error: attempted write to <base>/abc/abc, but it's already a directory

I'll try to summarize this warning for the README.

peterbourgon commented 11 years ago

Resolved.