valyala opened this issue 8 years ago
It should be fairly pluggable. The current deduplication only has memory storage, so it will take up memory on the receiving side. The sender only stores hashes of blocks, which is approximately 20-30 bytes per block.
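As a rough, purely illustrative back-of-the-envelope for the settings used in the snippets below (the constant names are made up; only the 1KB/1GB/20-30 bytes figures come from the discussion):

const (
    receiverCacheMax = 1 << 30                         // 1GB fragment cache on the receiver (worst case)
    avgBlockSize     = 1024                            // 1KB average dynamic block size
    maxBlocks        = receiverCacheMax / avgBlockSize // about one million blocks
    senderHashMem    = maxBlocks * 25                  // at ~25 bytes per hash, roughly 25MB on the sender
)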
Writing:
if writeBufferSize <= 0 {
    writeBufferSize = DefaultWriteBufferSize
}
if deduplication {
    // Dynamic blocks with an average block size of 1KB (4KB is the max).
    // The receiver can use *up to* 1GB of RAM; the average will be around 250MB.
    dw, err := dedup.NewStreamWriter(w, dedup.ModeDynamic, 4*1024, 1<<30)
    if err != nil {
        // handle err
    }
    defer dw.Close()
    w = dw // route further writes through the dedup writer instead of shadowing w
}
bw := bufio.NewWriterSize(w, writeBufferSize)
You can use the Split function to manually split blocks, which will also flush the current block to the underlying writer.
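For example, a minimal sketch of splitting at message boundaries (the responses loop is hypothetical, and it assumes the dedup writer from the snippet above is kept in a variable dw that is still in scope):

for _, resp := range responses {
    if _, err := bw.Write(resp); err != nil {
        // handle err
    }
    // Flush the bufio layer into the dedup writer first...
    if err := bw.Flush(); err != nil {
        // handle err
    }
    // ...then end the current block so it is pushed to the underlying connection.
    if deduplication {
        dw.Split()
    }
}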
Reading:
if readBufferSize <= 0 {
    readBufferSize = DefaultReadBufferSize
}
if deduplication {
    // No magic - but note that it will block until it can read a few bytes.
    dr, err := dedup.NewStreamReader(r)
    if err != nil {
        // handle err
    }
    defer dr.Close()
    r = dr // route further reads through the dedup reader instead of shadowing r
}
br := bufio.NewReaderSize(r, readBufferSize)
Thanks! Will experiment with deduplication in my spare time.
I think the main issue is dealing with latency and flushing at the right times, so you don't get responses that are hanging in a buffer somewhere.
Also, this is straight-up deduplication; it could of course be more "content-aware", e.g. storing documents on disk and only sending deltas. However, that is much more work in terms of synchronizing sender and receiver, since the receiver needs to communicate what it has and keep that in sync with the sender. This approach is much easier, since a new connection simply resets the "fragment cache".
Smart HTTP-aware deduplication could give a much better compression ratio (and, probably, speed) compared to general-purpose compression algorithms such as gzip or snappy. See this blog post from Cloudflare for a real-world example. It would be great to have a CompressType for HTTP-aware deduplication in httpteleport. @klauspost, could you look into this?
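Purely as an illustration of how such an option might be selected (CompressDedup does not exist in httpteleport today; it is invented here only to sketch the request above):

c := &httpteleport.Client{
    Addr: "far-away-server:8043",
    // Hypothetical constant - shown only to illustrate a dedup-based CompressType.
    CompressType: httpteleport.CompressDedup,
}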