markcox / snappy

Automatically exported from code.google.com/p/snappy
Other
0 stars 0 forks source link

Snappy does not have an API to uncompress into an iovec #68

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

The snappy API forces one to define a contiguous buffer into which 
decompression is performed. This is suboptimal. Snappy should instead support a 
way to uncompress into an iovec (which provides an array of buffers). 

Here's a typical use case for the above. Say several units of data are 
compressed together. If we now need to retrieve one of these, we need to 
provide a contiguous data buffer for the decompression. Once the decompression 
is performed, one would like to discard the memory that's not needed, and only 
hold on to that portion of the buffer's memory that contains the unit of data 
we're interested in. However, without performing a buffer copy, this is not 
possible. Had snappy supported an iovec, one could've provided several buffer 
in an iovec - each of say 4KB. Then one could've discarded the buffers that 
were not needed and only held on to those 4KB buffers that span the unit of 
data we're interested in.

What version of the product are you using? On what operating system?

1.0.5.

Please provide any additional information below.

Original issue reported on code.google.com by mohit.a...@gmail.com on 13 Dec 2012 at 7:29

GoogleCodeExporter commented 9 years ago
Hi,

The Snappy API doesn't really force you to do any such thing. You can write 
your own sink which decompresses into an iovec if you so desire; although only 
UncheckedByteArraySink is included in the distribution, you can use whatever 
sink you want.

Also, see https://github.com/andikleen/snappy-c/blob/master/snappy.h -- Andi 
Kleen's C port supports compression and decompression to/from iovec.

Original comment by se...@google.com on 13 Dec 2012 at 10:40

GoogleCodeExporter commented 9 years ago

The API in snappy.h only allows you to use a sink with Compress(). It does not 
allow you to use a sink with Uncompress().

So please help me understand how I can write a sink that works with Uncompress.

Original comment by mohit.a...@gmail.com on 13 Dec 2012 at 4:31

GoogleCodeExporter commented 9 years ago
Sorry, I was using the wrong word; you don't want a Sink here, you want a 
Writer. I also see that due to the templated nature, you cannot make new 
Writers without modifying snappy.cc.

Still, the point stands, in that you can use Andi Kleen's C port if you want 
this.

Note that decompressing to an iovec is bound to be less efficient than to an 
array, since resolving backreferences across buffers is more tricky. If you 
want just a 4 kB chunk in the middle, I doubt you can get much more efficient 
than simply decompressing to an array and memcpy-ing out those four kilobytes. 
The primary use case for such a thing would probably not be efficiency, but 
ease of allocation in a kernel context.

Original comment by se...@google.com on 14 Dec 2012 at 8:55

GoogleCodeExporter commented 9 years ago

I don't wish to copy out just 4K. Say I'm compressing data in 256KB units. And 
I may be interested in only 64K somewhere in the middle after decompression. It 
is also possible that I may be interested in all 256KB up front, but then divvy 
up my buffer into different request units that are processed separately. All 
these require that after decompression, I be able to put them in smaller units 
so that if any of these units is not referred to anymore, they can be deleted 
without hoarding up memory. BTW I used to be a Googler working for the GFS team 
for 5 yrs - the use of DataBuffer object in Google would be an example for this.

Looking at snappy.cc, I understand why Sink is not provided in the uncompress 
API - we're worried about the virtual function ovhd there. The alternative 
would be to support an additional API on snappy.h - something like:

bool RawUncompress(Source *compressed, const struct iovec *iov, int iovcnt);

Then one can implement a SnappyIOvecWriter class inside snappy.cc (analogous to 
the SnappyArrayWriter class already there). That'll eliminate the ovhd due to 
virtual functions and yet provide a 'scatter' interface.

If you like, I can contribute code on this for consideration for inclusion in 
future snappy releases.

I'll look at Andi Kleen's C port, but a brief look indicates that a complete C 
port seems unnecessary for this. Also, I'd rather pick up a 'blessed' 
implementation as we're using snappy in a production setting. 

Original comment by mohit.a...@gmail.com on 14 Dec 2012 at 5:04

GoogleCodeExporter commented 9 years ago
Seems like my answer didn't hit the bug; pasting the relevant part:

If you want to include an IOVecWriter, I'll be happy to consider it
for consideration. It sounds like a reasonable way to solve this.

Original comment by se...@google.com on 20 Dec 2012 at 1:34

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Thanks. I've written the code - will test it and let it bake for sometime to 
ensure its bug free before I submit it (within a few weeks).

Original comment by mohit.a...@gmail.com on 21 Dec 2012 at 8:29

GoogleCodeExporter commented 9 years ago
Hi,

Did you ever get anywhere here?

Original comment by se...@google.com on 20 Feb 2013 at 3:24

GoogleCodeExporter commented 9 years ago
Sorry - I quit my last company and so didn't have access to the code I'd 
written earlier. Let me write it again and then send it to you.

Original comment by mohit.a...@gmail.com on 20 Feb 2013 at 4:18

GoogleCodeExporter commented 9 years ago

I've modified 3 files in the latest 1.1.0 release. These files are attached - 
these are snappy.h, snappy.cc and snappy_unittest.cc.

Original comment by mohit.a...@gmail.com on 21 Feb 2013 at 8:39

Attachments:

GoogleCodeExporter commented 9 years ago
Hi,

Please sign the Contributor License Agreement 
(http://code.google.com/legal/individual-cla-v1.0.html); it's a prerequisite 
for accepting non-trivial patches. It's a really simple procedure, though.

We'll take it from there.

Original comment by se...@google.com on 22 Feb 2013 at 12:53

GoogleCodeExporter commented 9 years ago

I've signed and submitted the contributor license agreement.

Original comment by mohit.a...@gmail.com on 24 Feb 2013 at 1:28

GoogleCodeExporter commented 9 years ago
Thanks.

I'm looking through the code and making some cleanups, and I need to figure out 
a few things: First, what is the relationship between expected_written_ and the 
space in the iovecs? I presume that it's okay to have space in the iovec that 
goes beyond what's set by expected_written_? However, if there is room in the 
iovec, are we allowed to write junk data past the end of expected_written_? (If 
so, the space_left >= 16 test in TryFastAppend is probably unneeded.)

Do you intend to add benchmarks?

Original comment by se...@google.com on 25 Feb 2013 at 12:56

GoogleCodeExporter commented 9 years ago
Another question: Are there situations where iov_len can be 0? If so, Append() 
is broken. (If not, I'll add an assert.)

Original comment by se...@google.com on 25 Feb 2013 at 1:42

GoogleCodeExporter commented 9 years ago
Ah, no, Append() is not broken. I'd still like to know if iov_len can be 0, 
though.

Original comment by se...@google.com on 25 Feb 2013 at 1:43

GoogleCodeExporter commented 9 years ago

> I'm looking through the code and making some cleanups, and I need to figure 
out
> a few things: First, what is the relationship between expected_written_ and 
the 
> space in the iovecs? I presume that it's okay to have space in the iovec that 
goes 
> beyond what's set by expected_written_? 

Yes, it is ok to have expected_written_ to be smaller than the cumulative space 
in the iovecs. the base_iov_cnt_ is essentially there for assertion purposes - 
otherwise, the code is modeled after SnappyArrayWriter.

> However, if there is room in the iovec, are we allowed to write junk data 
past the 
> end of expected_written_? (If so, the space_left >= 16 test in TryFastAppend 
is 
> probably unneeded.)

If we leave out 'space_left >= 16', then we'll return true when we should be 
returning false. That is, while it is ok to write the data without worrying 
about 
memory corruption, we should really be returning false since we're writing more 
than
what expected_written_ allows.

> Ah, no, Append() is not broken. I'd still like to know if iov_len can be 0, 
though.

Yes, it should be ok to have some iovecs in there for which the iov_len is 0.

> Do you intend to add benchmarks?

I'd need to understand the benchmarking framework in snappy_unittest.cc first - 
was being a bit lazy here. If it is simple to add benchmarks, I'd appreciate if 
you
can add one - or if you insist, I can learn this stuff and provide an update 
with
a benchmark.

BTW, I noticed that in snappy_unittest.cc, line 513 should be 'delete[] buf;' 
and not
'delete buf'.

Original comment by mohit.a...@gmail.com on 25 Feb 2013 at 2:17

GoogleCodeExporter commented 9 years ago
OK, so expected_written_ is really a hard limit, not just an expectation? If 
so, it should probably be renamed; I'm some of these cleanups right now. 
(heapchecker already found the delete[] vs. delete, so I already fixed that.)

In general, I find the code in AppendFromSelf pretty complex and scary, yet the 
unit test is very thin. I don't think this really belongs in VerifyString(); 
perhaps there should be a separate test that's deliberately testing some of the 
edge cases we can get here? In particular, the source and destination iovecs 
can go from non-overlapping to overlapping to non-overlapping again, and 
although I _think_ your logic is right, I'd much rather have it under test than 
just looking hard at it.

Original comment by se...@google.com on 25 Feb 2013 at 2:37

GoogleCodeExporter commented 9 years ago
Hi,

What was the conclusion here? I don't think this code can go in unless someone 
provides more exhaustive unit tests.

Original comment by se...@google.com on 21 Mar 2013 at 1:35

GoogleCodeExporter commented 9 years ago

I seem to have mis-understood your comment from Feb 25th. It seemed like you 
were fixing the code, and it made me think you're also enhancing the unittest. 

I can work on the unittest as per your comments. However, I'm out of the 
country for the next 2 weeks - so I can get to it by mid April or so.

Original comment by mohit.a...@gmail.com on 21 Mar 2013 at 7:53

GoogleCodeExporter commented 9 years ago

I've updated the unittest. Please take a look. I've also updated the interface 
definitions in snappy.h to indicate that the individual buffers in an iovec 
shouldn't overlap. I have not made any change to snappy.cc - so I'm not 
attaching that in this post.

Original comment by mohit.a...@gmail.com on 6 May 2013 at 4:21

Attachments:

GoogleCodeExporter commented 9 years ago
Hi,

Sorry for letting this slip for so long. I'll try to squeeze in some time for 
looking at this code later today; there is already consensus internally that 
this would be a good thing to have, although we'd need to figure out something 
for Windows.

Original comment by se...@google.com on 12 Jun 2013 at 3:42

GoogleCodeExporter commented 9 years ago
Hi again,

First of all, please try to send patches as patches, not full files. It's very 
hard to keep track of things that way.

Second, my comment about the unit test was not about the iov pointers 
overlapping, it was about your logic for from_iov_index != curr_iov_index_. 
This is, as far as I can see, still largely untested (especially the 
transitions between the different states), and it makes me worried. I see you 
have added a benchmark, and that is fine, but apart from that I do not find the 
tests to be substantially more covering than the ones from the previous patch. 
(Also, I need to figure out what to do about Windows compatibility.)

I might find some time to write better tests for the corner cases at some 
point, but things are a bit tight at the moment.

Original comment by sgunder...@bigfoot.com on 12 Jun 2013 at 9:12

GoogleCodeExporter commented 9 years ago
(This time from the right account as well. Multilogin keeps being hard...)

OK, I've written unit tests that I'm sort-of happy with. I need to test things 
on Windows, and also of course send for review. But you're off the hook for 
now. :-)

Original comment by se...@google.com on 13 Jun 2013 at 12:05

GoogleCodeExporter commented 9 years ago

Original comment by se...@google.com on 13 Jun 2013 at 10:51

GoogleCodeExporter commented 9 years ago
Fixed in r76. Thanks for your patience.

Original comment by se...@google.com on 13 Jun 2013 at 4:20

GoogleCodeExporter commented 9 years ago

Thanks for taking care of the remaining loose ends. I'm on vacation right now, 
and so only sporadically checking emails.

Original comment by a...@cohesity.com on 13 Jun 2013 at 5:33

GoogleCodeExporter commented 9 years ago
Just wondering when the next snappy release is coming out - hopefully with this 
change included.

Original comment by a...@cohesity.com on 23 Sep 2013 at 5:27

GoogleCodeExporter commented 9 years ago
I haven't really considered a release; we've tended to release when there's 
been serious bugfixes or when we had a usable performance improvement, and 
neither has really been happening recently. But if having iovec support in a 
release is important for you, I guess we could put out an 1.1.1 soon...

Original comment by se...@google.com on 30 Sep 2013 at 12:52

GoogleCodeExporter commented 9 years ago
That'd be great. Thanks.

Original comment by mohit.a...@gmail.com on 30 Sep 2013 at 2:29