mwotton / lz4hs

Haskell bindings to lz4
BSD 3-Clause "New" or "Revised" License
21 stars 7 forks source link

Lots of changes, should collaborate #2

Closed thoughtpolice closed 11 years ago

thoughtpolice commented 12 years ago

I basically wrote an entire LZ4 binding myself before discovering this one. I'd rather have one great package than two separate ones with separate maintenance burdens. But my package is almost a complete rewrite. I have a branch of this repository with all my changes, and the commit message gives the highlights:

commit 8e6bcd238f42ab4cdd2b045c47c86ad43ab5d6fb
Author: Austin Seipp <mad.one@gmail.com>
Date:   Tue Aug 14 19:57:29 2012 -0500

    Basically rewrite everything. Many changes detailed.

    I had my own copy of an LZ4 binding around that was all but finished
    when I found this one. I would rather there be one great binding instead
    of just duplicating the work and having two seperate ones. As a result,
    I just imported all of my code and touched it up basically.

    Big changes include:

     * LZ4 is now included inline in the package. It's at revision r75.

     * The interface is completely type safe and returns Maybe values in
       case of failed compression.

     * The whole API has a property: any empty input string for
       compression OR decompression returns 'Just Data.ByteString.empty'
       as a short cut.

     * Top-level documentation of everything.

     * Benchmarks comparing LZ4, QuickLZ, and snappy (need improvement)

     * Rewrote hspec tests a bit

     * Also offers a composition of (compress >>= compressHC) which gives
       better compression ratios at the expense of more time, but not
       nearly as much time as HC mode alone. This is due to repetition in
       the output coding which can be compressed again. See
       'compressPlusHC' documentation for more.

     * Full new README, an AUTHORS file.

     * Add travis-ci support

    Overall though, it's everything you'd expect.

    Signed-off-by: Austin Seipp <mad.one@gmail.com>

Overall I think these bindings are much nicer, and I was wondering what your preference was on merging this (which basically rewrites everything) or just to release my own package? I have both of these routes working so I figured I'd just ask what your opinion on the matter was and if you'd like, I can shoot you a pull request or upload a new version.

mwotton commented 12 years ago

Ah, nice. It was pretty much a weekend finger exercise for me - I kept the interface to the existing gzip bindings for the sake of easy compatibility, but arguably a nicer design would be to add a very thin shim called something like lz4-compat which just calls fromJust, just to be able to do drop-in compatibility.

I'm not averse to pulling in your changes, but I'd like to see them first :) I agree that there's very little point having two competing implementations polluting the namespace on hackage.

thoughtpolice commented 12 years ago

I don't think keeping the gzip binding interface is the right thing to do. In their case, the bindings are actually lazy because gzip/bzlib has a streaming interface and this can be offered without large overheads when streaming files around. Failure in this case via error is ugly but there's not much you can do because checksums aren't checked until the end of a stream anyway. And you can't inspect the result to construct the sum type. But they can afford this interface due to the compressors' natural design and a lazy interface is very useful.

On the other hand, lz4 is purely block-based as a compression method, and as a result the interface is strict, because it's meaningless to make it lazy. I also think the programmer should be forced to handle the decoding error as a result, because we can afford to. The interface forms a monad anyway as a result, so it should be quite suitable for error handling a la the errors package for example or just any other transformer. But compression is often a sparse operation anyway in terms of code locality - there will be very few places where you'll call compress in most code, so it's not a burden API wise for most cases - you probably won't even need the Monad instance (other than to write pretty test cases.) And you want to know when it fails as an operation (if you were decompressing say, a block of filesystem data, you'd surely want to know that it's OK without the error call shooting your program in the head at the time you evaluate something to WHNF.)

Finally I see no reason to keep backwards compatibility with an old 0.1 interface. API changes are covered in the PVP by a version bump to -> 0.2 and there will likely be minimal clients anyway. People can just say oldAPI = fromJust . compress anyway if they're truly evil. It's much better to do it now in the right package then add shims and maintenance headache.

I'll follow up with code in short order.

mwotton commented 12 years ago

On Fri, Aug 17, 2012 at 11:15 AM, Austin Seipp notifications@github.comwrote:

I don't think keeping the gzip binding interface is the right thing to do. In their case, the bindings are actually lazy because gzip/bzlib has a streaming interface and this can be offered without large overheads when streaming files around. Failure in this case via error is ugly but there's not much you can do because checksums aren't checked until the end of a stream anyway. And you can't inspect the result to construct the sum type.

On the other hand, lz4 is purely block-based as a compression method, and as a result the interface is strict, because it's meaningless to make it lazy. I also think the programmer should be forced to handle the decoding error as a result, because we can afford to. The interface forms a monad anyway as a result, so it should be quite suitable for error handling a la the errors package for example or just any other transformer.

Fair enough.

Finally I see no reason to keep backwards compatibility with an old 0.1 interface. API changes are covered by a version bump to -> 0.2 and there will likely be minimal clients anyway. It's much better to do it now in the right package than add shims.

I meant compatibility with gzip, not 0.1.

I'll follow up with code in short order.

cheers mark

A UNIX signature isn't a return address, it's the ASCII equivalent of a black velvet clown painting. It's a rectangle of carets surrounding a quote from a literary giant of weeniedom like Heinlein or Dr. Who. -- Chris Maeda