taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

Byte Encoding #42

Closed: kul closed this issue 10 years ago

kul commented 10 years ago

What is the default encoding for bytes with Nippy?

I get the following errors when working with this API, which assumes UTF-8:

user=> (import 'org.apache.hadoop.hbase.util.Bytes)
org.apache.hadoop.hbase.util.Bytes
user=> (n/thaw (Bytes/toBytes (Bytes/toString (n/freeze n/stress-data))))

CorruptionException last byte of compressed length int has high bit set  org.iq80.snappy.SnappyDecompressor.readUncompressedLength (SnappyDecompressor.java:425)
user=> (n/thaw (Bytes/toBytes (Bytes/toString (n/freeze n/stress-data {:compressor nil}))))

Exception No reader provided for custom type ID: 65  taoensso.nippy/thaw-from-stream (nippy.clj:365)
user=> (n/thaw (Bytes/toBytes (Bytes/toString (n/freeze {:a 1 :b "2" :c 0.1}))))

CorruptionException Invalid copy offset for opcode starting at 27  org.iq80.snappy.SnappyDecompressor.decompressAllTags (SnappyDecompressor.java:165)
user=> (n/thaw (Bytes/toBytes (Bytes/toString (n/freeze {:a 1 :b "2"}))))
{:a 1, :b "2"}
user=> (n/thaw (Bytes/toBytes (Bytes/toString (n/freeze {:a 1 :b "2" :c 9}))))
{:a 1, :c 9, :b "2"}
user=> (n/thaw (Bytes/toBytes (Bytes/toString (n/freeze {:a 1 :b "2" :c 9.1}))))
{:a 1, :c 9.1, :b "2"}
kul commented 10 years ago

I guess this is equivalent to (.getBytes (String. ....)), and not all binary data can be represented in UTF-8. Closing this.

ptaoussanis commented 10 years ago

Hi Kul, could you describe a bit more what you're actually trying to do?

Nippy's freeze fn returns a Java byte array. Bytes themselves aren't "encoded". Encodings come into play when you're transforming something to or from a byte array and there may be more than one way of doing the transformation.

A String is a good example:

"hello" ; A String: a sequence of characters, not bytes
(.getBytes "hello" "UTF-8") ; The String encoded as UTF-8 bytes
(String. (.getBytes "hello" "UTF-8") "UTF-8") ; Decoding those bytes as UTF-8 rebuilds the original String
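For comparison, here is a minimal Java sketch of the same idea. It uses Java rather than Clojure because `String`, `getBytes` and the charsets are the JVM classes the snippet above calls through interop. One String yields different byte arrays under different charsets, and only decoding with the matching charset gives the String back. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class CharsetBytes {
    // Encode a String as UTF-8 bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encode the same String as UTF-16 (big-endian, no byte-order mark).
    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String s = "hello";
        byte[] a = utf8(s);    // 5 bytes: one per ASCII character
        byte[] b = utf16be(s); // 10 bytes: two per character
        System.out.println(a.length + " " + b.length);
        // Decoding with the charset used for encoding rebuilds the original String.
        System.out.println(new String(a, StandardCharsets.UTF_8).equals(s));
        // Decoding with the wrong charset produces a different String.
        System.out.println(new String(b, StandardCharsets.UTF_8).equals(s));
    }
}
```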

The Bytes/toString method that you're calling seems to be designed to operate on the byte form of a UTF-8 String, not on an arbitrary byte array (like the kind Nippy's freeze fn returns).
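The following minimal Java sketch shows what happens to an arbitrary byte array on that round trip. It assumes, as the original report states, that `Bytes/toString` decodes with UTF-8 and `Bytes/toBytes` encodes with UTF-8. It simulates both with plain `java.lang.String`, so no Hadoop dependency is needed. `roundTrip` is a hypothetical helper, not an HBase method.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Mimics (Bytes/toBytes (Bytes/toString b)), assuming both use UTF-8:
    // decode the bytes to a String, then encode that String back to bytes.
    static byte[] roundTrip(byte[] b) {
        // Malformed UTF-8 input is silently replaced with U+FFFD here.
        String s = new String(b, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = {0x61, 0x62, 0x63};         // "abc": valid UTF-8
        byte[] binary = {0x61, (byte) 0x80, 0x62}; // a lone 0x80 is not valid UTF-8

        System.out.println(Arrays.equals(ascii, roundTrip(ascii)));   // true
        System.out.println(Arrays.equals(binary, roundTrip(binary))); // false
        // The 0x80 became U+FFFD, which encodes as the three bytes EF BF BD.
        System.out.println(roundTrip(binary).length);                 // 5
    }
}
```

Byte arrays that happen to be valid UTF-8 (for example, pure ASCII) come through unchanged. That would plausibly explain why the smaller maps in the original transcript thawed correctly while the others did not.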

You probably won't need or want these byte utilities when working with Nippy, but that depends on exactly what you're trying to do.
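A text form may really be required, for example by a field that only accepts Strings (this is an assumption). In that case, the usual tool is a binary-to-text scheme such as Base64 rather than a character encoding, since it maps every possible byte array to ASCII and back. Below is a minimal Java sketch using `java.util.Base64`. The byte array is a made-up stand-in for frozen output, and the helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Binary-to-text: any byte array maps to an ASCII String...
    static String toText(byte[] b) {
        return Base64.getEncoder().encodeToString(b);
    }

    // ...and back again, byte for byte.
    static byte[] fromText(String s) {
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for frozen bytes, including values that are invalid UTF-8.
        byte[] frozen = {0x01, (byte) 0x80, (byte) 0xFF, 0x00, 0x7F};
        String text = toText(frozen);
        System.out.println(text);
        System.out.println(Arrays.equals(frozen, fromText(text))); // true
    }
}
```

Where the API accepts a `byte[]` directly, passing the frozen bytes through untouched avoids the conversion entirely.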

Does that make sense?

kul commented 10 years ago

Yes, that makes perfect sense. Sorry about this.

Thanks

ptaoussanis commented 10 years ago

No problem :-) Cheers!