taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0
1.04k stars 60 forks source link

[proposal] Try to avoid conflict in custom type definition #50

Closed sunng87 closed 10 years ago

sunng87 commented 10 years ago

Hi Peter,

In our development I found it's very easy to run into a situation that we have nippy custom type id in conflict, especially when the type definition is not in a single source file. It's hard to detect in compile time, and in repl development (the conflict one is not loaded in repl often)

My suggestion is to add some mechanism to avoid this conflict in the library. Currently there are two approaches:

ptaoussanis commented 10 years ago

Hi Sun, let me think about this a bit and get back to you?

Namespaced ids would be difficult since the ids are currently just -ive bytes. Throwing an exception on redef would be possible, but that'd also be a breaking change...

Have you considered maybe defining all your custom Nippy ids as constants in a single namespace? That way you can define your thaw/freeze implementations wherever you like, but you won't need to worry about accidentally reusing an id. Does that make sense?

sunng87 commented 10 years ago

Never mind. This is just a suggestion. I agree with you that in most cases, custom types should be written in a single namespace. But if I have a library using nippy for serialization, currently it's difficult to avoid conflicts with user code.

ptaoussanis commented 10 years ago

Never mind. This is just a suggestion.

Sure, ideas very welcome! :-)

But if I have a library using nippy for serialization, currently it's difficult to avoid conflicts with user code.

So you have a library that defines custom types, and the user may define custom types also? Are you worried about the user overwriting your types, you overwriting the user's types, or both?

sunng87 commented 10 years ago

Both. Any conflicts will lead to exceptions in serialization. In our case, the custom types in my library accidentally overwrote types defined by users. So I was thinking of a namespaced solution I mentioned above.

On Sat 17 May 2014 03:22:43 PM CST, Peter Taoussanis wrote:

Never mind. This is just a suggestion.

Sure, ideas very welcome! :-)

But if I have a library using nippy for serialization, currently
it's difficult to avoid conflicts with user code.

So you have a library that defines custom types, and the user may define custom types also? Are you worried about the user overwriting your types, you overwriting the user's types, or both?

— Reply to this email directly or view it on GitHub https://github.com/ptaoussanis/nippy/issues/50#issuecomment-43400091.

ptaoussanis commented 10 years ago

Okay, understood. Switching to a namespace-able type id would definitely help, but may add a fair bit of overhead that I'd like to try and avoid.

Would like a little time to think over possible solutions to this if that's okay? Will get back to you.

sunng87 commented 10 years ago

Thanks!

On Sat 17 May 2014 04:16:22 PM CST, Peter Taoussanis wrote:

Okay, understood. Switching to a namespace-able type id would definitely help, but may add a fair bit of overhead that I'd like to try and avoid.

Would like a little time to think over possible solutions to this if that's okay? Will get back to you.

— Reply to this email directly or view it on GitHub https://github.com/ptaoussanis/nippy/issues/50#issuecomment-43401024.

ptaoussanis commented 10 years ago

Hi Sun,

Just letting you know that I haven't forgotten about this. Was away on vacation for a few weeks, back now. Still no obvious solution to this, but I absolutely agree that it's an important issue and I'll continue thinking about it.

Will update if I come to any conclusions.

Cheers! :-)

sunng87 commented 10 years ago

Thanks! This is not an urgent issue for me but I think can be helpful with namespaced custom types.

By the way, we are using Nippy in our RPC framework Slacker. It serves hundreds of thousands request everyday and works great. Thanks for the great work!

ptaoussanis commented 10 years ago

Okay, had some thoughts on this.

The byte ids [current design] are useful for storage efficiency + backwards compatibility. But arbitrary (+ namespaced) ids would be useful for cases like yours.

So I'm going to experiment with modding extend-freeze, extend-thaw to support both:

Will try wrap this up for inclusion in an upcoming v2.7.0-alpha1. Welcome any thoughts in the meantime.

Cheers! :-)

UPDATE: An alternative may be to generate a 16bit hash from the arbitrary keyword id. That way there's a constant+bounded overhead to using one, and still plenty of space (16bits) to avoid id collision.

ptaoussanis commented 10 years ago

Okay, 2.7.0-SNAPSHOT is up with modified extend-freeze, extend-thaw. New docstrings follow:

;; extend-freeze
"Extends Nippy to support freezing of a custom type (ideally concrete) with
  given id of form:
    * Keyword        - 2 byte overhead, resistent to id collisions.
    * Byte ∈[1, 128] - no overhead, subject to id collisions.

  (defrecord MyType [data])
  (extend-freeze MyType :foo/my-type [x data-output] ; Keyword id
    (.writeUTF [data-output] (:data x)))
  ;; or
  (extend-freeze MyType 1 [x data-output] ; Byte id
    (.writeUTF [data-output] (:data x)))"

;; extend-thaw
"Extends Nippy to support thawing of a custom type with given id:
  (extend-thaw :foo/my-type [data-input] ; Keyword id
    (->MyType (.readUTF data-input)))
  ;; or
  (extend-thaw 1 [data-input] ; Byte id
    (->MyType (.readUTF data-input)))"
ptaoussanis commented 10 years ago

Closing with the release of v2.7.0-alpha1.

sunng87 commented 10 years ago

:+1:

This is exactly what I need. Especially the keyword id hash and conflicts check. Now we have both customization and byte efficiency. I will adapt 2.7.0-alpha1 soon.