taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

Have you thought about implementing assoc/dissoc/into on top of the byte[] representation? #77

Closed by mattiasw2 8 years ago

mattiasw2 commented 8 years ago

One problem with Java is that objects take so much heap space, see

http://www.slideshare.net/c24tech/vjug-getting-c-c-performance-out-of-java

If you want big in-memory caches (like memcache but in-process), byte[] is a good solution.

Encoding the data using Nippy is one solution, and operations like get, assoc, dissoc, and into could in principle be implemented efficiently on top of Nippy's internal representation, i.e. without deserializing first. For more advanced operations like , the fallback is to convert the complete structure to Clojure data and use the normal operations.
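The fallback described above (thaw everything, apply the normal Clojure operation, re-freeze) can be sketched roughly as follows. This is only an illustration: `frozen-get`, `frozen-assoc`, and `frozen-dissoc` are hypothetical helper names, not part of Nippy; the only Nippy calls used are `nippy/freeze` and `nippy/thaw`. It assumes the frozen value is a map.

```clojure
(ns example.frozen-map
  (:require [taoensso.nippy :as nippy]))

;; Hypothetical helpers (NOT part of Nippy). Each one thaws the whole value,
;; applies the ordinary Clojure function, and re-freezes where a new byte[]
;; is needed. Data at rest stays a compact byte[], but every call pays for a
;; full thaw (and a full freeze for the updating operations).

(defn frozen-get
  "Thaws `ba` and looks up `k` in the resulting map."
  [^bytes ba k]
  (get (nippy/thaw ba) k))

(defn frozen-assoc
  "Returns a new byte[] holding the thawed map with `k` set to `v`."
  [^bytes ba k v]
  (nippy/freeze (assoc (nippy/thaw ba) k v)))

(defn frozen-dissoc
  "Returns a new byte[] holding the thawed map without `k`."
  [^bytes ba k]
  (nippy/freeze (dissoc (nippy/thaw ba) k)))
```

For example, `(frozen-get (frozen-assoc (nippy/freeze {:a 1}) :b 2) :b)` goes through two thaws and one freeze. That per-call cost is what the byte-level approach asked about here would try to avoid.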

Is it doable? Complicated?

ptaoussanis commented 8 years ago

Hi Mattias,

> For more advanced operations like ,

Sorry- think there's a typo here: operations like what?

> Is it doable? Complicated?

Difficult to answer, since there are so many details that would need to be discussed. How doable/complicated it is will depend on what your precise needs are. The first question I'd consider: would it be worth trying? What specific objectives are you trying to hit, and what costs would you be prepared to incur?

Nippy's byte data format is simple; working at the byte level wouldn't be hard in simple cases. So in principle you have options if you decide it'd be worth pursuing for your case.

I would note that neither Java nor Clojure is particularly well suited to memory-constrained environments. Might be best to stick to C if you need C-like memory characteristics?

mattiasw2 commented 8 years ago

Let us assume the top level data structure for nippy is a map.

Then, I would like to implement the Clojure operations get, assoc, dissoc, and into on top of it.

ptaoussanis commented 8 years ago

Sorry, could I clarify what you're asking exactly? :-)

mattiasw2 commented 8 years ago

I want to do this without thaw being called:

(:boolean (nippy/freeze nippy/stress-data)) => true

(:string-utf8 (nippy/freeze nippy/stress-data)) => "ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸ"

ptaoussanis commented 8 years ago

Sorry, could you possibly phrase what you're asking as a literal question? You're asking if this is hard? Possible?

As I mentioned, it may be possible; the difficulty will depend on your implementation and the tradeoffs you'd want.

mattiasw2 commented 8 years ago

Sorry for being unclear. My question was if it is easy to find a specific key and its data in the byte[]?

Is it a simple scan from start in the byte[]?

If the top-level-map contains a sub-map, can it easily be skipped, or is it skipped recursively?

My question was an initial query. I didn't expect an exact answer, more something like: "Yes, see ????", or "Possible, you need to ....", or "No, no point, it will be as expensive as a complete thaw".

ptaoussanis commented 8 years ago

> Is it a simple scan from start in the byte[]?

The Nippy format is defined at https://github.com/ptaoussanis/nippy/blob/master/src/taoensso/nippy.clj#L25

It wasn't designed with random byte access in mind, but it'd be possible to interpret+skip bytes to get to the desired point in the byte array.
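As a rough illustration of the interpret+skip idea, here is a minimal sketch over a toy tagged format. Everything in it is an assumption made for the example: the tag values, the 4-byte length prefixes, and the names `encode`, `skip-val`, `read-val`, and `lookup` are invented here and do not reflect Nippy's actual byte layout or API. It shows a linear scan of a top-level map in which non-matching values, including nested maps, are skipped by reading only tags and lengths.

```clojure
(import '(java.io ByteArrayOutputStream DataOutputStream)
        '(java.nio ByteBuffer)
        '(java.nio.charset StandardCharsets))

;; Toy type tags, invented for this sketch; Nippy's real type ids differ.
(def tag-long 1) ; 8-byte big-endian long
(def tag-str  2) ; 4-byte length, then UTF-8 bytes
(def tag-map  3) ; 4-byte entry count, then key/value pairs

(defn- write-val!
  "Writes x (a long, a string, or a map of those) in the toy format."
  [^DataOutputStream out x]
  (cond
    (integer? x) (do (.writeByte out (int tag-long)) (.writeLong out (long x)))
    (string? x)  (let [bs (.getBytes ^String x StandardCharsets/UTF_8)]
                   (.writeByte out (int tag-str))
                   (.writeInt out (alength bs))
                   (.write out bs 0 (alength bs)))
    (map? x)     (do (.writeByte out (int tag-map))
                     (.writeInt out (count x))
                     (doseq [[k v] x]
                       (write-val! out k)
                       (write-val! out v)))
    :else (throw (ex-info "Unsupported type" {:value x}))))

(defn encode
  "Encodes x into a byte array in the toy format."
  ^bytes [x]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [out (DataOutputStream. baos)]
      (write-val! out x))
    (.toByteArray baos)))

(defn skip-val
  "Returns the offset just past the value starting at `off`, reading only
  tags and length prefixes. Nested maps are skipped recursively."
  [^ByteBuffer buf off]
  (let [off  (int off)
        tag  (long (.get buf off))
        body (inc off)]
    (condp = tag
      tag-long (+ body 8)
      tag-str  (+ body 4 (.getInt buf body))
      tag-map  (loop [n (* 2 (.getInt buf body)), p (+ body 4)]
                 (if (zero? n)
                   p
                   (recur (dec n) (long (skip-val buf p))))))))

(defn read-val
  "Fully decodes the single value starting at `off`."
  [^ByteBuffer buf off]
  (let [off  (int off)
        tag  (long (.get buf off))
        body (inc off)]
    (condp = tag
      tag-long (.getLong buf body)
      tag-str  (let [n (.getInt buf body)]
                 (String. (.array buf) (int (+ body 4)) n StandardCharsets/UTF_8))
      tag-map  (loop [n (long (.getInt buf body)), p (+ body 4), m {}]
                 (if (zero? n)
                   m
                   (let [vp (long (skip-val buf p))]
                     (recur (dec n)
                            (long (skip-val buf vp))
                            (assoc m (read-val buf p) (read-val buf vp)))))))))

(defn lookup
  "Scans the top-level map in `ba` for key `k`. Keys are decoded for
  comparison; non-matching values are skipped. Returns nil if absent."
  [^bytes ba k]
  (let [buf (ByteBuffer/wrap ba)]
    (assert (= tag-map (long (.get buf 0))) "top-level value must be a map")
    (loop [n (long (.getInt buf 1)), p 5]
      (when (pos? n)
        (let [vp (long (skip-val buf p))]
          (if (= k (read-val buf p))
            (read-val buf vp)
            (recur (dec n) (long (skip-val buf vp)))))))))

;; Usage: the nested map under "sub" is skipped, not decoded, on the way to "b".
(def ba (encode {"a" 1, "sub" {"x" "skipped"}, "b" "found"}))
(lookup ba "b") ; => "found"
```

In this toy layout a map stores only an entry count, not a byte length, so skipping a nested map still means walking its entries recursively. A format that stored a byte length per collection could jump over it in one step, which is one example of the design tradeoffs involved.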

Whether this'd be easy or difficult will depend on things like the data types you want to support scanning into, how much nesting you want to support, how well structured your data tends to be, what kinds of modifications you want to support, etc. A full discussion would depend entirely on your use case, goals, and desired tradeoffs, and would be well beyond the scope of what we could get into here.

Again, I'd encourage you to first consider whether you actually need/want this feature: what the benefits would be, and what the tradeoffs would need to be. For example, what's the motivation for using Nippy here at all vs just using a custom byte-level format designed for your use case?

Similarly, I'd suggest that if the heap impact of a thaw is so problematic for your use case, it seems likely you'd want to avoid both Clojure and the JVM?

Hope some of this was useful. Best of luck, cheers! :-)