peter-lawrey / HugeCollections-OLD

Huge Collections for Java using efficient off heap storage

custom marshalling/serialization #24

Closed RuedigerMoeller closed 10 years ago

RuedigerMoeller commented 10 years ago

Hi Peter,

As we have hundreds of data structures, we are going down the serialization route. I need a way to efficiently plug in a custom serializer.

Looking at AbstractBytes:

public void writeObject(@Nullable Object obj) {
        if (obj == null) {
            writeByte(NULL);
            return;
        }

        Class<?> clazz = obj.getClass();
        final BytesMarshallerFactory bytesMarshallerFactory = bytesMarshallerFactory();
        BytesMarshaller em = bytesMarshallerFactory.acquireMarshaller(clazz, false);
        if (em == NoMarshaller.INSTANCE && autoGenerateMarshaller(obj))
            em = bytesMarshallerFactory.acquireMarshaller(clazz, true);

        if (em != NoMarshaller.INSTANCE) {
            if (em instanceof CompactBytesMarshaller) {
                writeByte(((CompactBytesMarshaller) em).code());
                em.write(this, obj);
                return;
            }
            writeByte(ENUMED);
            writeEnum(clazz);
            em.write(this, obj);
            return;
        }
        writeByte(SERIALIZED);
        // TODO this is the lame implementation, but it works.
        try {
            ObjectOutputStream oos = new ObjectOutputStream(this.outputStream());
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        checkEndOfBuffer();
    }

Is there a way to shortcut this routine, and in particular to avoid

Class<?> clazz = obj.getClass();
final BytesMarshallerFactory bytesMarshallerFactory = bytesMarshallerFactory();
BytesMarshaller em = bytesMarshallerFactory.acquireMarshaller(clazz, false);

as this lookup can cost something like 10% for smallish objects? If we can agree on some 'pluggable' interface, I can do the work and contribute. Or is there another way to customize serialization?

regards, Rüdiger

RobAustin commented 10 years ago

Are you using this method in Lang directly, or via an HFT collections class such as SharedHashMap?

On 27 May 2014, at 16:38, RuedigerMoeller notifications@github.com wrote:

RuedigerMoeller commented 10 years ago

I want to use SharedHashMap. fast-serialization uses some trickery to avoid hash lookups and operations that potentially blur locality, such as object.getClass() and instanceof, so its performance on small objects might suffer if serialization is only called as a last resort. I need a hook that kicks in earlier. The difference can be quite notable when putting smallish objects using serialization.

leventov commented 10 years ago

@RuedigerMoeller There is BytesMarshallable, and there is a fast path for writing it in both VSHM and AbstractBytes.writeInstance(), if I understand your idea right.

RuedigerMoeller commented 10 years ago

I disagree. BytesMarshallable requires a lot of changes to existing code. Think of a system with hundreds of data structures. Nobody is willing to pay the price for custom, hand-written serialization, so I need to use object serialization. I have a very well performing implementation of generic object serialization which I want to plug in. BytesMarshallable does not cut it. And even if I patch out the ObjectSerialization, the path still is:

if (BytesMarshallable.class.isAssignableFrom(objClass)) {
    ((BytesMarshallable) obj).writeMarshallable(this);
} else if (Externalizable.class.isAssignableFrom(objClass)) {
    ((Externalizable) obj).writeExternal(this);
} else if (CharSequence.class.isAssignableFrom(objClass)) {
    writeUTFΔ((CharSequence) obj);
} else {
    writeObject(obj);
}

I mean the instanceof chain adds serious overhead when serializing small objects (which FST does in the region of some 100 nanos if used and tuned right). Additionally, you pull String objects out of serialization. I would basically need a plug to completely replace the decision tree for encoding and decoding; anyway, I can fork or write a wrapper. Just some input from someone evaluating this.

leventov commented 10 years ago

@RuedigerMoeller you mean that adding methods like customKeySerialization(BiConsumer<Bytes, K> serializer), and an equivalent for values, to the SharedHashMapBuilder API would be useful?

RobAustin commented 10 years ago

@RuedigerMoeller

You may find this interface useful:

net.openhft.collections.ReplicatedSharedHashMap.EntryExternalizable

It's implemented by:

net.openhft.collections.VanillaSharedReplicatedHashMap

peter-lawrey commented 10 years ago

To make it completely pluggable you can avoid using writeObject() altogether. You can instead use the OutputStream/InputStream, or write your own serializer/deserializer which writes/reads the data however you wish. writeObject is provided as a convenience, but if it doesn't do what you need, don't call it.

leventov commented 10 years ago

@BoundedBuffer ReplicatedSharedHashMap.EntryExternalizable sits between memory and the wire; @RuedigerMoeller is talking about serialization between native Java objects and memory.

leventov commented 10 years ago

@peter-lawrey the problem is that VSHM does call writeObject() inside.

peter-lawrey commented 10 years ago

I was thinking of Chronicle, where it is entirely a choice. ;)

If you want to avoid looking up a marshaller for each class, what is the alternative you want to use? Can you use a mutable wrapper?

Map<String, MyBytesMarshallableRef> map = ...; // the shared map
MyBytesMarshallableRef ref = new MyBytesMarshallableRef();

ref.value = myRandomType;

map.put(key, ref);

if (map.getUsing(key, ref) != null) {
    // ref is set and found
}
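
The reuse pattern above might look roughly like the following sketch. It uses a toy in-memory store backed by `ByteBuffer` rather than SharedHashMap; `ToyStore`, `LongRef`, and their methods are hypothetical names for illustration, and only the `getUsing(key, ref)` idea is taken from the comment above.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable holder: it writes and reads its own state,
// so the store never needs a per-class marshaller lookup.
final class LongRef {
    long value;

    void writeTo(ByteBuffer out) { out.putLong(value); }

    void readFrom(ByteBuffer in) { value = in.getLong(); }
}

// Toy stand-in for a shared map: values are kept as encoded bytes, not objects.
final class ToyStore {
    private final Map<String, byte[]> entries = new HashMap<>();
    private final ByteBuffer scratch = ByteBuffer.allocate(Long.BYTES);

    void put(String key, LongRef ref) {
        scratch.clear();
        ref.writeTo(scratch);
        scratch.flip();
        byte[] bytes = new byte[scratch.remaining()];
        scratch.get(bytes);
        entries.put(key, bytes);
    }

    // Fills the caller-supplied ref instead of allocating a new object;
    // returns null when the key is absent.
    LongRef getUsing(String key, LongRef ref) {
        byte[] bytes = entries.get(key);
        if (bytes == null) return null;
        ref.readFrom(ByteBuffer.wrap(bytes));
        return ref;
    }
}
```

The caller keeps one `LongRef` and passes it to every `getUsing` call, so a read decodes into existing memory and creates no garbage. This toy is not thread-safe.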

RobAustin commented 10 years ago

Yes - agreed, VSHM does call writeObject() but EntryExternalizable does not call writeObject().

RuedigerMoeller commented 10 years ago

Correct me if I am overlooking something. My intention is to completely replace the Bytes<=>Object transformation (e.g. avoid the per-class marshaller lookup and the instanceof chain).

@Peter

If you want to avoid looking up a marshaller for each class, what is the alternative you want to use? Can you use a mutable wrapper?

Just provide a delegation mechanism. The lookup can be avoided because frequently all values have the same type, so a custom serializer could cache a marshaller. It seems ridiculous, but hash lookups always add to cache pollution. As encoding is the main performance bottleneck for off-heap storage (if one has to deal with arbitrary serializable classes), a lot of trickery can be done to speed it up (e.g. pre-known objects which are encoded by a short, partial/lazy decoding, etc.).
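
A minimal sketch of the caching idea, assuming values are usually of one type: the last class seen and its marshaller are remembered, so the common case costs a reference comparison instead of a hash lookup. `LastTypeCache` and the `slowLookup` function are hypothetical names, not part of any existing API, and the sketch still calls `getClass()` once per object.

```java
import java.util.function.Function;

// Hypothetical single-entry cache in front of a slower per-class lookup.
// Not thread-safe; a real version would be per-thread or otherwise guarded.
final class LastTypeCache<M> {
    private final Function<Class<?>, M> slowLookup;
    private Class<?> lastType;
    private M lastMarshaller;
    int slowLookups; // counter exposed only to make the effect observable

    LastTypeCache(Function<Class<?>, M> slowLookup) {
        this.slowLookup = slowLookup;
    }

    M marshallerFor(Object obj) {
        Class<?> type = obj.getClass();
        if (type != lastType) {
            // Miss: fall back to the slow lookup and remember the result.
            lastMarshaller = slowLookup.apply(type);
            lastType = type;
            slowLookups++;
        }
        return lastMarshaller;
    }
}
```

With a homogeneous map the slow path runs once; alternating between two value types would defeat a single-entry cache, which is one reason a user-supplied serializer fixed at map construction time is the simpler design.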

@BoundedBuffer - I am not too deep into the HFT classes, so I am not aware of the role of EntryExternalizable. I'll have to figure it out ;)

@leventov

you mean adding methods like customKeySerialization(BiConsumer<Bytes, K> serializer) and for value accordingly to SharedHashMapBuilder API would be useful?

Yep, something along the lines of this. Does this exist?

I thought about it overnight, and maybe I am better off completely wrapping the map and just putting byte arrays or Bytes from the wrapper (unfortunately each library has its own flavour of Bytez abstraction). On the other hand, your shared map could get a significant speed boost if custom serialization were pluggable.
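
The wrapper approach could be sketched as follows, assuming the underlying map can hold `byte[]` values. `EncodingMap` and the encoder/decoder functions are hypothetical names; any serializer that turns a value into a byte array and back could be plugged in.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper: the delegate only ever sees byte arrays, so its own
// writeObject()-style decision tree is never exercised for the values.
final class EncodingMap<K, V> {
    private final Map<K, byte[]> delegate;
    private final Function<V, byte[]> encoder;
    private final Function<byte[], V> decoder;

    EncodingMap(Map<K, byte[]> delegate,
                Function<V, byte[]> encoder,
                Function<byte[], V> decoder) {
        this.delegate = delegate;
        this.encoder = encoder;
        this.decoder = decoder;
    }

    void put(K key, V value) {
        delegate.put(key, encoder.apply(value));
    }

    // Returns null when the key is absent, like Map.get.
    V get(K key) {
        byte[] bytes = delegate.get(key);
        return bytes == null ? null : decoder.apply(bytes);
    }
}
```

The cost of this design is an extra `byte[]` allocation and copy per operation, which a serializer hook inside the map itself would avoid.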

BTW thanks for quick feedback :-)

leventov commented 10 years ago

@RuedigerMoeller

Does this exist ?

Not yet. We will consider adding such a thing, thanks for the idea.

RobAustin commented 10 years ago

@RuedigerMoeller What are your timescales for this? We've added a task on our internal JIRA system to add this functionality.

HCOLL-91 SHM key/value serializer abstraction (for configuration and speed)

RuedigerMoeller commented 10 years ago

Awesome! If it comes within, say, 3 months, it's OK for me.

RobAustin commented 10 years ago

OK - We'll aim for that.