riteshtijoriwala / protostuff

Automatically exported from code.google.com/p/protostuff
Apache License 2.0

GraphProtostuffOutput misses some opportunities e.g. when serializing strings #116

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
GraphProtostuffOutput seems to find shared objects and write references only 
when the writeObject method is used.

But in many cases other classes, e.g. RuntimeUnsafeFieldFactory, invoke other 
GraphProtostuffOutput methods such as writeString, writeBytes and 
writeByteArray. In those cases, no sharing is performed.

As an example, you can use this class:

import java.io.Serializable;

public class DummyPojo
    implements Serializable
{
    private static final long serialVersionUID = 1L;

    public DummyPojo()
    {

    }

    public DummyPojo( String name, int size )
    {
        this.name = name;
        this.size = size;
        // A fresh String of 'size' decoded zero bytes.
        payLoad1 = new String( new byte[size] );
        // Same String instance as payLoad1, not a copy.
        payLoad2 = payLoad1;
    }

    public String name;

    public int size;

    public String payLoad1;
    public String payLoad2;
}

Since the payLoad1 and payLoad2 fields refer to the same String object, it 
should be possible to write it once and share it using GraphProtostuffOutput.
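To make the expected behavior concrete, here is a minimal sketch that round-trips a trimmed copy of the class through the JDK's own ObjectOutputStream, which does keep shared String references. It does not use protostuff at all, so it only shows the target behavior, not what GraphProtostuffOutput currently does. The names SharedStringBaseline, Pojo and roundTrip are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; not part of protostuff.
class SharedStringBaseline
{
    // Trimmed copy of DummyPojo from the report (only the two payload fields).
    static class Pojo implements Serializable
    {
        private static final long serialVersionUID = 1L;
        String payLoad1;
        String payLoad2;
    }

    // Serializes the object to a byte array with JDK serialization and reads it back.
    static Pojo roundTrip( Pojo in ) throws IOException, ClassNotFoundException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try ( ObjectOutputStream out = new ObjectOutputStream( bytes ) )
        {
            out.writeObject( in );
        }
        try ( ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream( bytes.toByteArray() ) ) )
        {
            return (Pojo) ois.readObject();
        }
    }

    public static void main( String[] args ) throws Exception
    {
        Pojo p = new Pojo();
        p.payLoad1 = new String( new byte[1024] );
        p.payLoad2 = p.payLoad1; // same instance, as in DummyPojo

        Pojo copy = roundTrip( p );
        // Reference comparison: true means the sharing survived the round trip.
        System.out.println( copy.payLoad1 == copy.payLoad2 );
    }
}
```

With JDK serialization the second field is written as a back-reference to the first, so both fields point to one String again after reading. That is the graph structure this report asks GraphProtostuffOutput to reproduce.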

Suggestion:
Improve GraphProtostuffOutput to support more opportunities for sharing. In 
principle, everything could be shared.

If doing it for all objects is too much, then probably everything whose 
serialized size is bigger than the size of a reference could/should be shared.

And of course, if you think that objects of some classes are very unlikely to 
be shared in practice, you can skip the sharing check for those classes.

Original issue reported on code.google.com by romixlev on 27 Apr 2012 at 1:39

GoogleCodeExporter commented 8 years ago
So you want to share scalar values?  For the common use case, there isn't much 
gain from doing that.
It is possible for another graph output, but it's too much overhead to include 
in the main GraphProtostuffOutput.

What you're actually looking for is an output that does deduplication (which 
belongs in a module of its own).

Original comment by david.yu...@gmail.com on 27 Apr 2012 at 2:24

GoogleCodeExporter commented 8 years ago
> So you want to share scalar values?  For the common use case, there isn't much 
> gain from doing that.

> It is possible for another graph output, but it's too much overhead to include 
> in the main GraphProtostuffOutput.

Regarding the overhead:

I posted you a link to my small library called quickser on GitHub. It does full 
sharing, i.e. for everything except numbers, and it is currently way faster 
than protostuff according to my measurements. What I mean is: sharing does not 
necessarily mean a lot of overhead. Most graphs in typical use cases do not 
contain a huge number of objects. In such situations, a very simple and stupid 
identity-based linear scan of an array of seen elements works MUCH faster than 
any hash map, such as the ones used in protostuff or Kryo. Do you want to give 
this approach a try?
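The linear scan described above could look roughly like the following sketch. It is not taken from quickser or protostuff. The class name IdentityScanTable and the method indexOfOrAdd are hypothetical, and the initial capacity of 16 is an arbitrary assumption.

```java
import java.util.Arrays;

// Hypothetical helper; not part of protostuff or quickser.
final class IdentityScanTable
{
    private Object[] seen = new Object[16]; // assumed initial capacity
    private int size;

    /**
     * Returns the index at which obj was first registered, or -1 if obj has
     * not been seen before (in which case it is registered now).
     */
    int indexOfOrAdd( Object obj )
    {
        for ( int i = 0; i < size; i++ )
        {
            if ( seen[i] == obj ) // reference comparison, never equals()
            {
                return i;
            }
        }
        if ( size == seen.length )
        {
            seen = Arrays.copyOf( seen, size * 2 ); // grow when full
        }
        seen[size++] = obj;
        return -1;
    }

    int size()
    {
        return size;
    }
}
```

A writer would call indexOfOrAdd for each object: on -1 it writes the full value, otherwise it writes the returned index as a reference. Each lookup is linear in the number of objects seen so far, so this only makes sense for the small graphs mentioned above. A larger graph would need a fallback to a hash-based identity map.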

> What you're actually looking for is an output that does deduplication (which 
> belongs in a module of its own)

Well, I'm just trying to reproduce the same structure of the object graph as it 
had before serialization. I'm not sure that counts as deduplication, because 
IMHO deduplication is something optional by definition.

Original comment by romixlev on 27 Apr 2012 at 3:08

GoogleCodeExporter commented 8 years ago
> So you want to share scalar values?  For the common use case, there isn't much 
> gain from doing that.

One more comment regarding use cases. Think of many maps that use similar or 
identical sets of keys and/or values (e.g. strings or byte arrays), for example 
session attribute maps in HTTP or SIP containers. In many cases, such keys are 
not only equal according to the equals() method but are really the same object 
(e.g. static final Strings defined in some classes or interfaces as predefined 
keys), because sessions usually have a set of standard attributes set by the 
container, and then you can define your own as well. Now assume that you try 
to implement session replication and/or persistence and want to use protostuff 
for it. Having proper sharing for them would save a lot of space eventually.
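As a rough illustration of the potential saving, the sketch below counts how many key characters would be written for a list of session maps with and without identity-based sharing. Everything in it is hypothetical: the class SessionKeySharing, the two key constants and the method names are made up, and it ignores the cost of the references themselves and of the values.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of protostuff or any container API.
final class SessionKeySharing
{
    // Predefined keys, as a container might define them (names are made up).
    static final String USER_ID = "container.session.userId";
    static final String LOCALE = "container.session.locale";

    // Key characters written if every occurrence of a key is written in full.
    static int charsWithoutSharing( List<Map<String, Object>> sessions )
    {
        int total = 0;
        for ( Map<String, Object> session : sessions )
        {
            for ( String key : session.keySet() )
            {
                total += key.length();
            }
        }
        return total;
    }

    // Key characters written if each distinct key object is written only once.
    static int charsWithSharing( List<Map<String, Object>> sessions )
    {
        // Identity-based set: two keys count as the same only if they are the same object.
        Set<String> seen = Collections.newSetFromMap( new IdentityHashMap<>() );
        int total = 0;
        for ( Map<String, Object> session : sessions )
        {
            for ( String key : session.keySet() )
            {
                if ( seen.add( key ) )
                {
                    total += key.length();
                }
            }
        }
        return total;
    }
}
```

For N sessions that all use the same two constants, the first count grows linearly with N while the second stays at the length of the two keys.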

Original comment by romixlev on 27 Apr 2012 at 3:18