While they have the option to perform random access, most (but not all) serializers access the buffer sequentially and could indeed be written using InputStream/OutputStream. However, the first time random access is needed, a stream-based approach degrades to the current situation where a (possibly large) buffer is required.
For example, the Compressor class wraps a Serializer to perform compression and decompression. It must write the length of the compressed data, then the compressed data. The length is needed first because the compressed object may not be at the root of the object graph, and upon decompression we need to know how many bytes to process. For this to be stream-based, the Compressor class would have to do its own buffering of the output data so it could write the length to the OutputStream before the data.
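The random-access pattern described here can be sketched with plain java.nio (the names are illustrative, not Kryo's actual Compressor code): reserve room for the length, write the payload, then backfill the length with an absolute put. The backfill step is exactly what a pure stream cannot do.

```java
import java.nio.ByteBuffer;

// Sketch of length-prefixed writing: skip over the length field, write the
// payload, then backfill the length once it is known. Hypothetical names.
public class LengthPrefix {
    public static ByteBuffer writeWithLength(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.allocate(4 + payload.length);
        int lengthPos = buffer.position(); // remember where the length goes
        buffer.position(lengthPos + 4);    // skip it for now
        buffer.put(payload);               // write the (compressed) data
        buffer.putInt(lengthPos, payload.length); // backfill: random access
        buffer.flip();
        return buffer;
    }

    public static byte[] readWithLength(ByteBuffer buffer) {
        int length = buffer.getInt();      // length first, so we know how many bytes to process
        byte[] payload = new byte[length];
        buffer.get(payload);
        return payload;
    }
}
```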
As you mentioned, an interface could be passed around instead of ByteBuffer to hide whether ByteBuffers or streams are used. The question is, would this API change be worth it? The current approach is ideal if you need a byte array or ByteBuffer in the end anyway, or if you need to write the length first (e.g., when writing to a stream-based protocol such as TCP). The current approach is less ideal when you have an arbitrarily large object graph, because you may not know the serialized size beforehand.
You may find the ObjectBuffer class useful. It provides methods to serialize and deserialize using byte arrays and streams. It handles the necessary buffering. It can be given an initial and maximum size. If an operation fails because the buffer is too small, its size will be doubled (up to the maximum) and the operation retried.
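The grow-and-retry behavior described above can be sketched as follows (a minimal illustration, not ObjectBuffer's actual source; the byte-array put stands in for a serializer call):

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Sketch of grow-and-retry: start with a small buffer, double its size on
// overflow, and give up once the maximum is reached.
public class GrowingWrite {
    public static ByteBuffer writeRetrying(byte[] data, int initialSize, int maxSize) {
        int size = initialSize;
        while (true) {
            try {
                ByteBuffer buffer = ByteBuffer.allocate(size);
                buffer.put(data); // stand-in for serializer.writeObject(buffer, object)
                buffer.flip();
                return buffer;
            } catch (BufferOverflowException e) {
                if (size >= maxSize) throw e;       // cannot grow any further
                size = Math.min(size * 2, maxSize); // double, capped at the maximum
            }
        }
    }
}
```

The retry discards the partial write and starts over, which is why the doubling strategy is simple but inefficient for large graphs.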
In a multithreaded environment, if an ObjectBuffer per thread is too much memory (thread count * max size), it may be acceptable to use a thread safe pool of ObjectBuffers. The pool size would be less than the number of threads. When necessary, threads would block until an ObjectBuffer becomes available.
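A minimal sketch of such a pool, using a plain BlockingQueue with a generic element type standing in for ObjectBuffer (illustrative, not part of Kryo):

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a fixed-size buffer pool: fewer buffers than threads; a thread
// blocks in acquire() until another thread returns a buffer.
public class BufferPool<T> {
    private final BlockingQueue<T> pool;

    public BufferPool(List<T> buffers) {
        pool = new ArrayBlockingQueue<>(buffers.size(), false, buffers);
    }

    public T acquire() {
        try {
            return pool.take(); // blocks until a buffer is available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public void release(T buffer) {
        pool.offer(buffer); // make it available to other threads again
    }
}
```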
To serialize an object graph, the graph is normally going to fit in memory. The buffer size needed to serialize the object graph is normally going to be less than the memory used by the Java object representation. These days memory is abundant. I'm not sure how much trouble it is worth to try to avoid buffering the serialized bytes.
Is it possible that this bug is more about the unfriendliness of the API when the buffer is too small? Maybe ObjectBuffer should make its ByteBuffer available, so the same buffer growing functionality would be available to users who need a ByteBuffer rather than a byte array or stream.
Original comment by nathan.s...@gmail.com on 21 Mar 2010 at 12:43
i'm not the original poster, but the idea of implementing an auto-growing buffer would make the API dramatically easier to use. personally, i can't see the move to a stream helping in any significant way
Original comment by lytles...@gmail.com on 21 Mar 2010 at 3:02
i was thinking that handling the buffer.grow in the buffer itself might be a good idea. but did a little bit of reading (and wrote a quick ByteArray delegation). sounds like the conventional wisdom is that it's a problem (see the note in yellow at the top of the mina link):
http://mina.apache.org/iobuffer.html
"The main reason why MINA has its own wrapper on top of nio ByteBuffer is to have extensible buffers. This was a very bad decision"
http://stackoverflow.com/questions/1774651/growing-bytebuffer
i think that their primary complaint is the copying of the backing array on grow. doesn't seem like too big a deal. but then, i guess i think that i'm mostly going to be sending relatively small hunks
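The copy-on-grow cost being discussed can be sketched as follows (an illustrative wrapper, not MINA's IoBuffer or Kryo code): on overflow, allocate a larger buffer and copy the old contents across.

```java
import java.nio.ByteBuffer;

// Sketch of an "extensible buffer": grows by allocating a bigger ByteBuffer
// and copying the old contents into it. The copy on every grow is the
// complaint raised against this design.
public class GrowableBuffer {
    private ByteBuffer buffer;

    public GrowableBuffer(int initialCapacity) {
        buffer = ByteBuffer.allocate(initialCapacity);
    }

    public void put(byte[] data) {
        if (buffer.remaining() < data.length) {
            int needed = buffer.position() + data.length;
            int newCapacity = Math.max(buffer.capacity() * 2, needed);
            ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
            buffer.flip();
            bigger.put(buffer); // the backing-array copy that makes growing expensive
            buffer = bigger;
        }
        buffer.put(data);
    }

    public int size() { return buffer.position(); }
}
```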
the recommendation is to use a direct ByteBuffer (you're already doing this in your introduction) and just set a large size - the OS won't actually allocate memory until it's needed ...
Original comment by lytles...@gmail.com on 21 Mar 2010 at 3:49
When you allocate a direct ByteBuffer, the contiguous block of memory is claimed at that time.
The OP has a multithreaded environment where he is serializing potentially large object graphs, so he will need one buffer per thread, which can use up a lot of memory.
ObjectBuffer will grow as needed (if inefficiently). This at least means you don't have to allocate a huge amount per thread, just in case. However, if each thread really does need a large buffer, currently either you need a lot of memory on the machine or you'll have to limit the number of large ObjectBuffers you create.
Everything is simplest when a ByteBuffer is used everywhere. Ideally we can work around any issues this causes.
Original comment by nathan.s...@gmail.com on 21 Mar 2010 at 4:16
Hi all!
First of all, thanks for the very quick and complete reply. Second, I'm sorry I posted this as a defect, as this is more of a comment/enhancement.
The reason why I wanted to have InputStream/OutputStream was that I want to use Kryo with Spring remoting via HTTP (http://static.springsource.org/spring/docs/3.0.x/reference/html/remoting.html#remoting-httpinvoker), and as I have no idea what the size of the serialized objects will be, I need to potentially allocate a "huge" buffer.
I do realize that, as the Spring implementation is also required to send correct Content-Length headers, I expect it to be buffered internally on write anyway, but having two buffers seems non-ideal, so if I could write directly to the OutputStream, then the only buffer (probably a ByteArrayOutputStream) would be in the Spring implementation.
I will check out the ObjectBuffer and see how that works out, as that would at least not allocate much more than I need. I guess I could make my code a bit "adaptive" as well, by adding an "ExpandObjectBufferListener" that would tell me that a certain object type required more memory than expected, so an expansion was done.
Thanks again for your time!
Best regards
Morten
Original comment by kezzlera...@gmail.com on 21 Mar 2010 at 5:01
Spring makes my eyes hurt. I think it is all the pointy brackets. ;)
Not much I can say about the two buffers. Kryo doesn't really have a solution to avoid this. If it were really needed, I think we would use a list of ByteBuffers, similar to PyroNet:
http://code.google.com/p/pyronet/source/browse/trunk/%20pyronet/jawnae/pyronet/util/ByteStream.java
This way we could have a hook to allow you to handle each ByteBuffer as it were filled. For now though, I'd like to try to get by as simple as possible, using just ByteBuffer.
FWIW, ObjectBuffer logs a debug message when it is resized. Debug logging can be enabled with "Log.set(Log.LEVEL_DEBUG);" (assuming you aren't using a hardcoded logging level; see the minlog project for more info). On a related note, what Kryo is doing can be observed at a very low level with the TRACE logging level.
Original comment by nathan.s...@gmail.com on 25 Mar 2010 at 1:24
a proof of concept implementation of a serializer (and a Kryo subclass to simplify registration) that wraps the normal serializers, and flushes the ByteBuffer as needed. i'm not wrapping the primitive serializers - so i catch overflows (going to try intercepting the primitives next)
the flushing method that i use is naive - it just stores the data in a linked list of byte arrays. but it could obviously write to an output stream (for spring) or directly to the network
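The chunked-flushing idea could be sketched like this (a simplified, hypothetical version, not the attached implementation): when the working ByteBuffer fills, drain it into a list of byte[] chunks instead of growing it. The chunks could just as well be written to an OutputStream.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of flushing a fixed-size ByteBuffer into byte[] chunks whenever
// it fills, so the buffer itself never needs to grow.
public class ChunkFlusher {
    private final ByteBuffer buffer;
    private final List<byte[]> chunks = new ArrayList<>();

    public ChunkFlusher(int bufferSize) {
        buffer = ByteBuffer.allocate(bufferSize);
    }

    public void write(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            if (!buffer.hasRemaining()) flush(); // buffer full: dump it
            int n = Math.min(buffer.remaining(), data.length - offset);
            buffer.put(data, offset, n);
            offset += n;
        }
    }

    public void flush() {
        byte[] chunk = new byte[buffer.position()];
        buffer.flip();
        buffer.get(chunk);
        buffer.clear();
        if (chunk.length > 0) chunks.add(chunk); // or: stream.write(chunk)
    }

    public int totalBytes() {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        return total + buffer.position();
    }
}
```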
i don't do any processing on the read side - just a pure delegate
inspired by nate's comments from the mailing list (8th comment in this thread):
http://groups.google.com/group/kryo-users/browse_thread/thread/f936d2b459638211
> Another solution is to periodically dump the contents of the buffer. This
...
> Interestingly, this could be built without needing changes to Kryo. I am
> curious how well this would work. Anyone feel like implementing it? :)
Original comment by lytles...@gmail.com on 4 Jun 2010 at 9:20
Attachments:
another flushing meta serializer implementation - wraps primitives, and never allocates a larger buffer. doesn't work for (large) strings, but could - probably best to just provide a FlushStringSerializer
using this slows things down by 10-20% for my simple test case - every primitive results in an additional call. but the penalty doesn't depend on the size of the output - there's no reallocation of buffers, no try catch. the try-catch based flusher above has almost no penalty if the buffer is large enough, but if the initial buffer is a lot smaller than the size of the largest component, eg an array, the penalty goes up dramatically (i'm assuming it's O(n^2) but i haven't really checked)
Original comment by lytles...@gmail.com on 5 Jun 2010 at 9:20
Attachments:
Original issue reported on code.google.com by kezzlera...@gmail.com on 20 Mar 2010 at 1:29