simerplaha / SwayDB

Persistent and in-memory key-value storage engine for JVM that scales on a single machine.
https://swaydb.simer.au
Apache License 2.0
293 stars, 16 forks

Java: Serializers should work with byte[] (not Byte[]) #308

Open maxim5 opened 3 years ago

maxim5 commented 3 years ago

I have my own serialization mechanism and was trying to adapt it to a Serializer, like this:

new Serializer<>() {
    @Override
    public Slice<Byte> write(T data) {
        byte[] bytes = ...
        return Slice.ofJava(ByteBuffer.wrap(bytes));
    }

    @Override
    public T read(Slice<Byte> slice) {
        return readFrom(slice.toByteArrayInputStream(), slice.size());
    }
}

Both write and read fail with a ClassCastException when converting between byte and Byte. I ended up writing these ugly transformations:

private static Byte[] box(byte[] bytes) {
    int length = bytes.length;
    Byte[] boxed = new Byte[length];
    for (int i = 0; i < length; i++) {
        boxed[i] = bytes[i];
    }
    return boxed;
}

private static byte[] unbox(Byte[] boxed) {
    int length = boxed.length;
    byte[] bytes = new byte[length];
    for (int i = 0; i < length; i++) {
        bytes[i] = boxed[i];
    }
    return bytes;
}

and that made the Serializer work. But this is highly inefficient. Why can't the serializer work with primitive bytes?
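A minimal sketch of what the issue title asks for: a serializer interface whose methods take and return primitive byte[], so no boxing loop is needed. The interface ByteArraySerializer and the UTF-8 String implementation are hypothetical illustrations, not SwayDB's actual API:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical interface (not part of SwayDB): a serializer that works
// directly with primitive byte[], so no Byte[] boxing is required.
interface ByteArraySerializer<T> {
    byte[] write(T data);

    T read(byte[] bytes);
}

// Example implementation that encodes Strings as UTF-8.
final class Utf8StringSerializer implements ByteArraySerializer<String> {
    @Override
    public byte[] write(String data) {
        return data.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String read(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With an interface of this shape, an adapter for a custom serialization mechanism could return its byte[] directly instead of passing it through box and unbox.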

simerplaha commented 3 years ago

You are right, and this is a known problem. The need for boxing should be removed. The vast majority of heap-allocated objects are of type Byte, so removing them would give us a massive performance advantage.

I have a solution in mind to fix this. Just gotta find time to implement it.

maxim5 commented 3 years ago

Thanks!

If the upcoming API isn't set in stone yet, I'd suggest making it generic via InputStream and OutputStream. This way the client will be able to stream the object in or out by any method, and the Slice itself could do the necessary conversions. In the future, the implementation could be optimized to avoid intermediate arrays and buffers altogether, e.g. by streaming directly from a file or an in-memory buffer.

The interface could look something like this:

public InputStream toInputStream();

public void fromOutputStream(Consumer<OutputStream> consumer);
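The two suggested methods could be sketched over a plain byte[] as follows. StreamingSlice and OutputWriter are hypothetical names, not SwayDB types. One deliberate deviation: fromOutputStream takes an OutputWriter instead of a Consumer<OutputStream>, because OutputStream.write declares IOException and Consumer.accept cannot throw checked exceptions. It is also written as a static factory returning a new slice, which is an assumption about how the write side would be wired up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical callback type: like Consumer<OutputStream>, but allowed to
// throw IOException, which OutputStream's write methods declare.
@FunctionalInterface
interface OutputWriter {
    void writeTo(OutputStream out) throws IOException;
}

// Hypothetical byte container (not SwayDB's Slice) backed by a primitive
// byte[], exposing the two stream-based methods suggested above.
final class StreamingSlice {
    private final byte[] bytes;

    private StreamingSlice(byte[] bytes) {
        this.bytes = bytes;
    }

    // Read side: a fresh InputStream over the backing array (no copy).
    public InputStream toInputStream() {
        return new ByteArrayInputStream(bytes);
    }

    // Write side: the caller streams its object into the OutputStream, and
    // whatever was written becomes the contents of a new slice.
    public static StreamingSlice fromOutputStream(OutputWriter writer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writer.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StreamingSlice(out.toByteArray());
    }

    public int size() {
        return bytes.length;
    }
}
```

For example, StreamingSlice.fromOutputStream(out -> out.write(serialized)) captures an object's bytes, and toInputStream() hands them back to the reader. Backing the sketch with a byte[] still uses an intermediate array; streaming directly from a file is not attempted here.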
simerplaha commented 3 years ago

> I'd suggest making it generic via InputStream and OutputStream.

By "it" do you mean the Serializer API or the data types (Map, Set, etc.) API? I'm guessing you mean the data types.

> This way the client will be able to stream the object in or out by any method

If you are looking for a way to stream data in and out of your data types, that already exists: see Stream. I'm sure we can write some quick convenience functions to convert a Stream to and from Java's OutputStream and InputStream.

Here is an example:

Set<Integer, Void> set =
  MemorySet
    .functionsOff(intSerializer())
    .get();

//write data as a stream
Stream
  .of(Arrays.asList(1, 2, 3))
  .forEach(set::add);

//TODO: To implement your fromOutputStream suggestion we can implement something like this
Stream
  .fromOutputStream(someOutputStream)
  .forEach(set::add);

//read data as a stream
set.forEach(System.out::println);

//convert the stream to a generic iterator
set.stream().iterator();

//TODO: To implement your toInputStream suggestion we could implement something like this
set.stream().toInputStream();
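The convenience functions mentioned in the TODOs above, converting between a stream of values and Java's InputStream/OutputStream, could look roughly like the following sketch. It uses the JDK's java.util.stream.Stream rather than SwayDB's Stream, and the class name IntStreams and the fixed 4-byte big-endian encoding per integer are assumptions for illustration only:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helpers (not part of SwayDB) for moving a stream of ints
// to and from Java's OutputStream/InputStream, 4 big-endian bytes per value.
final class IntStreams {
    private IntStreams() {}

    // Writes every integer of the stream to `out`, then flushes.
    static void toOutputStream(Stream<Integer> ints, OutputStream out) {
        DataOutputStream data = new DataOutputStream(out);
        try {
            ints.forEach(i -> {
                try {
                    data.writeInt(i);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads integers from `in` on demand, one value ahead, until end of
    // input; a truncated trailing value is also treated as end of input.
    static Stream<Integer> fromInputStream(InputStream in) {
        DataInputStream data = new DataInputStream(in);
        Iterator<Integer> iterator = new Iterator<Integer>() {
            private Integer next = readNext();

            private Integer readNext() {
                try {
                    return data.readInt();
                } catch (EOFException e) {
                    return null; // end of input
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public Integer next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                Integer current = next;
                next = readNext();
                return current;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
    }
}
```

Feeding a data type from an InputStream would then be something like IntStreams.fromInputStream(in).forEach(set::add); how this would map onto SwayDB's own Stream type is left open.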