google / mug

A small Java 8 library (string manipulation, stream utils)
Apache License 2.0
367 stars 65 forks source link

RocksDB with BiStream #6

Closed Neiko2002 closed 6 years ago

Neiko2002 commented 6 years ago

Hello,

I'm trying to create a BiStream<byte[], byte[]> from a RocksIterator (RocksDB).

It can be done with two RocksIterators and the BiStream.zip(...) method. But this feels insecure because you have 2 iterator states, which can vary.

Another way is to create a Iterator<Map.Entry<byte[], byte[]>> and wrap it in a Stream<Map.Entry<byte[], byte[]>>. Inside the BiStream class is a constructor and a method (BiStream.form()) for this kind of stream, but both of them are not public.

Leaving us with BiStream.biStream(entryStream).mapKeys(Map.Entry::getKey).mapValues(Map.Entry::getValue) (a short version of this would also be nice). This creates additional intermediate Map.Entry objects, which are unnecessary in this case.

Are there other ways to create a new BiStream?

fluentfuture commented 6 years ago

This may save one intermediary Map.Entry:

biStream(entryStream).map((k, e) -> e.getKey(), (k, e) -> e.getValue());

But there is still a temporary Entry object in the middle.

Although, if performance is the critical, maybe you don't have to first create an Iterator<Map.Entry<byte[], byte[]>>? Since RocksIterator already provides access to the current key and value, you might get away with implementing an Iterator<RocksIterator> that upon iterator.next() is called, simply delegates to the same RocksIterator's next() like this:

class IteratorAdapter implements Iterator<RocksIterator> {
  private final RocksIterator rocks;

  @Override public RocksIterator next() {
    rocks.next();
    return rocks;
  }
}

Of course calling next() will invalidate the previous entry, but it seems fine for the purpose of BiStream.

Then you could do:

biStream(rocksStream).map((k, r) -> r.key(), (k, r) -> r.value());
fluentfuture commented 6 years ago

Update:

In v1.11, there is no more intermediary Map.Entry object allocations as long as the input stream is sequential.