yahoo / Oak

A Scalable Concurrent Key-Value Map for Big Data Analytics
Apache License 2.0
268 stars 48 forks source link

Elimination of all ByteBuffer Usage (for Java 8) #132

Open sanastas opened 3 years ago

sanastas commented 3 years ago

Base all of the off-heap memory management on unsafe allocation and access, without ByteBuffer intermediate layer (in addition to existing memory management). The reason for the change is the possibility to gain better throughput without spending CPU cycles and memory on ByteBuffer internals.

That means a new memory manager to be based on using addresses instead of ByteBuffers and allocating the address from unsafe: Unsafe().allocateMemory(size);

That requires a significant code writing: new (alternative) Block, BlockPool, MemoryAllocator, MemoryManager etc.

For more explanations feel free to add questions in this issue.

dynaxis commented 2 years ago

It doesn't seem that the doc comments on the relevant code are changed in sync with the changes. For instance, OakUnsafeDirectBuffer's getByteBuffer() returns a new ByteBuffer on every invocation of the method. But the doc comment still says that state of the returned ByteBuffer shouldn't be modified mentioning it might be shared and used elsewhere.

sanastas commented 2 years ago

Hi @dynaxis ,

Thanks for your comment. Here is the comment/documentation about OakUnsafeDirectBuffer's getByteBuffer()

public interface OakUnsafeDirectBuffer {
    /**
     * Return the underlying memory address wrapped as a ByteBuffer.
     * This buffer might contain data that is unrelated to the context in which this object was introduced.
     * Note 1: depending on the context (casting from OakScopedReadBuffer or OakScopedWriteBuffer), the buffer mode
     *         might be ready only.
     * Note 2: the buffer internal state (e.g., byte order, position, limit and so on) should not be modified as this
     *         object might be shared and used elsewhere.
     *
     * @return the underlying memory address wrapped as a ByteBuffer.
     */
    ByteBuffer getByteBuffer();

Indeed, a new ByteBuffer object is returned on each invocation. However, the underlying memory referenced from this ByteBuffer is always the same (for the same OakUnsafeDirectBuffer object). Therefore, indeed, modifying those two ByteBuffers may lead to corrupted results. This is why the interface is named Oak-Unsafe-DirectBuffer and it needs to be used with extra care.

How would you suggest to rephrase the documentation so it is more clear?

dynaxis commented 2 years ago

Hi @sanastas,

The doc comment is a little bit misleading in the following points, IMHO:

  1. This buffer might contain data that is unrelated to the context in which this object was introduced.

    I think this is potentially misleading in that currently the returned ByteBuffer is configured to prohibit access to outside of the bounds. It should be safe to touch anywhere within the bounds, isn't it?

  2. Note 2: the buffer internal state (e.g., byte order, position, limit and so on) should not be modified as this object might be shared and used elsewhere.

    In the similar way, it's not possible to touch outside of the configured boundaries. Also the ByteBuffer instance is no longer shared among multiple threads. So it should be ok to change byte order, position, limit and so on.

So my suggestion for a better wording is as follows:

public interface OakUnsafeDirectBuffer {
    /**
     * Returns a ByteBuffer wrapping the underlying memory segment backing this buffer.
     * 
     * Since a new ByteBuffer instance is returned on each invocation of this method, it's absolutely safe
     * to modify the state of the ByteBuffer including its byte order, position, and limit.
     * 
     * Note that the buffer mode of the returned ByteBuffer might be read-only, 
     * depending on whether this buffer has been casted from OakScopedReadBuffer or OakScopedWriteBuffer.
     *
     * @return a ByteBuffer wrapping the underlying memory segment
     */
    ByteBuffer getByteBuffer();

UPDATE: I also found that the ByteBuffer is never read-only different from the note 2 in the current doc comment. So should we remove the last sentence starting with Note that the buffer... or make the ByteBuffers returned from read only buffers actually read-only?

In my opinion, ByteBuffers returned should be set read-only if they are from read-only buffer.

sanastas commented 2 years ago

Hi @dynaxis ,

Thank you for your reply. Your suggestion makes sense. Although I would stress that this buffers brings you no protection from possible concurrent update of the same underlying memory from different ByteBuffer objects. If you are working with single thread then no problem, of course.

public interface OakUnsafeDirectBuffer {
    /**
     * Returns a ByteBuffer wrapping the underlying memory segment backing this buffer. 
     * NOT THREAD SAFE! Two ByteBuffers of the same OakUnsafeDirectBuffer will reference to the same memory, 
     * so concurrent access with at least one write may result in inconsistent state of the data. 
     * 
     * Since a new ByteBuffer instance is returned on each invocation of this method, it's absolutely safe
     * to modify the state of the ByteBuffer including its byte order, position, and limit.
     * 
     * Note that the buffer mode of the returned ByteBuffer might be read-only, 
     * depending on whether this buffer has been casted from OakScopedReadBuffer or OakScopedWriteBuffer.
     *
     * @return a ByteBuffer wrapping the underlying memory segment
     */
    ByteBuffer getByteBuffer();

As for read-only buffer, while creating the buffer inside DirectUtils's wrapAddress we need to check for instance of OakScopedWriteBuffer and only then let the ByteBuffer be writable, otherwise asReadOnlyBuffer().

@dynaxis , as the change is quite easy, would you be able to provide a PR, so I will review and merge to master? So you will join our small but powerful contributors community? :)

dynaxis commented 2 years ago

Dear @sanastas,

I agree with your additions to the doc comment, though I might suggest a bit of rewording. Also it's good to issue a PR myself. BTW, I think the check for read only buffer should be inside getByteBuffer() method rather than DirectUtils.wrapAddress(). Otherwise DirectUtils will have unnecessary tight coupling with buffer types.

Anyway, I'll prepare a PR soon and come back.

sanastas commented 2 years ago

Thank you very much @dynaxis , do as you think is the best way! Very much appreciated!

dynaxis commented 2 years ago

@sanastas #209 is issued to resolve the issues I raised above.

sanastas commented 2 years ago

@dynaxis , thank you so much! Review done. Code looks good, just one comment left!