yahoo / Oak

A Scalable Concurrent Key-Value Map for Big Data Analytics
Apache License 2.0

Combine version into reference and encapsulate majority of version referencing #117

Closed sanastas closed 4 years ago

sanastas commented 4 years ago

The changes are a step toward total encapsulation of the memory manager.

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

sanastas commented 4 years ago

Regarding the following notes:

(1)----------------------------------------------------------------

The main design issue I have is the lack of a distinct dependency tree between the following entities:

MemoryManager (and its implementations)
Slice
ReferenceCodec

We should find a way to untangle this spaghetti to avoid these co-dependencies. I suggest the following tree:

MemoryManager -> (depends on) ReferenceCodec, Slice
ReferenceCodec -> (depends on) Slice, "unaware" of any MemoryManager
Slice -> "unaware" of the others
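To make the suggested tree concrete, here is a minimal Java sketch of the three layers. All class names, fields, and the 32/32-bit reference layout are hypothetical and chosen only for illustration; they are not Oak's actual classes.

```java
// Hypothetical sketch of the suggested dependency tree; not Oak's real classes.

// Slice: a plain container, unaware of the codec and of any memory manager.
final class SliceSketch {
    int blockId;
    int offset;
}

// ReferenceCodec: depends only on Slice, unaware of any MemoryManager.
final class CodecSketch {
    // Assumed layout: high 32 bits = block id, low 32 bits = offset.
    long encode(SliceSketch s) {
        return ((long) s.blockId << 32) | (s.offset & 0xFFFFFFFFL);
    }

    void decode(long reference, SliceSketch s) {
        s.blockId = (int) (reference >>> 32);
        s.offset = (int) reference;
    }
}

// MemoryManager: the only layer that depends on both of the others.
final class ManagerSketch {
    private final CodecSketch codec = new CodecSketch();
    private int nextOffset = 0;

    // Hands out consecutive offsets inside a single assumed block (id 1)
    // and returns the encoded reference.
    long allocate(int length) {
        SliceSketch s = new SliceSketch();
        s.blockId = 1;
        s.offset = nextOffset;
        nextOffset += length;
        return codec.encode(s);
    }

    // Decodes a reference back into a fresh Slice.
    SliceSketch resolve(long reference) {
        SliceSketch s = new SliceSketch();
        codec.decode(reference, s);
        return s;
    }
}
```

In this arrangement a change to the manager cannot ripple into the codec or the slice, because neither of them mentions it.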

The ReferenceCodec (RC) is part of the MemoryManager (MM) implementation, and nothing outside the MM is aware of the RC. However, each MM has its own RC, and that RC is tied to the MM's logic (this is the reason for the different RCs). The RC also accesses the MM's memory allocator directly, to avoid passing parameters around.
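One way to picture this arrangement is a codec declared as a private inner class of its manager: nothing outside the manager can name it, and it reads the manager's state directly instead of receiving it as parameters. The sketch below also packs a version next to the offset bits, in the spirit of this PR's title. The class names, the 20-bit version width, and the simplified single global version are all assumptions made for illustration, not Oak's implementation.

```java
// Hypothetical sketch; names and bit widths are assumptions, not Oak's implementation.
final class VersionedManagerSketch {
    private static final int VERSION_BITS = 20; // assumed width
    private static final long VERSION_MASK = (1L << VERSION_BITS) - 1;

    private int currentVersion = 1; // manager state the codec reads directly
    private long nextOffset = 0;

    // Private inner class: code outside the manager cannot see or name this codec.
    private final class Codec {
        long encode(long offset) {
            // Reads currentVersion from the enclosing manager; no parameter needed.
            return (offset << VERSION_BITS) | (currentVersion & VERSION_MASK);
        }

        long offsetOf(long reference) {
            return reference >>> VERSION_BITS;
        }

        int versionOf(long reference) {
            return (int) (reference & VERSION_MASK);
        }
    }

    private final Codec codec = new Codec();

    // Returns a reference that carries both the offset and the current version.
    long allocate(int length) {
        long reference = codec.encode(nextOffset);
        nextOffset += length;
        return reference;
    }

    // Simplified stand-in for memory reuse: older references become stale.
    void advanceVersion() {
        currentVersion++;
    }

    boolean isCurrent(long reference) {
        return codec.versionOf(reference) == currentVersion;
    }

    long offsetOf(long reference) {
        return codec.offsetOf(reference);
    }
}
```

Callers only ever hold an opaque `long`; how the version and offset share its bits stays a private detail of the manager.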

Slice is a simple container for all the information related to the piece of buffer it is associated with. It may include anything that an MM puts inside. I think in the future we may have a Slice definition per MM, but currently I am OK with how it is.

I do not see any spaghetti here.

(2)----------------------------------------------------------------

Regarding correctness, I can't say I completely understand the modifications to the flow in the InternalOakMap methods. Do we have some kind of flow graph that can describe the basic map operations? This can greatly improve our ability to reason about this code.

I have a good understanding of the algorithm behind the code. It is indeed bad that it exists only in my head, but putting it into a detailed design document requires some time. I do think it is a worthwhile effort that can help us in the future. We should ask Eshcar whether it is a good time investment.