yahoo / Oak

A Scalable Concurrent Key-Value Map for Big Data Analytics
Apache License 2.0

Hi, I found a strange problem with OakMap.remove() #195

Open fujian-zfj opened 2 years ago

fujian-zfj commented 2 years ago

[screenshot: the OakMap.remove() code path, where only ctx.value.getSlice() is released]

I don't quite understand why ctx.key.getSlice() is not released when OakMap.remove() is called; only ctx.value.getSlice() is released. Because ctx.key.getSlice() is never released, off-heap blocks are allocated one after another, and eventually my process hangs with an out-of-memory error.

[screenshot: allocated off-heap blocks piling up until the process runs out of memory]

fujian-zfj commented 2 years ago

I have noticed that ctx.key.getSlice() can be reused within the same OakMap, which is why keysMemoryManager is instantiated as a SeqExpandMemoryManager. In my application scenario, however, ctx.key.getSlice() will never be reused, because the keys of my OakMap are monotonically increasing and never repeat.

    MemoryManager keysMemoryManager = new SeqExpandMemoryManager(memoryAllocator);
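
To illustrate, my workload looks roughly like this (simplified to the default integer builder from Oak's README; my real serializers and sizes differ, and the builder method names may vary by version):

    // Keys are monotonically increasing and never reused.
    OakMap<Integer, Integer> oak =
            OakCommonBuildersFactory.getDefaultIntBuilder().buildOrderedMap();

    for (int key = 0; key < 100_000_000; key++) {
        oak.put(key, key);   // allocates off-heap slices for the key and the value
        oak.remove(key);     // releases only the value slice; the key slice stays
                             // allocated, so blocks pile up until out-of-memory
    }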
sanastas commented 2 years ago

Hi @fujian-zfj ,

As we already discussed in issue #192, keys are not released in OakMap for now. Specifically, SeqExpandMemoryManager does not support deleting a key unless the key has not yet been exposed to multiple threads (a rare case). A new memory manager for keys is going to be added soon. By the way, your understanding of Oak's internal code is remarkable!

sanastas commented 2 years ago

I don't quite understand why ctx.key.getSlice() is not released when OakMap.remove() is called; only ctx.value.getSlice() is released.

By the way, keys in OakMap are used for navigation and thus cannot be removed at the moment OakMap.remove() is called. Keys can be removed later, when the internal structure (a chunk) is rebalanced. This is part of our new code, to be added soon.
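
If it helps intuition, here is a toy model of that design in plain Java (illustrative only, not Oak's actual code): a logically removed key stays in place as a navigation anchor and is physically dropped only by a later rebalance.

    import java.util.concurrent.ConcurrentSkipListMap;

    // Toy model of deferred key removal; not Oak's source.
    class TombstoneSketch {
        private static final int TOMBSTONE = Integer.MIN_VALUE; // marks "deleted"
        private final ConcurrentSkipListMap<Integer, Integer> entries =
                new ConcurrentSkipListMap<>();

        void put(int key, int value) { entries.put(key, value); }

        // Logical delete: the value is dropped immediately, but the key stays,
        // because concurrent readers may still navigate through it.
        void remove(int key) { entries.replace(key, TOMBSTONE); }

        Integer get(int key) {
            Integer v = entries.get(key);
            return (v == null || v.intValue() == TOMBSTONE) ? null : v;
        }

        // "Rebalance": only here are tombstoned keys physically removed.
        void rebalance() {
            entries.values().removeIf(v -> v.intValue() == TOMBSTONE);
        }
    }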

fujian-zfj commented 2 years ago

I don't quite understand why ctx.key.getSlice() is not released when OakMap.remove() is called; only ctx.value.getSlice() is released.

By the way, keys in OakMap are used for navigation and thus cannot be removed at the moment OakMap.remove() is called. Keys can be removed later, when the internal structure (a chunk) is rebalanced. This is part of our new code, to be added soon.

Hi, I have been paying attention to Oak's progress recently, but the NovaMemoryManager for key memory release has not been launched yet. Is there any other way to release the memory occupied by these deleted keys? For example, as you said, keys can be removed later when the internal structure (chunk) is rebalanced.

sanastas commented 2 years ago

Hi @fujian-zfj ,

Indeed, we are still working on the memory-release issue, because we want the solution to be correct and without performance implications. If you want, you can take a look at PR #188 and see in the new code how keys are released in EntryOrderedSet.java: the releaseAllDeletedKeys() method is triggered from the release() method of OrderedChunk.java. This happens when an ordered chunk is rebalanced and disconnected from the main structure.
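
In outline, the flow looks like this (a schematic only; the entrySet field name and the method bodies are placeholders, see PR #188 for the real code):

    // Schematic of the call flow described above.
    class EntryOrderedSet {
        void releaseAllDeletedKeys() { /* free the key slices of deleted entries */ }
    }

    class OrderedChunk {
        private final EntryOrderedSet entrySet = new EntryOrderedSet(); // hypothetical field

        // Invoked once this chunk has been rebalanced and disconnected from
        // the main structure; only then are deleted keys released.
        void release() {
            entrySet.releaseAllDeletedKeys();
        }
    }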

However, this is not a complete solution: with small probability, a thread may be delayed, wake up on a released chunk, and try to access a deleted key. Since the memory manager is not NovaMemoryManager, such accesses are allowed, and the thread may read a corrupted/deleted/wrong key.

Alternatively, you may change the keys memory manager from SeqExpandMemoryManager to SyncRecycleMemoryManager in OakMapBuilder.java, but this will slow down your performance. So this is all I can suggest for now. The best option is to wait for NovaMemoryManager; as I said, it is going to be 100% correct and with the same performance. We are working on it, and it is high on our priority list.
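
For concreteness, the change in OakMapBuilder.java would be a one-line swap (a sketch; check the exact constructor arguments in your Oak version):

    // In OakMapBuilder.java: recycle the memory of removed keys, at the cost
    // of synchronization overhead on the keys' allocation path.
    // Was: new SeqExpandMemoryManager(memoryAllocator);
    MemoryManager keysMemoryManager = new SyncRecycleMemoryManager(memoryAllocator);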

fujian-zfj commented 2 years ago

OK. Thanks

dynaxis commented 2 years ago

I have a very similar use case: I add new entries to an OakMap, then read and remove the head portion of it when certain conditions are met. Will I eventually run into OOM as well? I can't switch to OakHash either. I'm also seeing a weird bug (a headMap contains keys beyond the limit set when the headMap was created), but that's a separate problem I'll file another issue for.
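
For concreteness, my pattern is roughly the following (types simplified to integers; nextKey, someValue, threshold, and process() are stand-ins for my real logic, and the builder calls follow Oak's README and may differ by version):

    // Producer side: keys only grow, so a key is never reused.
    OakMap<Integer, Integer> oak =
            OakCommonBuildersFactory.getDefaultIntBuilder().buildOrderedMap();
    oak.put(nextKey++, someValue);

    // Consumer side: read and remove everything below a threshold.
    try (OakMap<Integer, Integer> head = oak.headMap(threshold, false)) {
        for (Integer k : head.keySet()) {
            process(head.get(k));
            oak.remove(k);  // frees the value slice; the key slice lingers
        }
    }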