Closed thiyanesh closed 3 years ago
Thanks @thiyanesh for this PR
I recommend breaking it into several PRs; it can easily be broken into 3 PRs.
The main issue here is the race condition that results from non-atomic (non-transactional) increment and decrement operations on `allocated`, which captures the next location to allocate in a buffer.
If indeed the allocate method is not called in exclusive mode, there is the potential for the race you describe. But I am not sure a (costly) synchronized, or other complicated solutions, are necessary.
The only reason this race is created is the non-monotonic behavior of `allocated`: it is only decreased after exceeding the threshold.
In a normal workload we can expect this to happen only once when the block is almost full.
So a possible solution would be to have two monotonic counters; the allocate method would increase one counter on every request and advance `allocated` only for requests that fit. It is guaranteed `allocated` does not exceed the threshold.

Thank you for the detailed feedback. I will update the PR with the `Block.allocate` change alone. Also, I will try to solve it using only the one existing atomic variable.
@Eshcar , if this PR is fine, please squash-merge to avoid unnecessary commit history from the other changes. Or do you recommend closing this PR and creating a new PR with this change?
Hi @thiyanesh !
I am so happy you opened this PR and found that real problem! Thank you! I am Anastasia and I am the lead of Oak project, nice to e-meet you!
The scenario you present is real and thus should definitely be fixed. You have a great eye for concurrency problems and it would be a pleasure to continue collaborating!
I would rather not have any locks or synchronized added. I am also not fond of the CompareAndSwap loop, because Fetch-And-Add still has better performance. Previously `AtomicInteger.getAndAdd()` (which delegates to `unsafe.getAndAddInt()`) was implemented with Fetch-and-Add, and now I indeed see it turns into a CompareAndSwap loop in Java 8. However, the implementations can differ, and I wouldn't straightforwardly use `compareAndSet()`.
What I would suggest as a solution is, first of all, to remove the decreasing of `allocated` once the request is above capacity, and to throw the exception. Looking at the `Block.allocate()` usage (in `NativeMemoryAllocator`) we can see that the exception is just caught, and if the request is not problematic (i.e., not above the block size in general), the allocation request is redirected to the next block.

In order not to increase `allocated` too much, I would add a check `if (allocated.get() + size > this.capacity)` just before the `allocated.getAndAdd(size)` to decrease the probability of an unnecessary increase.
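A minimal sketch of what that suggestion could look like (class name, field names, and the exception type here are stand-ins, not Oak's actual code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the suggested Block.allocate(): never decrease `allocated`;
// throw on overflow and let the caller (NativeMemoryAllocator) redirect
// the request to the next block.
class BlockAllocSketch {
    private final int capacity;
    private final AtomicInteger allocated = new AtomicInteger(0);

    BlockAllocSketch(int capacity) { this.capacity = capacity; }

    int allocate(int size) {
        // Non-atomic pre-check: it only lowers the probability of pushing
        // `allocated` past capacity, it cannot prevent it entirely.
        if (allocated.get() + size > capacity) {
            throw new IllegalStateException("block is full"); // stand-in exception type
        }
        int offset = allocated.getAndAdd(size); // single fetch-and-add
        if (offset + size > capacity) {
            // No decrement here, so `allocated` stays monotonic and the
            // original race cannot produce overlapping buffers.
            throw new IllegalStateException("block is full");
        }
        return offset; // start position of the new slice in the block
    }
}
```

Note that after a failed allocation `allocated` may remain above capacity, so the tail of the block goes unused unless it is recycled later.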
What do you think @thiyanesh ? I would very much like to hear your thoughts! If you really want to progress with the CompareAndSet loop this is also a possibility. Also, leave this PR for dealing with this problem only. Please go ahead and open other PRs with all other improvements you have suggested before. Looking forward!
Thanks, Anastasia
@sanastas, Thank you for the detailed feedback.
What I would suggest as a solution is first of all to remove the decreasing of allocated once the request is above capacity and to throw the exception. Looking on Block.allocate() usage (in NativeMemoryAllocator) we can see that the exception is just caught and if the request is not problematic (i.e. above the block size in general), allocation request is redirected to next block.
True, this can be done if the following scenario is acceptable:

T1 requests a bigger size; T2 and T3 request valid sizes (their combined size will also be valid). Depending on the order of context switches in concurrently executing T1, T2, T3, the complete block offset (the corresponding `ByteBuffer`) might be left unused unless the whole block is released later.

If there will be more contention and it is OK to trade off space to achieve better time, then we can use the approach you have suggested (throw an exception on the first overflow, using 2 counters to simulate completion and throw exceptions). If there will be less such contention, `compareAndSet` might be fine as there will be minimal loops.
In order not to increase allocated too much, I would add a check if (allocated.get() + size > this.capacity) just before the allocated.getAndAdd(size) to decrease the probability of unnecessary increase.
As per my understanding, this would not guarantee the operation to be atomic. Further, since `allocated` is maintained as an `AtomicInteger` and not as an `AtomicLong`, there can be different side effects of overflow (a negative value), which can result in inconsistencies in the `long allocated() { return allocated.get(); }` response.
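The overflow side effect mentioned here is easy to reproduce with a tiny, self-contained example (illustrative values only, not the PR's code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OverflowDemo {
    // Returns the counter value after an increment that crosses Integer.MAX_VALUE.
    static int overflowedAllocated() {
        // `allocated` near the int limit, as with a block close to 2GiB.
        AtomicInteger allocated = new AtomicInteger(Integer.MAX_VALUE - 10);
        allocated.getAndAdd(100); // one more "allocation" wraps the counter
        return allocated.get();   // negative: Integer.MIN_VALUE + 89
    }

    public static void main(String[] args) {
        // allocated() would now report a negative size -- the inconsistency
        // described above.
        System.out.println(overflowedAllocated());
    }
}
```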
Please do share your comments about this (or whether I misunderstood your comment).
@thiyanesh , good to hear from you again!
True, this can be done if the following scenario is acceptable
T1 requests a bigger size; T2 and T3 request valid sizes (their combined size will also be valid). Depending on the order of context switches in concurrently executing T1, T2, T3, the complete block offset (the corresponding `ByteBuffer`) might be left unused unless the whole block is released later.
Do you mean T1 requests a size (single slice) bigger than a block? According to what I think, T1 will fail the pre-check (whether T2 and T3 proceed or not) and then fall into the exception on the `NativeMemoryAllocator` level, while T2 and T3 can continue to allocate from the same block. So there is no unused block in this scenario.

If you meant that T1's size is bigger when summed with T2's and/or T3's sizes, then yes, there might be a big portion of the block unused. I believe this should be a very rare scenario, as our blocks are quite big and the concurrency would have to be very specific. However, truly, we planned to create a slice from this unused space and add it to the free list (of `NativeMemoryAllocator`) so it can be used by others. But I didn't want to request a complicated solution from you, unless you want to write it. The issue of transferring such unused space to the free list can be fixed later in another PR. Or here, if you like.
If there will be more contentions and its ok to tradeoff space to achieve better time, then can use the approach you have suggested.
I do not understand this comment fully. I think the solution, I explain above, is good both in space and in time.
(throw exception on the first overflow using 2 counters to simulate a completion and throw exceptions)
I didn't suggest using two counters, this is a possibility of course. I do not see a big reason to use two Fetch-and-Add (expensive) instructions instead of one. Why do you prefer 2 counters? Probably I am missing something.
If there will be less such contentions, compareAndSet might be fine as there will be minimal loops.
You are right: if there is little to no contention, a `compareAndSet` loop costs almost the same as Fetch-and-Add. But under contention Fetch-and-Add is faster. So, doesn't it sound like Fetch-and-Add is preferable?
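For reference, the `compareAndSet`-loop variant under discussion could be sketched as follows (hypothetical names, not the PR's actual code). It keeps the counter within capacity, at the cost of retries under contention:

```java
import java.util.concurrent.atomic.AtomicInteger;

// CAS-loop variant: `allocated` never exceeds capacity, but each failed
// CAS means another pass through the loop (vs. one fetch-and-add).
class CasAllocSketch {
    private final int capacity;
    private final AtomicInteger allocated = new AtomicInteger(0);

    CasAllocSketch(int capacity) { this.capacity = capacity; }

    /** Returns the slice's start offset, or -1 if the request does not fit. */
    int allocate(int size) {
        while (true) {
            int cur = allocated.get();
            if (cur + size > capacity) {
                return -1; // reject without ever over-incrementing the counter
            }
            if (allocated.compareAndSet(cur, cur + size)) {
                return cur;
            }
            // Another thread won the race; retry with the fresh value.
        }
    }
}
```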
As per my understanding, this would not guarantee the operation to be atomic.
You are right. The check I suggested is for the cases with no concurrency, just to decrease the possibility of an unnecessary increase of `allocated` above the capacity. The check is not atomic.
Further, since `allocated` is maintained as an `AtomicInteger` and not as an `AtomicLong`, there can be different side effects of overflow (a negative value), which can result in inconsistencies in the `long allocated() { return allocated.get(); }` response.
Very good point! As for now, block capacity can not be above 2GB (a `ByteBuffer` limitation, inherited from the past). Thus we can hardly get to this problem. But you are right if we allocate a 2GB block and then go a little above capacity. If you would like to fix it, go ahead and define `allocated` to be an `AtomicLong`.
Looking forward to hearing from you! Please let me know what you think about my comment. Thanks!
Do you mean T1 requests a size (single slice) bigger than a block?
As you have mentioned, this will result in failure of the precondition. No, I didn't intend this.
If you meant that T1's size is bigger when summed with T2 and/or T3 sizes.
Yes, I meant this. But even ignoring concurrently executing threads, this will be an issue even for a sequential execution. So, as you have mentioned, this has to be handled independently for all scenarios.
I do not see a big reason to use two Fetch-and-Add (expensive) instructions instead of one. Why do you prefer 2 counters?
The primary reason for suggesting the 2-counter fetch-and-add was `long allocated() { return allocated.get(); }`: it might not be good to return an invalid value (either a negative/too-large value on overflow, or a value outside the integer range for 2GiB even if it is a long).
I am not sure about being able to do this in single fetch-and-add.
With 2 counters, as mentioned by @Eshcar, the flow will be: first increase one monotonic counter to reserve space, and only when the reservation is within capacity increase the `allocated` counter, so `allocated` stays in a valid state (readers observe only `allocated`).

As for now, block capacity can not be above 2GB (ByteBuffer limitation, inherited from the past). Thus we can hardly get to this problem. But you are right if we allocate a 2GB block and then go a little above capacity.
Yes, as you have mentioned, the following can simulate the behavior: with a 2GiB (`Integer.MAX_VALUE`) limit, using a fetch-and-add can result in a negative value (as an integer) or an invalid bigger value (as a long) being returned from `long allocated() { return allocated.get(); }`.
I am not sure whether the above descriptions convey the intent. The actual intent was: (1) fail eagerly on overflow, (2) ensure `long allocated()` stays in line with a valid state, and (3) avoid overlapping buffer state (as in the existing master branch code with subtraction). Personally, to satisfy all these 3 conditions in a concurrent execution environment, we might need synchronized, or a CAS loop, or 2 counters with fetch-and-add.
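A sketch of the 2-counter fetch-and-add idea (field names are assumptions, not Oak's actual code): one monotonic counter reserves space for every request, while `allocated` advances only for requests that fit, so `allocated()` never reports a value above capacity:

```java
import java.util.concurrent.atomic.AtomicLong;

// Two monotonic counters: `attempted` grows on every request, while
// `allocated` is advanced only for requests that fit, so it can never
// exceed the capacity.
class BlockSketch {
    private final long capacity;
    private final AtomicLong attempted = new AtomicLong(0); // all requested bytes
    private final AtomicLong allocated = new AtomicLong(0); // successful bytes only

    BlockSketch(long capacity) { this.capacity = capacity; }

    /** Returns the start offset of the allocation, or -1 if the block is full. */
    long allocate(int size) {
        long now = attempted.addAndGet(size); // single fetch-and-add
        if (now > capacity) {
            return -1;                        // caller moves on to the next block
        }
        allocated.addAndGet(size);            // stays <= capacity by construction
        return now - size;                    // offset where this slice starts
    }

    long allocated() { return allocated.get(); } // never above capacity
}
```

The tradeoff noted earlier in the thread still applies: once `attempted` overshoots the capacity, later requests fail even if some space remains unused in the block.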
Please do share your feedback and decide whether this PR is relevant or whether it can be solved with a different approach.
Hi @thiyanesh ,
Thank you for clarifying! I think I now understand your reason to use two counters and your concern about Block's method `long allocated() { return allocated.get(); }`.
In what I have previously suggested, there is always a possibility of Block's `allocated` counter being above the capacity. So you see it as wrong output, which is very reasonable.
Please pay attention that Block's `allocated()` method is used only once, in the `NativeMemoryAllocator` class, just to check whether the block is allocated above its capacity. It is not used for true accounting of Oak's allocated memory. Inside the `NativeMemoryAllocator` class we have another `AtomicLong allocated` counter which is indeed used for true accounting of Oak's allocated memory.
Therefore, the fact that Block's `allocated()` method returns a value above capacity actually does no harm. However, your concern is absolutely legitimate, as the code looks like it is not working right. What do you think about just changing the name of Block's `allocated()` method (to something about being an upper limit) and/or adding good comments around it?
(in the existing master branch code with subtraction), avoid overlapping buffer state.
I am not sure I understand the requirement above.
So I would still prefer no synchronized, no CAS loop, and no double F&A, because I do not want the good path to work slower in order to resolve corner cases that may rarely happen.
In my eyes the following solution is optimal:

1. Rename Block's `allocated()` method and/or add a comment around it.
2. Make the `allocated` counter an `AtomicLong`.
3. Do not decrease the `allocated` counter when going above capacity; throw the exception instead.
4. Add the check `if (allocated.get() + size > this.capacity)` although it is not atomic.
5. Let the exception be handled in `NativeMemoryAllocator`, which redirects the request to the next block.
Or any other solution that doesn't impact the normal/good path execution that you want to suggest. What do you think? Does it make sense to you?
Thank you for taking care of Oak!!!
@thiyanesh , anything new? Would be great to hear from you! I hope your ideas will land in Oak's main code. Looking forward!
@sanastas , please find the updated PR as per the discussions.
@thiyanesh ,
Thank you! All looks good. I ran `mvn clean package` and it passes well. Will merge your PR.
Fail eagerly on allocated size comparison to prevent overlap and fragmentation due to race condition.
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.