Open pveentjer opened 2 years ago
Hi @pveentjer : nextWakeupNanos
is used by https://github.com/netty/netty-incubator-transport-io_uring/blob/98f7b6d716a2043367a2d6415629b13e88b06a73/transport-classes-io_uring/src/main/java/io/netty/incubator/channel/uring/IOUringEventLoop.java#L361 from outside the event loop (ie from another thread) if inEventLoop == false
I believe this could be changed in order to just use a plain get somewhere, but I believe using eventFd
to wake up the EL is such costy that we rather prefer to pay the full barrier cost.
on https://www.scylladb.com/2018/02/15/memory-barriers-seastar-linux/ I see that there are ways to save a(n additional) full barrier there, so happy to review any PR to improve this
Ok. In that case we need something more powerful than a plain long. But I think we can get rid of the fences. So a get/set opaque should be sufficient since there are no memory ordering effects required.
So the following should be sufficient for the first set:
nextWakeupNanos.lazySet(curDeadlineNanos);
It will provide a [StoreStore][LoadStore] which on the X86 you get for free since all stores are release stores. So it is just a compiler restriction.
setOpaque would be more appropriate, but not available on Java 8.
Need to think about the getAndSet.
With Scylla the mem barrier is removed for task exchange between threads. Memory ordering matters in that case. But afaik for the nextWakeupNanos memory ordering is irrelevant. So I don't see a good reason why a price needs to be paid for memory ordering if it is irrelevant.
@pveentjer PRs welcome
I would be more than happy to make a PR :) But before I do, I want to make sure my understanding is correct.
Assumption: the memory ordering effects of nextWakeupNanos are irrelevant. We just need to make sure that the compiler doesn't optimize out the load or store.
So is this assumption correct?
Hey thanks @pveentjer. The IOUringEventLoop
logic was derived from EpollEventLoop
, and in an earlier iteration of that I did actually used lazySet
here following similar reasoning to yours.
However I changed it to set
after realizing that ordering is in fact important relative to the task queue submission. Otherwise I think a race / missed wake-up could be possible.
But it's possible I was wrong about this, I'm sure your understanding of the memory order modes is better than mine!
@njhill Oooops I didn't notice you pinged me on https://github.com/netty/netty/pull/9561#issuecomment-532926597 :( I will use this chance to help @pveentjer to find why/if it hasn't worked
@franz1981 np I think it was just for info and it was quite a long time ago now!
@pveentjer you also probably noticed we guard the getAndSet
with a volatile read to avoid its cost in most cases. If you think an AtomicInteger
would be significantly cheaper than AtomicLong
we may be able to do find a way to use that instead.
In the IOUringEventLoop the nextWakeupNanos is updated twice:
The nextWakeupNanos is an AtomicLong and the set and getAndSet are not very cheap. On the X86 the trigger a memory fence which effectively causes the CPU to stop issuing loads till the store buffer has been drained. And this will not benefit performance.
Based on the code it seems that the nextWakeupNanos is not used by any other thread. Perhaps I'm missing something since I didn't study the code in-depth, but I think the nextWakeupNanos could be converted to a simple plain long.
THe IOUringSubmissionQueue already has the appropriate fences.