Closed yawkat closed 8 months ago
I had a memory issue and @franz1981 suggested #211 as the cause. This patch is my fix for that bug, though I don't believe my mem issue was ultimately caused by this.
This PR does the legwork for adding ioringOpTimeoutRemove, and implementing a test. However two things can still be improved:
- [ ] could use IORING_TIMEOUT_UPDATE (see Not Fired timers leak into the kernel #211) to save one sqe.
- [ ] there may be a race in IOUringEventLoop between the addTimeout and the IORING_OP_TIMEOUT handler. If the kernel fires a deadline cqe, then we send a deadline update sqe, and only then we process the first cqe, prevDeadlineNanos is NONE even though we've submitted a new deadline. I'm not sure if this can actually happen since deadline changes should only adjust the deadline downwards, not upwards? Not sure.
Thanks a lot ! I think what you did is fine... I would defer the use of IORING_TIMEOUT_UPDATE as it would also most likely bump the kernel version requirement. Maybe we can add an issue to keep track of it
@yawkat so with this said I think we can put this out of draft state ?
Also seems like the new added test failed :D
i will look at the test but it might take a few days.
i think the race condition is still an issue. id like to at least think about it a bit more, consider how it can happen.
ive introduced a "generation counter". basically, every addTimeout counts up the generation and sends it to the kernel as the extra field. When a timeout occurs, i check that the generation matches the latest timeout that was added. This ensures that a previous timeout cannot make us think that the current timeout is finished.
well done @yawkat ! I like the epoch/sequencing approach
@yawkat seems like there is still a test failure
thats weird. it passes locally every time, doesnt appear flakey.
thats weird. it passes locally every time, doesnt appear flakey.
did you run it in a loop ?
@yawkat ping
fixed
@yawkat so I think this is now ready to be merged... correct ?
yep, only potential improvement would be to use UPDATE, but not in this pr
@yawkat ok perfect... Can you just update the commit message to match our template ?
Can you squash merge it with this message:
Remove old timeout when deadline changes (#211) Motivation:
When the event loop timer deadline changes, the old timer would not be removed. This could potentially lead to memory issues.
Modification:
Use IORING_OP_TIMEOUT_REMOVE to remove a previously set timeout before a new one is registered. Use a generation counter to prevent race conditions between timeout add/remove in sqe/cqe.
Result:
Old timers are removed. At any time, there is only at most one timer per event loop.
@yawkat done... thanks a lot !
thanks!
I had a memory issue and @franz1981 suggested #211 as the cause. This patch is my fix for that bug, though I don't believe my mem issue was ultimately caused by this.
This PR does the legwork for adding ioringOpTimeoutRemove, and implementing a test. However two things can still be improved: