romange / helio

A modern framework for backend development based on io_uring Linux interface
Apache License 2.0
435 stars 49 forks source link

An attempt fixing the bug with inconsistent state in fibers #131

Closed romange closed 1 year ago

romange commented 1 year ago

Specifically, fix #130. The code that crashed makes sure that a fiber is not in remote_ready queue when it wakes up: it yields if it is, and thus allows the dispatcher to empty the remote queue and pull the active fiber from there.

The code crashes because after the Yield call, it appears that the fiber is still in the remote queue. Based on the code inspection, it seems unprobable that a dispatcher won't empty the remote queue, therefore I suspect that remote queue was corrupted in the first place.

This fix focuses on ScheduleFromRemote (producer sider) that before that did not protect against concurrent calls for the same fiber. It was partly justified because ScheduleFromRemote is usually called from the WaitQueue code under lock. However, there are other cases that can call this function, and maybe they corrupted the queue.

In addition, we removed the epoch based notifications which did not contribute to the correctness.

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 86.20% and project coverage change: -0.04% :warning:

Comparison is base (a601dd6) 76.32% compared to head (cf75c3e) 76.28%.

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the GitHub App Integration for your organization. Read more.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #131 +/- ## ========================================== - Coverage 76.32% 76.28% -0.04% ========================================== Files 97 97 Lines 7057 7051 -6 ========================================== - Hits 5386 5379 -7 - Misses 1671 1672 +1 ``` | [Files Changed](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman) | Coverage Δ | | |---|---|---| | [base/mpsc\_intrusive\_queue.h](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-YmFzZS9tcHNjX2ludHJ1c2l2ZV9xdWV1ZS5o) | `100.00% <ø> (+3.22%)` | :arrow_up: | | [util/fibers/detail/fiber\_interface.cc](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvZGV0YWlsL2ZpYmVyX2ludGVyZmFjZS5jYw==) | `93.38% <ø> (+0.81%)` | :arrow_up: | | [util/fibers/synchronization.h](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvc3luY2hyb25pemF0aW9uLmg=) | `89.30% <ø> (ø)` | | | [util/fibers/detail/scheduler.cc](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvZGV0YWlsL3NjaGVkdWxlci5jYw==) | `97.28% <76.47%> (-2.16%)` | :arrow_down: | | [util/fibers/detail/fiber\_interface.h](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvZGV0YWlsL2ZpYmVyX2ludGVyZmFjZS5o) | `92.30% <100.00%> (+0.30%)` | :arrow_up: | | [util/fibers/detail/wait\_queue.cc](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvZGV0YWlsL3dhaXRfcXVldWUuY2M=) | `100.00% <100.00%> (ø)` | | | [util/fibers/detail/wait\_queue.h](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvZGV0YWlsL3dhaXRfcXVldWUuaA==) | `100.00% <100.00%> (ø)` | | | [util/fibers/fibers\_test.cc](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvZmliZXJzX3Rlc3QuY2M=) | `99.06% <100.00%> (ø)` | | | [util/fibers/synchronization.cc](https://app.codecov.io/gh/romange/helio/pull/131?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Roman+Gershman#diff-dXRpbC9maWJlcnMvc3luY2hyb25pemF0aW9uLmNj) | `67.07% <100.00%> (-0.40%)` | :arrow_down: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.