near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0
2.3k stars 601 forks source link

Change apply-chunk tracker implementation #11653

Closed tayfunelmas closed 1 week ago

tayfunelmas commented 2 weeks ago

As discussed in #11447, if there is a panic (due to assertion failure) when running testloop locally, the test freezes instead of reporting the failure. This makes it difficult to run testloop in cases where we want to debug an assertion failure.

The reason for the freezing test failures is that the panic causes the objects be dropped and the drop handler for Chain blocks, waiting apply-chunks be finished: https://github.com/near/nearcore/blob/71d35d9921603d8dea60fcc41c0d5478202fc574/chain/chain/src/chain.rs#L295

However that wait does not end, since setter runs after apply-blocks is finished: https://github.com/near/nearcore/blob/71d35d9921603d8dea60fcc41c0d5478202fc574/chain/chain/src/chain.rs#L1831

And it does not get a chance to notify the waiters. For example, commenting out the drop handler for the Chain allows us to see the assertion failure, as expected.

To handle this for testloop, we change the implementation as follows:

  1. We use a condition variable to wait for apply-chunks processing and setting when it is done (condition variable naturally fits in this scenario for waiting the operation be done).
  2. We introduce a feature to be enabled when running testloop. This limits the total wait time for each thread to 5 seconds. This is testing-only and not enabled in production. This is still not ideal and does not immediately unblock the waiting threads when there is a panic, but at least puts a limit on the freeze before the assertion failure is exposed.

Note: We could also use a technique such as oneshot::Receiver, in which case the waiter will just return whenever the sender is dropped, but it works with async computations only, whereas the apply-chunks is executed in a multi-threaded fashion.

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 91.35802% with 7 lines in your changes missing coverage. Please review.

Project coverage is 71.68%. Comparing base (af022d8) to head (03dfd7a).

Files Patch % Lines
chain/chain/src/block_processing_utils.rs 90.78% 6 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #11653 +/- ## ========================================== + Coverage 71.66% 71.68% +0.01% ========================================== Files 788 788 Lines 161300 161374 +74 Branches 161300 161374 +74 ========================================== + Hits 115589 115674 +85 + Misses 40674 40659 -15 - Partials 5037 5041 +4 ``` | [Flag](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | Coverage Δ | | |---|---|---| | [backward-compatibility](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.23% <0.00%> (-0.01%)` | :arrow_down: | | [db-migration](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.23% <0.00%> (-0.01%)` | :arrow_down: | | [genesis-check](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.36% <0.00%> (-0.01%)` | :arrow_down: | | [integration-tests](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `37.87% <59.25%> (+0.03%)` | :arrow_up: | | [linux](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `69.08% <90.12%> (+0.03%)` | :arrow_up: | | [linux-nightly](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `71.18% <90.12%> (+0.02%)` | :arrow_up: | | [macos](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `52.66% <91.35%> (+<0.01%)` | :arrow_up: | | [pytests](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.59% <0.00%> (-0.01%)` | :arrow_down: | | [sanity-checks](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.39% <0.00%> (-0.01%)` | :arrow_down: | | [unittests](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `66.27% <91.35%> (+0.02%)` | :arrow_up: | | [upgradability](https://app.codecov.io/gh/near/nearcore/pull/11653/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.28% <0.00%> (-0.01%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.