refactor our test separation and how we run them

nagisa commented 1 week ago

This fundamentally removes the differentiation between "integration" and "unit" tests from the test runner perspective, as well as removes the reliance on the "expensive-tests" compile-time feature.

Instead tests are categorized by the underlying reason for which we had the separation in the first place -- test execution speed. These are now annotated with slow_test_ and ultraslow_test_ prefixes, and their execution time is enforced by time limits in nextest.toml.

By using nextest filters we can cleanly filter out tests from these three categories to be run in various contexts. For instance, maybe quick checks locally don't need to run the slow tests every time -- just nextest by default now takes just 9 seconds (on my machine) to run all the tests. Whereas previously (due to e.g. estimator smoke test) it would take at least a minute.

Then a just nextest-slow is added in the rarer cases where one might want to run the full suite that runs on CI. CI will run these mildly slower tests for each PR.

Then what remains are the ultraslow tests which only run on nayduck and can take quite some time to complete. These can now be trivially run locally as well with just nextest-all without necessarily rebuilding the entire project or "unignoring" tests that are marked as ignored for other reasons (like for instance because they're just broken.)

nagisa commented 1 week ago

cc @wacban -- I pushed out a draft cause I gotta run now, but feedback would be neat if this addresses your problems with the bls tests.

nagisa commented 1 week ago

Did you check that check_nightly.py still does what it's supposed to do?

You can find it failing even now at https://github.com/near/nearcore/actions/runs/11845706954/job/33011766898?pr=12458 due to me having missed to include one of the tests!

That said, one of the things we could do here at this point is to get rid of expensive.txt as cargo nextest list can list all the tests that match the ultraslow pattern without having to write it all down explicitly.

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 69.85%. Comparing base (f4db1e0) to head (daaf8c5). Report is 39 commits behind head on master.

Files with missing lines	Patch %	Lines
...e-params-estimator/estimator-warehouse/src/main.rs	0.00%	1 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #12458 +/- ## =========================================== + Coverage 39.74% 69.85% +30.10% =========================================== Files 842 838 -4 Lines 170756 169385 -1371 Branches 170756 169385 -1371 =========================================== + Hits 67872 118319 +50447 + Misses 98635 45813 -52822 - Partials 4249 5253 +1004 ``` | [Flag](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | Coverage Δ | | |---|---|---| | [backward-compatibility](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.16% <ø> (+<0.01%)` | :arrow_up: | | [db-migration](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.16% <ø> (+<0.01%)` | :arrow_up: | | [genesis-check](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.29% <ø> (+0.02%)` | :arrow_up: | | [integration-tests](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `?` | | | [linux](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `69.16% <75.00%> (+30.47%)` | :arrow_up: | | [linux-nightly](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `69.42% <75.00%> (+30.12%)` | :arrow_up: | | [macos](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `51.02% <100.00%> (?)` | | | [pytests](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.60% <ø> (+0.02%)` | :arrow_up: | | [sanity-checks](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `1.40% <ø> (+0.02%)` | :arrow_up: | | [unittests](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `69.68% <75.00%> (?)` | | | [upgradability](https://app.codecov.io/gh/near/nearcore/pull/12458/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near) | `0.21% <ø> (+<0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=near#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

wacban commented 1 week ago

I've seen this nayduck error, the fix for me was to use with underscore instead of dash in one place in the expensive.txt: expensive estimator-warehouse estimator_warehouse tests::ultra_slow_test_full_estimator

nagisa commented 1 week ago

Umm… so apparently nayduck's tests are limited to 3 minutes?? So much for ultra slow :thinking: But wait, there are others that take 12… how does that work?

wacban commented 6 days ago

Umm… so apparently nayduck's tests are limited to 3 minutes?? So much for ultra slow 🤔 But wait, there are others that take 12… how does that work?

From expensive.txt it seems like you can increase the default timeout like so:

expensive --timeout=1200 near-chain near_chain tests::garbage_collection::test_gc_not_remove_fork_large

near / nearcore

refactor our test separation and how we run them #12458

Codecov Report