paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.38k stars 2.65k forks source link

Tests fail with `--test-threads 1` #10479

Closed ggwpez closed 1 year ago

ggwpez commented 2 years ago

Observation

Many tests fail when run with --test-threads 1. I only tried to run the pallet tests, eg:

cargo test -p "pallet-*" --no-fail-fast -- --test-threads 1

It also seems to happen with --features=runtime-benchmarks.
Maybe someone can check it for tests outside of pallet-*. Polkadot and Cumulus took too long on my machine to compile.

Assumed problem(s)

Thread local vars

The first test that fails (assets/freezer_should_work) should be fixed after #10473 by line which resets a dirty thread local var.
I first spotted it by running Tarpaulin on Substrate, but Tarpaulin was not the problem.
The fact that the test used thread local vars and tarpaulin implicitly forces --test-threads 1 seems to cause at least some of the failures.
The following tests only fails with --test-threads 1:

use std::cell::RefCell;

thread_local! {
    static STICKY: RefCell<bool> = RefCell::new(false);
}

#[test]
fn tarpauline() {
    // STICKY must be false since we just started the test.
    assert!(!STICKY.with(|s| *s.borrow()));
    // Set to true.
    STICKY.with(|s| s.replace(true));
}

#[test]
fn tarpauline2() {
    // STICKY must be false since we just started the test.
    assert!(!STICKY.with(|s| *s.borrow())); // <- FAILS HERE
    // Set to true.
    STICKY.with(|s| s.replace(true));
}

As far as I understand it: cargo test normally schedules the tests on multiple threads, which means that the STICKY var is false for each #[test].
When --test-threads 1 is used, the tests run consecutively and the second test that is executed gets the sticky/dirty value of the first test function; causing the failure.

Other causes

I did not have the time to look into the other failures to see if they could all be failing because of dirty thread locals.
Maybe there is more amiss.

Next

Please confirm and decide if this should be fixed.
Theoretically this could also be changed in cargo test, such that it resets thread locals before each #[test] :thinking:

wigy-opensource-developer commented 2 years ago

Rust does not have built-in setup/teardown support for test fixtures. But that does not mean we cannot use something like rstest. We seem to be using thread-local-storage for just easier fixture setup in these mocks and we want to use a fresh fixture for every test case, whatever thread they are running in.

ggwpez commented 2 years ago

We seem to be using thread-local-storage for just easier fixture setup in these mocks and we want to use a fresh fixture for every test case, whatever thread they are running in.

Some of these locals are accessed in callbacks and not just for closure I think, so they need to be accessible in the scope surrounding the #[test] functions.
AFAIU the rstest only passes arguments to the test functions, right? @wigy-opensource-developer

One solution that I would not be particularly proud of:
Create a define_thread_locals! macro to define all thread locals and creates a clear_thread_locals! macro to clear all RefCell with RefCell::new(Default::default()). This could then be called in new_test_ext().execute_with.
PS: Would be the same construction as used in https://github.com/paritytech/substrate/pull/10592

shawntabrizi commented 2 years ago

@ggwpez but what would be the final outcome we achieved here for additional complexity and work? Does not seem that really anyone executes tests like this anyway, which is why this is the first time it is being reported.

wigy-opensource-developer commented 2 years ago

Some of these locals are accessed in callbacks and not just for closure I think, so they need to be accessible in the scope surrounding the #[test] functions. AFAIU the rstest only passes arguments to the test functions, right?

If we want independent tests, we need to create a fresh fixture for each of them. Because the Rust test runner starts each #[test] on a fresh stack, it is only static and thread local state from mock.rs that can make the tests interact with each other making them erratic. (I try to use terms correctly from http://xunitpatterns.com/ )

ggwpez commented 2 years ago

but what would be the final outcome we achieved here for additional complexity and work? Does not seem that really anyone executes tests like this anyway, which is why this is the first time it is being reported.

Well, this is kind of what I meant with

Please confirm and decide if this should be fixed.

I'm not sure if it is worth it. Might be worth testing on a single threaded system, to see it it happens there as well @shawntabrizi.
PS: It does not compile with the current Tarpaulin version, but if it turns out not working this could be a reason to fix it.

For example when I look at the tests for pallet-assets, I do not see how this could be done with test closures since they would need to be accessible for the FrozenBalance balance impl here. @wigy-opensource-developer

wigy-opensource-developer commented 2 years ago

Yeah. So because the FrozenBalance trait does not take a proper &self or &mut self parameter on their methods, what can we do but use static state, right? Well, wrong!

The only reasonable way forward is to not treat every type definition in a #[pallet::config] a global variable. We hire Rust developers, but this is a completely different, limited language we can use in these macros, causing all kind of gotchas like this.

This should be completely viable for implementing FrozenBalance:

#[derive(Debug, Default)]
pub struct TestFreezer {
    frozen: HashMap<(u32, u64), u64>,
    hooks: Vec<Hook>,
}

impl FrozenBalance<u32, u64, u64> for TestFreezer {
    fn frozen_balance(&self, asset: u32, who: &u64) -> Option<u64> {
        self.frozen.get(&(asset, who.clone())).cloned()
    }

    fn died(&mut self, asset: u32, who: &u64) {
        self.hooks.push(Hook::Died(asset, who.clone()));
    }
}

impl TestFreezer {
    pub(crate) fn set_frozen_balance(&mut self, asset: u32, who: u64, amount: u64) {
        self.frozen.insert((asset, who), amount);
    }
    pub(crate) fn clear_frozen_balance(&mut self, asset: u32, who: u64) {
        self.frozen.remove(&(asset, who));
    }
    pub(crate) fn hooks(&self) -> Vec<Hook> {
        self.hooks.clone()
    }
}
ggwpez commented 2 years ago

I do not want to hold you back from fixing it. Just for me personally the priority of working on it is not be the highest.
You can open a MR if you want to fix it.
@wigy-opensource-developer

wigy-opensource-developer commented 2 years ago

@shawntabrizi What I understand from the FRAME design is that all state is in the storage, and in order to emphasize the "statelessness" (purity) of the block function composed out of all this code, self is missing from all these things intentionally. The missing piece is testability, which requires dependency injection. Although this is an article about F#, but writer on several books on the topic, Mark Seemann describes why just for the sake of testability we need to allow impurity in the final assembled runtime: https://blog.ploeh.dk/2017/01/30/partial-application-is-dependency-injection/

arkpar commented 2 years ago

Using TLS for tests looks like an anti-pattern to me. Why not just use the storage? Something like this should work:

pub struct TestFreezer;
impl FrozenBalance<u32, u64, u64> for TestFreezer {
    fn frozen_balance(asset: u32, who: &u64) -> Option<u64> {
        let key: u128 = (asset as u128) << 64 | *who as u128;
        sp_io::storage::get(&key.to_le_bytes()).map(|x| u64::from_le_bytes(x.as_slice().try_into().unwrap()))
    }
}

pub(crate) fn set_frozen_balance(asset: u32, who: u64, amount: u64) {
    let key: u128 = (asset as u128) << 64 | who as u128;
    sp_io::storage::set(&key.to_le_bytes(), &amount.to_le_bytes());
}
pub(crate) fn clear_frozen_balance(asset: u32, who: u64) {
    let key: u128 = (asset as u128) << 64 | who as u128;
    sp_io::storage::clear(&key.to_le_bytes());
}

Alternatively let key = (asset, who).encode()

wigy-opensource-developer commented 2 years ago

Using TLS for tests looks like an anti-pattern to me. Why not just use the storage? Something like this should work

Will you create a fresh storage for each and every test function? Otherwise you end up with the same interacting tests.

arkpar commented 2 years ago

Each test already uses a unique instance of TestExternalities which holds the storage. There should be no interference.

ggwpez commented 2 years ago

Yea there are still crates failing. Anyway we can do this cleanup lean step by step.

ggwpez commented 1 year ago

Does not reproduce on rust 1.70 anymore. Neither the pallet tests not the example code fail on one test thread 🎉