nanocurrency / nano-node

Nano is digital currency. Its ticker is: XNO and its currency symbol is: Ӿ
https://nano.org
BSD 3-Clause "New" or "Revised" License
3.48k stars 786 forks source link

Collect all intermittently failing tests #1121

Open rkeene opened 6 years ago

rkeene commented 6 years ago

If you have a test that fails intermittently let's collect them here for systematic investigation

rkeene commented 6 years ago

On Linux/x86_64 from Travis-CI:

[ RUN      ] rpc.wallet_destroy
core_test: /workspace/rai/secure/blockstore.cpp:102: rai::mdb_iterator<T, U>::mdb_iterator(MDB_txn*, MDB_dbi, const MDB_val&, rai::epoch) [with T = rai::uint256_union; U = rai::wallet_value; MDB_txn = MDB_txn; MDB_dbi = unsigned int; MDB_val = MDB_val]: Assertion `status == 0' failed.

FIXED - original error is not reproducible on develop, but there is a tsan race which is fixed with https://github.com/nanocurrency/nano-node/pull/2615

rkeene commented 6 years ago

On Linux/x86_64:

[ RUN      ] node.auto_bootstrap
/home/rkeene/devel/raiblocks/rai/core_test/node.cpp:227: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:

FIXED

cryptocode commented 6 years ago

On Travis (trusty):

[ RUN      ] wallet.create_open_receive
/workspace/rai/qt_test/qt.cpp:400: Failure
Value of: json1.empty ()
  Actual: true
Expected: false

FIXED

cryptocode commented 6 years ago

Failing on master: Fixed by @SergiySW

[ RUN      ] rpc.payment_begin_end
/workspace/rai/core_test/rpc.cpp:1162: Failure
Value of: ec
  Actual: Deadline expired
Expected: 
[  FAILED  ] rpc.payment_begin_end (10080 ms)

Quick when it succeeds, when it fails, it seems stuck forever (all threads are waiting on some condition variable)

rkeene commented 5 years ago
[ RUN      ] node.unlock_search
core_test: /workspace/rai/node/wallet.cpp:1385:
                     void rai::wallets::foreach_representative(
                         const rai::transaction&,
                         const std::function<void(
                             const rai::uint256_union&,
                             const rai::raw_key&
                         )>&
                     ): Assertion `!error' failed.

FIXED by https://github.com/nanocurrency/raiblocks/commit/b1d05d0ce83e6aa2950f8ecfa1a3c266cfcde2a0

PlasmaPower commented 5 years ago

EDIT: Fixed by https://github.com/nanocurrency/raiblocks/pull/1532

On local Linux box with some deadlock detection code that may exacerbate the race:

/var/hdd/programming/cpp/raiblocks/rai/core_test/network.cpp:92: Failure
Value of: peers1.size ()
  Actual: 0
Expected: 1
[  FAILED  ] network.send_node_id_handshake (238 ms)
PlasmaPower commented 5 years ago

on 051df7be9b4a9f755e6a94569da1f430f9982ca0

[ RUN      ] bulk.offline_send
/var/hdd/programming/cpp/raiblocks/rai/core_test/network.cpp:826: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:
[  FAILED  ] bulk.offline_send (20410 ms)

FIXED

PlasmaPower commented 5 years ago

On master 37de74f795e32daa02150da49b5f0140211a8c44

[ RUN      ] rpc.online_reps
/home/lee/programming/cpp/raiblocks2/rai/core_test/rpc.cpp:3662: Failure
Expected: (nullptr) != (receive), actual: 8-byte object <00-00 00-00 00-00 00-00> vs 16-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>
[  FAILED  ] rpc.online_reps (238 ms)

EDIT: This test now fails with:

[ RUN      ] rpc.online_reps
nano/rpc_test/rpc.cpp:6121: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:
cryptocode commented 5 years ago

On Linux/x86_64 from Travis-CI:

[ RUN      ] rpc.wallet_destroy
core_test: /workspace/rai/secure/blockstore.cpp:102: rai::mdb_iterator<T, U>::mdb_iterator(MDB_txn*, MDB_dbi, const MDB_val&, rai::epoch) [with T = rai::uint256_union; U = rai::wallet_value; MDB_txn = MDB_txn; MDB_dbi = unsigned int; MDB_val = MDB_val]: Assertion `status == 0' failed.

This one happens because wallet_destroy sets handle = 0 after do_wallet_actions does the wallet->live () check, so a TOCTTOU it seems.

rkeene commented 5 years ago
[ RUN      ] node.fork_publish
/home/rkeene/devel/raiblocks/nano/core_test/node.cpp:643: Failure
Value of: node1.active.roots.size ()
  Actual: 0
Expected: 1
[  FAILED  ] node.fork_publish (668 ms)

and

[ RUN      ] node.fork_multi_flip
/home/rkeene/devel/raiblocks/nano/core_test/node.cpp:788: Failure
Value of: node2.active.roots.size ()
  Actual: 1
Expected: 2
[  FAILED  ] node.fork_multi_flip (1820 ms)
cryptocode commented 5 years ago
[ RUN      ] wallet.select_account
/workspace/rai/qt_test/qt.cpp:96: Failure
Value of: key4
  Actual: 32-byte object <52-E8 58-72 7C-65 C8-2F DE-A5 30-C8 EC-DF BC-B9 BA-14 06-D0 82-64 06-20 5C-03 92-67 06-DB 0F-F1>
Expected: key2
Which is: 32-byte object <2C-29 08-C1 5F-B9 6C-C6 6F-4E 2E-AB A6-3F 5F-C7 E2-5E CE-53 AC-0E 3C-09 01-A4 23-4E 5B-07 4B-AD>
[  FAILED  ] wallet.select_account (55 ms)

FIXED

cryptocode commented 5 years ago

node.unlock_search fails sporadically. When debugging it this sequence happens:

vote_generator::send -> foreach_representative -> wallet.store.fetch

...where fetch returns an error here:

    if (!result)
    {
        nano::public_key compare (nano::pub_key (prv.data));
        std::cout << pub.to_string() << " == " << compare.to_string() << std::endl;
        if (!(pub == compare))
        {
            result = true;  // <<======== oops
        }
    }

FIXED (https://github.com/nanocurrency/nano-node/issues/1121#issuecomment-450866102)

SergiySW commented 5 years ago

@cryptocode was fixed in 17.0, but not in master https://github.com/nanocurrency/raiblocks/pull/1409/commits/b1d05d0ce83e6aa2950f8ecfa1a3c266cfcde2a0

SergiySW commented 5 years ago

Rare node.vote_republish test failure. node::process_confirmed () can start before rolling back of fork block

[ RUN      ] node.vote_republish
/root/rai_test/nano/core_test/node.cpp:1835: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:
[  FAILED  ] node.vote_republish (5307 ms)

FIXED

cryptocode commented 5 years ago
[ RUN      ] wallet.no_work
Expected equality of these values:
  0
  cached_work
    Which is: 4545208490332381752

Happened once, unable to reproduce

FIXED

wezrule commented 5 years ago

rpc.online_reps fails about 50% of the time for me (on MSVC at least)

[ RUN      ] rpc.online_reps
c:\users\wesley\documents\raiblocks\nano\core_test\rpc.cpp(4246): error: Value of: weight2
 Actual: "340281366920938463463374607431768211455"
Expected: system.nodes[1]->weight (nano::test_genesis_key.pub).convert_to<std::string> ()
Which is: "340282366920938463463374607431768211455"

FIXED #1848

cryptocode commented 5 years ago
[ RUN      ] work.eco_pow
work_pool.cpp:185: Failure
Expected: (future1.get ()) < (future2.get ()), actual: 8-byte object <A4-A5 57-00 00-00 00-00> vs 8-byte object <9E-2C 34-00 00-00 00-00>

FIXED

wezrule commented 5 years ago

@cryptocode This should be fixed in #1882 can you give it a try? I actually couldn't reproduce it on my Windows machine, but I do know it could have an issue (I had a comment in the test, but didn't really explain why).

FIXED

wezrule commented 5 years ago
[ RUN      ] node.broadcast_elected
Assertion (!error) failed c:\users\wesley\documents\raiblocks\nano\node\node.cpp:3114

Failing on this release_assert in nano::active_transactions::add:

auto error (nano::work_validate (*block_a, &difficulty));
release_assert (!error);
wezrule commented 5 years ago
[  RUN         ] conflicts.adjusted_difficulty
conflicts.cpp:247: Failure
Expected equality of these values:
  11
  nodes.active.size ()
    Which is: 10

Fails consistently when running TSAN on Ubuntu/GCC

FIXED (no longer able to reproduce locally, feel free to re-check @wezrule)

wezrule commented 5 years ago
[  RUN         ] network.send_node_id_handshake
nano/core_test/network:247: Failure
Expected equality of these values:
  0
  system.nodes[0]->network.size ()
    Which is: 1

Fails consistently when running TSAN on Ubuntu/GCC

FIXED in https://github.com/nanocurrency/nano-node/pull/2612

wezrule commented 5 years ago
[ RUN      ] receivable_processor.send_with_receive
/workspace/nano/core_test/network.cpp:337: Failure
Expected equality of these values:
  system.nodes[0]->config.receive_minimum.number ()
    Which is: 1000000000000000000000000
  system.nodes[1]->balance (key2.pub)
    Which is: 0

FIXED in #1940

wezrule commented 5 years ago
[ RUN      ] rpc.work_peer_bad
d:\agent\_work\2\s\src\vctools\crt\crtw32\stdcpp\thr\mutex.c(51): mutex destroyed while busy

Stack trace

wezrule commented 5 years ago
[ RUN      ] network.replace_port
c:\users\wesley\documents\raiblocks\nano\core_test\network.cpp(2037): error: Expected equality of these values:
  system.nodes[0]->network.udp_channels.size ()
    Which is: 1
  0
[  FAILED  ] network.replace_port (129 ms)

FIXED in #2630

wezrule commented 5 years ago
[ RUN      ] node.vote_by_hash_republish
/workspace/nano/core_test/node.cpp:2196: Failure
Value of: system.poll ()
  Actual: Deadline expired
cryptocode commented 5 years ago
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from active_transactions
[ RUN      ] active_transactions.adjusted_difficulty_priority
nano/core_test/active_transactions.cpp:143: Failure
Expected: (i->adjusted_difficulty) < (last_adjusted), actual: 18446724124103881201 vs 18446724124103881201
[  FAILED  ] active_transactions.adjusted_difficulty_priority (3781 ms)
[----------] 1 test from active_transactions (3781 ms total)

FIXED

cryptocode commented 5 years ago
[ RUN      ] ledger.unchecked_epoch_invalid
core_test/ledger.cpp:2580: Failure
Value of: node1.active.empty ()
  Actual: false
Expected: true

FIXED

cryptocode commented 5 years ago
[ RUN      ] active_transactions.prioritize_chains
nano/core_test/active_transactions.cpp:211: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:

FIXED

cryptocode commented 5 years ago

confirmation_height.modified_chain sometimes get stuck (no timeout), presumably it's one of the while (!node->write_database_queue.contains (nano::writer::confirmation_height)) loops @wezrule ?

FIXED (was reproducible, but now survives thousands of iterations, so seems fixed by recent changes)

guilhermelawless commented 5 years ago

Ocasionally node.bootstrap_confirm_frontiers times out waiting for the election to start, in:

while (node1->active.empty ())
{
    ASSERT_NO_ERROR (system0.poll ());
    ASSERT_NO_ERROR (system1.poll ());
}

edit: No longer failing

cryptocode commented 5 years ago
[ RUN      ] node.local_votes_cache
node.cpp:2348: Failure
Expected equality of these values:
  current_vote->sequence
    Which is: 3
  2

Happens more frequently if running TSAN, so likely timing related

FIXED

cryptocode commented 4 years ago

There's an intermittent ASAN error with node.vote_by_hash_bundle Looks like it's a lifetime issue with max_hashes - must maybe wait until all observer calls are done before exiting the test, or extend the lifetime by whatever means.

FIXED by https://github.com/nanocurrency/nano-node/pull/2356

guilhermelawless commented 4 years ago

STATUS: FIXED

[ RUN      ] confirmation_height.pending_observer_callbacks
core_test/confirmation_height.cpp:774: Failure
Expected equality of these values:
  0
  node->stats.count (nano::stat::type::http_callback, nano::stat::detail::http_callback, nano::stat::dir::out)
    Which is: 1
cryptocode commented 4 years ago
[ RUN      ] rpc.work_peer_many
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.
[  FAILED  ] rpc.work_peer_many (1842 ms)
[ RUN      ] rpc.block_count
Assertion failed: nano::thread_role::get () != nano::thread_role::name::io, file D:\a\raiblocks\raiblocks\nano\secure\blockstore.cpp, line 419

on CI. Wesley suggested it could be a nullptr dereference and the second (lmdb write on io thread) is a side effect. Possibly fixed by https://github.com/nanocurrency/nano-node/pull/2474 @guilhermelawless as it mentions fixing a TSAN issue with work_peer_many ?

Fixed

guilhermelawless commented 4 years ago

Hopefully, although I've never seen it actually fail myself, only TSAN race warnings. One due to work not being atomic in the test itself, the others due to not using a strand, which could have some side effects I imagine.

Ran #2474 a few hundred times under TSAN with no more issues.

cryptocode commented 4 years ago
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from rpc
[ RUN      ] rpc.account_lazy_start
rpc.cpp:7534: Failure
Value of: account_error.is_initialized ()
  Actual: false
Expected: true
[  FAILED  ] rpc.account_lazy_start (216 ms)
[----------] 1 test from rpc (216 ms total)

Unable to reproduce, appears to be fixed by recent changes.

cryptocode commented 4 years ago

This hit a couple of times (debug build):

[ RUN      ] distributed_work.peer_malicious
nano/core_test/distributed_work.cpp:214: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:

Fixed by https://github.com/nanocurrency/nano-node/pull/2500

cryptocode commented 4 years ago

websocket.bootstrap_exited segfaults from time to time

Fixed (unable to reproduce, appears to be fixed by recent changes to websocket tests)

cryptocode commented 4 years ago
[ RUN      ] node_telemetry.batch_use_single_request_cache
nano/core_test/node_telemetry.cpp:772: Failure
Expected equality of these values:
  2
  node_client->stats.count (nano::stat::type::message, nano::stat::detail::telemetry_ack, nano::stat::dir::in)
    Which is: 1
[  FAILED  ] node_telemetry.batch_use_single_request_cache (7374 ms)

Seems to happen occasionally under TSAN, so likely timing related

Edit: This test is now called all_peers_use_single_request_cache Needs to be rechecked if issue still exists since the telemetry refactor

Edit2: This appears to be fixed with all_peers_use_single_request_cache

cryptocode commented 4 years ago

confirm_back failures, as seen on CI, is reproducible locally.

The active roots container sometimes never drains, so it times out.

Possibly solved by https://github.com/nanocurrency/nano-node/pull/2441

cryptocode commented 4 years ago

Moved convos/status into the corresponding comment and marked them resolved.

cryptocode commented 4 years ago
[ RUN      ] confirmation_height.gap_live
nano/core_test/confirmation_height.cpp:352: Failure
Expected equality of these values:
  1
  confirmation_height_info.height
    Which is: 2

Fairly easy to reproduce with TSAN

Fixed in #2621

wezrule commented 4 years ago
[ RUN      ] wallet.work_watcher_update
D:\a\raiblocks\raiblocks\nano\core_test\wallet.cpp(1171): error: Value of: system.poll ()
  Actual: Deadline expired
Expected: 
[  FAILED  ] wallet.work_watcher_update (20349 ms)

Fails a lot on Windows Actions

edit: Fixed by https://github.com/nanocurrency/nano-node/pull/2677

guilhermelawless commented 4 years ago
[ RUN      ] node.aggressive_flooding
/nano-node/nano/core_test/node.cpp:3699: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:
[  FAILED  ] node.aggressive_flooding (17760 ms)

Fails occasionally locally under TSAN.

edit: Should be fixed by https://github.com/nanocurrency/nano-node/pull/2654 edit2: Fixed in https://github.com/nanocurrency/nano-node/pull/2656

wezrule commented 4 years ago

Mac Actions (https://github.com/nanocurrency/nano-node/runs/617432809?check_suite_focus=true)

[ RUN      ] network.tcp_no_connect_excluded_peers
/Users/runner/runners/2.169.0/work/raiblocks/raiblocks/nano/core_test/network.cpp:1131: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected: 
wezrule commented 4 years ago
[ RUN      ] system.generate_send_new
/Users/runner/runners/2.169.0/work/raiblocks/raiblocks/nano/core_test/ledger.cpp:622: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected: 

Likely fixed by: https://github.com/nanocurrency/nano-node/pull/2742

guilhermelawless commented 4 years ago
[ RUN      ] node.fork_invalid_block_signature
nano-node/nano/core_test/node.cpp:3123: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected:
[  FAILED  ] node.fork_invalid_block_signature (5195 ms)

EDIT: Bisected to 5dac531c911b0e19e2c615397fa38e94a9b34ecd

EDIT: Likely fixed by https://github.com/nanocurrency/nano-node/pull/2743

wezrule commented 4 years ago

Failed on MacOS actions:

[ RUN      ] node_telemetry.remove_peer_different_genesis
/Users/runner/runners/2.169.1/work/nano-node/nano-node/nano/core_test/node_telemetry.cpp:746: Failure
Expected equality of these values:
  0
  node0->network.size ()
    Which is: 1
[  FAILED  ] node_telemetry.remove_peer_different_genesis (396 ms)

Edit: Done in #2738

wezrule commented 4 years ago

Failed on Windows actions:

[ RUN      ] conflicts.adjusted_multiplier
D:\a\nano-node\nano-node\nano\core_test\conflicts.cpp(296): error: Expected equality of these values:
  node1.active.roots.get<1> ().begin ()->election->status.winner->hash ()
    Which is: 32-byte object <C0-A9 C2-5D BB-BA EE-B1 86-6A 47-85 23-FF E8-83 84-98 AB-B0 41-E9 BD-D3 AF-15 48-FC 2A-97 05-2E>
  open_epoch2->hash ()
    Which is: 32-byte object <A4-58 4D-27 50-79 D8-6D A3-1F 35-42 A9-4D 66-4D 17-EC 26-DB C7-09 17-01 48-CB 81-34 BD-3E 1E-6F>
[  FAILED  ] conflicts.adjusted_multiplier (296 ms)

Edit: Solved by https://github.com/nanocurrency/nano-node/pull/2739

wezrule commented 4 years ago

Failed with gcc actions (https://github.com/nanocurrency/nano-node/runs/617432731?check_suite_focus=true), also failed with macos (https://github.com/nanocurrency/nano-node/runs/617432809?check_suite_focus=true)

[ RUN      ] work_watcher.propagate
/workspace/nano/core_test/wallet.cpp:1259: Failure
Value of: system.poll ()
  Actual: Deadline expired
Expected: 
[  FAILED  ] work_watcher.propagate (10258 ms)

EDIT: Fixed in https://github.com/nanocurrency/nano-node/pull/2748