waku-org / nwaku

Waku node and protocol.
Other
199 stars 52 forks source link

bug: Test failure on test_all #2304

Closed AlejandroCabeza closed 3 months ago

AlejandroCabeza commented 9 months ago

Draft: Trying to investigate what triggers this issue.

Problem

When running all the tests with the tests/test_all file, the 'WakuNode2 - Validators::Spam protected topic accepts signed messages' test fails. This doesn't happen when running the tests specifying the test name (nim c --passL:librln_v0.3.4.a --passL:-lm -r tests/test_all 'WakuNode2 - Validators::Spam protected topic accepts signed messages') or when just running the binary of the file where the test is defined.

Impact

Low?

To reproduce

  1. Run: nim c --passL:librln_v0.3.4.a --passL:-lm -r tests/test_all

Screenshots/logs

============================================
  nwaku/tests/test_all 'WakuNode2 - Validators::Spam protected topic accepts signed messages'
--------------------------------------------
../csu/libc-start.c(392)
../sysdeps/nptl/libc_start_call_main.h(58)
nwaku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(760) main
nwaku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(750) NimMain
nwaku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(741) PreMain
nwaku/vendor/nim-unittest2/unittest2.nim(1128) atmwakunode2atstest_validatorsdotnim_Init000
nwaku/vendor/nim-unittest2/unittest2.nim(1056) runDirect
nwaku/vendor/nim-testutils/testutils/unittests.nim(16) runTestX60gensym3
nwaku/vendor/nim-chronos/chronos/internal/asyncfutures.nim(571) waitFor
nwaku/vendor/nim-chronos/chronos/internal/asyncengine.nim(146) poll
nwaku/vendor/nim-chronos/chronos/internal/asyncfutures.nim(591) cb

    Unhandled defect: Asynchronous task [onDisconnect() at group_manager.nim:500] finished with an exception "CatchableError"!
Message: Failed to reconnect with the Ethereum client:
Stack trace: ../csu/libc-start.c(392)
../sysdeps/nptl/libc_start_call_main.h(58)
nwaku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(760) main
nwaku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(750) NimMain
nwaku/vendor/nimbus-build-system/vendor/Nim/lib/pure/unittest.nim(741) PreMain
nwaku/vendor/nim-unittest2/unittest2.nim(1128) atmwakunode2atstest_validatorsdotnim_Init000
nwaku/vendor/nim-unittest2/unittest2.nim(1056) runDirect
nwaku/vendor/nim-testutils/testutils/unittests.nim(16) runTestX60gensym3
nwaku/vendor/nim-chronos/chronos/internal/asyncfutures.nim(571) waitFor
nwaku/vendor/nim-chronos/chronos/internal/asyncengine.nim(146) poll
nwaku/vendor/nim-chronos/chronos/internal/asyncfutures.nim(359) futureContinue
nwaku/waku/waku_rln_relay/group_manager/on_chain/retry_wrapper.nim(32) onDisconnect
 [FutureDefect]

  [FAILED ] (  3.21s) Spam protected topic accepts signed messages

(11.6s)   WakuNode2 - Validators

[Summary] 534 tests run (958.8s): 528 OK, 1 FAILED, 5 SKIPPED
Error: execution of an external program failed: 'nwaku/tests/test_all'

nwaku version/commit hash

chair28980 commented 9 months ago

I've observed similar error on my local.

alrevuelta commented 9 months ago

How verbose are the logs? trace perhaps? This doesn't seem to fail in CI.

AlejandroCabeza commented 9 months ago

This doesn't happen in the CI because on the CI we don't run the tests using test_all (that's a file I created, and I wasn't sure it should be ran). I'm still investigating, I'll provide logs and context asap.

alrevuelta commented 9 months ago

onDisconnect() at group_manager.nim:500

that seems an onchain rln thing, that requires an ethereum node running. Unsure how the tests are being run, but I doubt you have that node?

AlejandroCabeza commented 9 months ago

Logs

Here isolated logs from the test case on the two scenarios:

AlejandroCabeza commented 9 months ago

onDisconnect() at group_manager.nim:500

that seems an onchain rln thing, that requires an ethereum node running. Unsure how the tests are being run, but I doubt you have that node?

I've done nothing other than run the binaries, but as I mentioned earlier, when running test_all it fails, and when running test_all "name of the test" succeeds.

As far as I'm seeing, the difference is notable early on. The peer dialing doesn't seem to work in the first scenario, but it does on the second. I wonder if it could be a bleeding effect from an improper cleanup.

gabrielmer commented 3 months ago

@AlejandroCabeza can you please confirm if this is still happening? We saw something similar get fixed with https://github.com/waku-org/nwaku/pull/2690 but not sure it will affect this case

gabrielmer commented 3 months ago

I don't see that test failing anymore. However, there's one failing test: In ./tests/waku_rln_relay/test_waku_rln_relay.nim, test getMetadata: empty rln metadata

https://github.com/waku-org/nwaku/blob/c5d19c449169b674d9aacc2cc2d6fd4b84d4ce23/tests/waku_rln_relay/test_waku_rln_relay.nim#L280-L288

Apparently get_metadata() returns false in the following line if metadata is not set. https://github.com/waku-org/nwaku/blob/c5d19c449169b674d9aacc2cc2d6fd4b84d4ce23/waku/waku_rln_relay/rln/wrappers.nim#L547

@rymnc can you please check if it also happens to you? If so, should we change the test to expect an error?

rymnc commented 3 months ago

I don't see that test failing anymore. However, there's one failing test: In ./tests/waku_rln_relay/test_waku_rln_relay.nim, test getMetadata: empty rln metadata

https://github.com/waku-org/nwaku/blob/c5d19c449169b674d9aacc2cc2d6fd4b84d4ce23/tests/waku_rln_relay/test_waku_rln_relay.nim#L280-L288

Apparently get_metadata() returns false in the following line if metadata is not set. https://github.com/waku-org/nwaku/blob/c5d19c449169b674d9aacc2cc2d6fd4b84d4ce23/waku/waku_rln_relay/rln/wrappers.nim#L547

@rymnc can you please check if it also happens to you? If so, should we change the test to expect an error?

what version of librln do you have? it should not error. please note that it was patched and librln 0.3.7 is the latest for rln-v1.

gabrielmer commented 3 months ago

what version of librln do you have? it should not error. please note that it was patched and librln 0.3.7 is the latest for rln-v1.

Oh it was that, I was using librln_v0.3.4.a but changed to librln_v0.3.7.a and the test passed :))

Apologies and thanks so much!

gabrielmer commented 3 months ago

Running librln_v0.5.1.a all tests pass with the following command

nim c -r -d:chronicles_log_level=DEBUG -d:release -d:postgres  -d:rln --passL:librln_v0.5.1.a --passL:-lm -d:nimDebugDlOpen tests/test_all

Closing the issue. Please reopen in case someone still gets failures