Closed s1fr0 closed 1 year ago
@s1fr0 any update on this? Have we come to a conclusion whether this is in any way related to timing only?
> Have we come to a conclusion whether this is in any way related to timing only?
This is an umbrella issue. @jm-clius is referring to the investigation held in PR #1368.
I will split this into separate issues and address the Waku Bridge failures.
Revisiting this. Some impressions from looking at recent errors:
Filter test failures are all due to race conditions between the client subscribing and the service node handling a message that matches that subscription. I will address the timing, but this is a fundamental flaw in the protocol, which is being addressed in: https://github.com/vacp2p/rfc/pull/562
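To make the race concrete, here is a minimal sketch in Python with asyncio. It is not nwaku code: `ServiceNode`, `subscribe`, `handle_message` and the topic name are hypothetical stand-ins, and the only assumption carried over from the comment above is that a message handled before the subscription is registered never reaches the client. The first test publishes without waiting for the subscription, the second awaits it first.

```python
import asyncio


class ServiceNode:
    """Hypothetical stand-in for a filter service node (not the nwaku API)."""

    def __init__(self):
        self.subscribers = {}  # content topic -> list of client inboxes

    async def subscribe(self, topic, inbox, delay=0.0):
        # Simulate network/processing latency before the subscription lands.
        await asyncio.sleep(delay)
        self.subscribers.setdefault(topic, []).append(inbox)

    def handle_message(self, topic, payload):
        # Only clients subscribed at this instant receive the message.
        for inbox in self.subscribers.get(topic, []):
            inbox.put_nowait(payload)


async def racy_test():
    node, inbox = ServiceNode(), asyncio.Queue()
    # Subscription is started but not awaited, so the message is handled first.
    pending = asyncio.create_task(node.subscribe("topic", inbox, delay=0.01))
    node.handle_message("topic", "hello")
    await pending
    return inbox.qsize()  # number of messages the client actually received


async def ordered_test():
    node, inbox = ServiceNode(), asyncio.Queue()
    # Wait until the subscription is registered before publishing.
    await node.subscribe("topic", inbox, delay=0.01)
    node.handle_message("topic", "hello")
    # A bounded wait fails loudly instead of hanging if the message is lost.
    return await asyncio.wait_for(inbox.get(), timeout=1.0)


if __name__ == "__main__":
    print("racy test received:", asyncio.run(racy_test()))
    print("ordered test received:", asyncio.run(ordered_test()))
```

In this toy model the racy variant always drops the message. In a real test the outcome depends on scheduling, which is why such failures tend to disappear on re-run.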
Draft for improving filter timeouts: https://github.com/waku-org/nwaku/pull/1529
Afaict most of the existing CI issues have been addressed, and https://github.com/waku-org/nwaku/pull/1496 will address the remaining timeouts on macOS experimental builds.
Problem
In the development of nwaku, we occasionally observe some CI/Jenkins test failures that are usually not triggered when tests are re-executed. This issue aims to collect observed test failures in order to:
Everyone is encouraged to edit this issue and add new tests that fail during checks. Once the causes of a test failure are identified and addressed, please remove the failing test from this post and comment on this issue with its description, the list of identified causes, and a link to the PR that fixed it.
Any feedback is welcome!
Test report template
Action type: CI/Jenkins
OS: linux/macOS
Test name/version: (v1/v2)
nwaku commit/tree:
Log excerpt:
Causes: (not identified/speculated/identified)
TimesSeen: (times the test was seen)
Temporary fix/workaround: (if any)
Failing tests
Action type: CI
OS: macOS
Test name/version: "Messages are bridged between Waku v1 and Waku v2" (v2)
nwaku commit/tree: https://github.com/status-im/nwaku/tree/7bc657b545c1b6eb54d5b70f8a926fb208aad939
Log excerpt:
TimesSeen: 4
Causes: not identified
Action type: CI
OS: macOS
Test name/version: "event subscription" (v2)
nwaku commit/tree: https://github.com/status-im/nwaku/tree/63137f3e2a9457584276823918d55065f7d8e3bf
Log excerpt:
TimesSeen: 1
Causes: not identified
Action type: CI
OS: macOS
Test name/version: "peer subscription should not be dropped if connection recovers before timeout elapses" (v2)
nwaku commit/tree: https://github.com/status-im/nwaku/tree/71c5bfda0e4bfbedd967ed23e2cb53899de10c11
Log excerpt:
TimesSeen: 4
Causes: not identified
Action type: CI
OS: macOS
Test name/version: "peer subscription should be dropped if connection fails for second time after the timeout has elapsed" (v2)
nwaku commit/tree: https://github.com/waku-org/nwaku/tree/b38bf152fc50e762a3db098a49eb05f253be76a7
Log excerpt:
TimesSeen: 1
Causes: not identified (root cause seems to be the same as previous failing test)
Action type: CI
OS: linux
Test name/version: "Admin API: connect to ad-hoc peers" (v2)
nwaku commit/tree: https://github.com/waku-org/nwaku/tree/b38bf152fc50e762a3db098a49eb05f253be76a7
Log excerpt:
Causes: not identified
TimesSeen: 1
Action type: CI
OS: linux
Test name/version: "mounting waku rln-relay: check correct registration of peers without rln-relay credentials in dynamic/on-chain mode" (v2) - happening in different tests
nwaku commit/tree: https://github.com/waku-org/nwaku/tree/226b44c86d7a0abe7a4394f2a52efba77f089c90
Log excerpt:
Causes: not identified
TimesSeen: 3
Temporary workaround: it seems that re-running each failing job one at a time somehow mitigates the issue (root cause might be unrelated).
Action type: CI
OS: macOS
Test name/version: "prunePeerStore() correctly removes peers to match max quota" v2 (experimental)
nwaku commit/tree: https://github.com/waku-org/nwaku/tree/ae0a1171e605e05f03f8efeec3480bba34b62cd7/
Log excerpt:
Causes: not identified
TimesSeen: 1
Temporary fix/workaround: none