xmtp / xmtp-node-go

Software for the nodes that currently form the XMTP network
MIT License
10 stars 3 forks source link

Sporadic e2e test failures #134

Closed snormore closed 2 years ago

snormore commented 2 years ago

We're seeing some sporadic e2e test failures (metrics, logs). It seems to be happening maybe once every couple of days so far, and looks like some of the sync messages (added in https://github.com/xmtp/xmtp-node-go/pull/132) are being included in either the subscribe-expectation check or there's a mismatch in the query-expectation check.

expected equal, diff:   []*v1.Envelope(Inverse(cmpopts.SortSlices, []*v1.Envelope{
+   s`content_topic:"test-f7xll8qphqky" message:"sync-0bbb8grgvkjc"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-0qnimxngk0yi"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-0xiud73cfrgp"`,
    s`content_topic:"test-f7xll8qphqky" message:"sync-0ylrfffwbzba"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-1adiquei8w72"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-1fvpku459fin"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-1kzn22kxn4zn"`,
    s`content_topic:"test-f7xll8qphqky" message:"sync-1olgem9sm6q8"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-1q8hlx6nrnlh"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-1t5yw7wu4fcw"`,
    s`content_topic:"test-f7xll8qphqky" message:"sync-1wcekucrcvn3"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-27lxjv17s8w3"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-27ov68rgdfmj"`,
    s`content_topic:"test-f7xll8qphqky" message:"sync-28dofsu0ms2k"`,
    s`content_topic:"test-f7xll8qphqky" message:"sync-2a154a8dg4n6"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-2dxeu0aroexn"`,
    s`content_topic:"test-f7xll8qphqky" message:"sync-2ihsasd0hxzr"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-2jmzprfo5enf"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-2kecqgii9bm4"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-2l0u25yj8z3r"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-2pjtlpw6rtq6"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-2s7njiotysm3"`,
+   s`content_topic:"test-f7xll8qphqky" message:"sync-37eu35b3x4ae"`,
    ... // 94 identical and 157 inserted elements
  }))

The first thing that needs to happen for this is to clean up the logging of these errors so it's clearer which of the expectation checks failed. If it's the subscribe-expectation check then we probably should re-add https://github.com/xmtp/xmtp-node-go/pull/133

Instances:

snormore commented 2 years ago

Another instance of this happened after we added better logging. It looks like it's the query expectations check, where some messages aren't stored/equal after the 3 second timeout. We can bump that timeout to 5 or 10 seconds and I think it would resolve the test failure, but it's definitely a little concerning that at times there can be that kind of delay in persisting. It does seem like a case where a whole bunch of sync messages were sent though, so probably related to large volume inserts.

snormore commented 2 years ago

It looks like increasing the query expectation timeout didn't mitigate this (logs)

snormore commented 2 years ago

I think this is a pagination thing, on top of a slow initial subscribe sync. In the latest instance, we can see there are 100 matches, and the rest missing. 100 is the default/max page size in waku.