matrix-org / complement

Matrix compliance test suite
Apache License 2.0

Deflake faster joins device list tests by waiting for leave event #627

Closed: squahtx closed this 1 year ago

squahtx commented 1 year ago

Many of the faster joins test flakes happen because the homeserver under test tries, and fails, to contact Complement homeservers after they have been torn down. It then marks those destinations as offline, so subsequent tests can fail if they use a Complement homeserver that happens to have the same hostname:port as one previously marked as offline.

Wait for the homeserver under test to finish broadcasting its leave at the end of the device list tests.

Signed-off-by: Sean Quah <seanq@matrix.org>


Builds on top of #626. Reviewable commit by commit.

NB: you can tease out the flakes by forcing Complement to reuse ports whenever possible:

```diff
diff --git a/internal/federation/server.go b/internal/federation/server.go
index ed7974d..4c4ccbe 100644
--- a/internal/federation/server.go
+++ b/internal/federation/server.go
@@ -455,7 +455,10 @@ func (s *Server) Listen() (cancel func()) {
        var wg sync.WaitGroup
        wg.Add(1)

-       ln, err := net.Listen("tcp", ":0") //nolint
+       ln, err := net.Listen("tcp", ":63333") //nolint
+       if err != nil {
+               ln, err = net.Listen("tcp", ":63334") //nolint
+       }
        if err != nil {
                s.t.Fatalf("ListenFederationServer: net.Listen failed: %s", err)
        }
```

squahtx commented 1 year ago

CI flaked due to the issue fixed by #628.

DMRobertson commented 1 year ago

(I've optimistically marked this as closing a bunch of faster joins device list update tests; I hope I matched the tests to this issue correctly.)