haskell-tls / hs-tls

TLS/SSL implementation in haskell

`tls-2.0.1` non-deterministic test failures in GitHub Actions, with Nix #470

Open peterbecich opened 3 months ago

peterbecich commented 3 months ago

This is a niche issue involving the tls library, a Nix Flake, and GitHub Actions. I can only reproduce the issue in one project (hackage-server).

I have a Nix Flake which uses tls-2.0.1: https://github.com/haskell/hackage-server/pull/1305/files#diff-206b9ce276ab5971a2489d75eb1b12999d4bf3843b7988cbe8d687cfde61dea0R28-R29

When run by the Nix Flake in GitHub Actions, the test suite of tls-2.0.1 produces non-deterministic failures: https://github.com/haskell/hackage-server/actions/runs/8406893197/job/23021268896#step:5:1135

        last 25 log lines:
       >   To rerun use: --match "/Handshake/handshake/can handshake with TLS 1.3 PSK -> HRR/" --seed 406132138
       >
       >   test/HandshakeSpec.hs:1016:16:
       >   7) Handshake.handshake can handshake with TLS 1.3 0RTT
       >        Falsifiable (after 28 tests):
       >          CSP13 (ClientParams {clientUseMaxFragmentLength = Nothing, clientServerIdentification = ("",""), clientUseServerNameIndication = True, clientWantSessionResume = Nothing, clientShared = Shared, clientHooks = ClientHooks, clientSupported = Supported {supportedVersions = [TLS1.3,TLS1.2,TLS1.2,TLS1.3,TLS1.3,TLS1.2], supportedCiphers = [TLS_ECDHE_ECDSA_WITH_AES_256_CCM,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CCM,TLS_AES_128_CCM_SHA256,TLS_AES_128_CCM_8_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,TLS_AES_128_CCM_SHA256], supportedCompressions = [0], supportedHashSignatures = [(HashSHA1,SignatureECDSA),(HashSHA384,SignatureRSA),(HashSHA256,SignatureECDSA),(HashIntrinsic,SignatureRSApssRSAeSHA512),(HashSHA384,SignatureECDSA),(HashSHA512,SignatureRSA),(HashSHA512,SignatureECDSA),(HashSHA256,SignatureRSA),(HashIntrinsic,SignatureEd448),(HashSHA1,SignatureRSA),(HashIntrinsic,SignatureEd25519),(HashIntrinsic,SignatureRSApssRSAeSHA384),(HashIntrinsic,SignatureRSApssRSAeSHA256)], supportedSecureRenegotiation = False, supportedClientInitiatedRenegotiation = False, supportedExtendedMainSecret = RequireEMS, supportedSession = True, supportedFallbackScsv = True, supportedEmptyPacket = True, supportedGroups = [X25519,X448,P256,P384]}, clientDebug = DebugParams, clientUseEarlyData = False},ServerParams {serverWantClientCert = False, serverCACertificates = [], serverDHEParams = Nothing, 
serverHooks = ServerHooks, serverShared = Shared, serverSupported = Supported {supportedVersions = [TLS1.2,TLS1.2,TLS1.3,TLS1.3,TLS1.2,TLS1.2,TLS1.2,TLS1.2,TLS1.2,TLS1.2], supportedCiphers = [TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8,TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8,TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8,TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8,TLS_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8,TLS_AES_256_GCM_SHA384,TLS_AES_128_CCM_SHA256,TLS_AES_128_CCM_SHA256], supportedCompressions = [0], supportedHashSignatures = [(HashIntrinsic,SignatureRSApssRSAeSHA512),(HashSHA256,SignatureECDSA),(HashSHA512,SignatureRSA),(HashSHA1,SignatureECDSA),(HashSHA256,SignatureRSA),(HashSHA384,SignatureRSA),(HashIntrinsic,SignatureEd25519),(HashSHA512,SignatureECDSA),(HashSHA1,SignatureRSA),(HashIntrinsic,SignatureRSApssRSAeSHA256),(HashSHA384,SignatureECDSA),(HashIntrinsic,SignatureRSApssRSAeSHA384),(HashIntrinsic,SignatureEd448)], supportedSecureRenegotiation = False, supportedClientInitiatedRenegotiation = False, supportedExtendedMainSecret = RequireEMS, supportedSession = True, supportedFallbackScsv = True, supportedEmptyPacket = True, supportedGroups = [X25519,P256]}, serverDebug = DebugParams, serverEarlyDataSize = 0, serverTicketLifetime = 7200})
       >        session param should be Just
       >
       >   To rerun use: --match "/Handshake/handshake/can handshake with TLS 1.3 0RTT/" --seed 406132138
       >

When run by the Nix Flake on my own Linux box, the test suite of tls-2.0.1 succeeds. This can be verified by checking out this branch https://github.com/peterbecich/hackage-server/tree/fix-flake and running nix build; I expect it will succeed on anyone's Mac or Linux box.

I tried to reproduce the issue with a new Nix Flake written specifically for the tls library, but could not: https://github.com/haskell-tls/hs-tls/pull/469. It runs the same tests in GitHub Actions, yet it succeeds: https://github.com/peterbecich/hs-tls/actions/runs/8412270415/job/23032912211

Furthermore, the issue does not occur when testing tls in a plain GitHub Action without the Nix Flake.

There must be some difference between the GitHub Actions runner and my Linux box that causes the Flake to behave differently. Do you have any ideas? Thank you.

kazu-yamamoto commented 3 months ago

Does only "handshake can handshake with TLS 1.3 0RTT" fail? Or do other tests fail randomly as well?

peterbecich commented 3 months ago

This also fails:

       >   8) Handshake.handshake can handshake with TLS 1.3 0RTT -> PSK

Apparently there are more failures, but they are hidden:

     > 38 examples, 8 failures
       > Test suite spec: FAIL

I could probably make the GitHub Action print the entire log, if necessary.

kazu-yamamoto commented 3 months ago

I guess that the client gives up waiting for session data from the server. In Network.TLS.Core, there are three timeouts. Could you replace these timeout values with longer ones (e.g. 10,000,000) and see what happens?
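
To illustrate the hypothesis above, here is a minimal sketch of how a too-short timeout can turn a slow peer into a missing value. It is not the code in Network.TLS.Core: `receiveSession`, the delays, and the "session-ticket" string are hypothetical stand-ins, and the only real API used is `timeout` from `System.Timeout` in base, which takes a limit in microseconds and returns `Nothing` when the action does not finish in time.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Timeout (timeout)

-- Hypothetical stand-in for a client waiting on session data.
-- A "server" thread delivers its value after serverDelay microseconds;
-- the "client" waits at most clientLimit microseconds for it.
receiveSession :: Int -> Int -> IO (Maybe String)
receiveSession serverDelay clientLimit = do
  box <- newEmptyMVar
  _ <- forkIO $ threadDelay serverDelay >> putMVar box "session-ticket"
  -- timeout returns Nothing if takeMVar has not completed within the limit.
  timeout clientLimit (takeMVar box)

main :: IO ()
main = do
  -- Server answers well within the limit: the client should get a Just.
  fast <- receiveSession 10000 2000000
  -- Server is slower than the limit: the client should give up with Nothing.
  slow <- receiveSession 500000 50000
  print fast
  print slow
```

If the real timeouts behave like this, a slower or more heavily loaded CI runner could plausibly hit the `Nothing` branch while a faster local machine does not, which would match a "session param should be Just" failure that appears only in some environments.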

peterbecich commented 3 months ago

That's a good idea. I need some time to make the change in tls and then point https://github.com/haskell/hackage-server/pull/1305/files#diff-206b9ce276ab5971a2489d75eb1b12999d4bf3843b7988cbe8d687cfde61dea0R28-R29 to it, to reproduce the issue in hackage-server.

It may be faster and simpler if I can reproduce the issue here: https://github.com/haskell-tls/hs-tls/pull/469. I will continue working on that.


This is interesting. For the crypton-connection library, nix build with --rebuild succeeds, as expected:

nix build nixpkgs/haskell-updates#haskellPackages.crypton-connection --rebuild

That build succeeds.

However, --rebuild fails for tls-2.0.1:

nix build nixpkgs/haskell-updates#haskellPackages.tls_2_0_1 --rebuild         
error: derivation '/nix/store/jris4v7lvamafv88c0pfqax94i0kj29q-tls-2.0.1.drv' may not be deterministic: 
output '/nix/store/ah0f12rzh0n2km8026xqf57zsnw4zjvq-tls-2.0.1' differs

There may be more than one issue with the tls_2_0_1 Nix package, and the issue or issues may lie entirely in nixpkgs.

Edit: I don't think the --rebuild failure is informative.