vhive-serverless / vHive

vHive: Open-source framework for serverless experimentation

"Scale from 0" and "MultipleFuncInvoke" tests often fail in vHive CRI CI #655

Open ustiugov opened 1 year ago

ustiugov commented 1 year ago

Describe the bug

vHive CRI tests exhibit a sporadic (but quite frequent) failure when scaling microVMs from 0:

  1. === RUN   TestAutoscaler/Scale_from_0
    === CONT  TestAutoscaler
    cri_test.go:166: 
            Error Trace:    /root/_work/vHive/vHive/cri/cri_test.go:166
                                        /root/_work/vHive/vHive/cri/cri_test.go:99
                                        /root/_work/vHive/vHive/cri/cri_test.go:109
            Error:          Received unexpected error:
                            rpc error: code = DeadlineExceeded desc = context deadline exceeded
            Test:           TestAutoscaler
            Messages:       Failed to get response from function
    === CONT  TestAutoscaler/Scale_from_0
    testing.go:1336: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
    --- FAIL: TestAutoscaler (268.62s)
    --- PASS: TestAutoscaler/Scale_fn_with_concurrency_1 (8.58s)
    --- FAIL: TestAutoscaler/Scale_from_0 (260.04s)
  2. === RUN   TestMultipleFuncInvoke
    cri_test.go:167: 
            Error Trace:    /root/_work/vHive/vHive/cri/cri_test.go:167
                                        /root/_work/vHive/vHive/cri/cri_test.go:132
                                        /root/_work/vHive/vHive/cri/asm_amd64.s:1571
            Error:          Received unexpected error:
                            rpc error: code = DeadlineExceeded desc = context deadline exceeded
            Test:           TestMultipleFuncInvoke
            Messages:       Failed to get response from function
    --- FAIL: TestMultipleFuncInvoke (60.04s)

Hence, we exclude these tests from the CI until a fix is found.

leokondrashov commented 1 year ago

@lrq619 Was this fixed in your last CI update?

lrq619 commented 1 year ago

> @lrq619 Was this fixed in your last CI update?

No, it is not fixed. There will still be sporadic failures.