Hm.. we were already on the verge of the test timeout (20m), and now with this change the possibility is higher, as we can see with the very first run.
=== RUN TestIntegration/reconciliation/scaleoutCluster
reconcile_test.go:142: test ID: gtxg2j
=== RUN TestIntegration/reconciliation/scaleoutCluster/TestScaleoutClusterSuite
=== RUN TestIntegration/resourceUpdate
=== RUN TestIntegration/resourceUpdate/SmbShareUpdateSuite
resource_update_test.go:78: test ID: 7v842d
=== RUN TestIntegration/resourceUpdate/SmbShareUpdateSuite/TestEditReadOnly
resource_update_test.go:149: checking smbclient login to share
resource_update_test.go:186: Setting readonly=true for SmbShare samba-operator-system/tshare1-7v842d
resource_update_test.go:227: checking smbclient write to share
resource_update_test.go:186: Setting readonly=false for SmbShare samba-operator-system/tshare1-7v842d
panic: test timed out after 20m0s
Hm.. we were already on the verge of the test timeout (20m), and now with this change the possibility is higher, as we can see with the very first run.
On Thursday I was saying that I noticed the non-clustered runs were taking a while, and when I had some networking issues the suite was timing out. I have a change to increase both by 10 minutes (to 20 minutes and 30 minutes). e038c17159b785f6ddec89d8aba6aadb17f7f4a4 is clearly not yet ready for prime time, but feel free to take and adapt it for this PR if you want to.
I added the change to increase the timeout for clustered test runs from 20m to 30m.
I guess the test already had that flaw though. Maybe what we really want is (pseudocode):
updateSmbShare()
ctx2 := contextWithTimeout()
poll(ctx2, func() {
    l, err := StatefulSets.List(...)
    checkStatefulSet(l)
})
require.NoError(waitForPodExist(ctx, s), "smb server pod does not exist")
What do you think?
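A minimal sketch of that polling flow, using only the Go standard library; pollUntil, the stub condition closure, and the 3-second budget here are illustrative assumptions rather than the suite's actual poll helper:

// Sketch of the "update, then poll, then wait for pods" flow from the
// pseudocode above. The condition closure is a stub; the real test would
// list StatefulSets through its Kubernetes client inside cond.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil re-runs cond every interval until it returns true, cond returns a
// non-nil error, or ctx expires.
func pollUntil(ctx context.Context, interval time.Duration, cond func(context.Context) (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ok, err := cond(ctx)
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

func main() {
	// updateSmbShare() would go here in the real test.
	ctx2, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	err := pollUntil(ctx2, time.Second, func(ctx context.Context) (bool, error) {
		// In the test this would list the StatefulSets and check that
		// Spec.Replicas matches the updated MinClusterSize.
		return false, nil // pretend the desired state has not appeared yet
	})
	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("statefulset did not reach the desired state in time")
	}
	// ...then waitForPodExist(ctx, s), as in the pseudocode.
}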
Ok, sounds reasonable.
Networking issues on infra where CentOS CI is hosted: https://lists.centos.org/pipermail/ci-users/2022-August/004605.html
Rook setup is most likely unsuccessful due to the above outage. But here is a new make check failure from golangci-lint:
internal/resources/metrics.go:14:2: could not import k8s.io/apimachinery/pkg/types (-: could not load export
data: cannot import "k8s.io/apimachinery/pkg/types" (unstable iexport format version 2, just rebuild compiler
and std library), export data is newer version - update tool) (typecheck)
"k8s.io/apimachinery/pkg/types"
^
Wow, that's quite the error. Oddly, that's not what I see when I look in the CI logs. Rather, I see:
go: downloading google.golang.org/protobuf v1.27.1
go: downloading github.com/matttproud/golang_protobuf_extensions v1.0.1
golangci-lint installed in /home/runner/work/samba-operator/samba-operator/.bin
/home/runner/work/samba-operator/samba-operator/.bin/golangci-lint -c .golangci.yaml run ./...
! gofmt -e -s -l . | sed 's,^,formatting error: ,' | grep 'go$'
make: *** [Makefile:214: check-format] Error 1
formatting error: tests/integration/reconcile_test.go
Error: Process completed with exit code 2.
I looked at the patch, and the CI is "right": there are formatting issues in your patch (spacing in the struct fields in the poll.Prober). I'm wondering if the other error is a local issue on your setup with that linter?
I looked at the patch, and the CI is "right": there are formatting issues in your patch (spacing in the struct fields in the poll.Prober).
Running gofmt with -d reported the following diff:
--- tests/integration/reconcile_test.go.orig 2022-08-30 12:49:55.592909058 +0530
+++ tests/integration/reconcile_test.go 2022-08-30 12:49:55.592909058 +0530
@@ -202,7 +202,7 @@
ctx, smbShare)
require.NoError(err)
- ctx2, cancel := context.WithTimeout(s.defaultContext(), 3 * time.Second)
+ ctx2, cancel := context.WithTimeout(s.defaultContext(), 3*time.Second)
defer cancel()
s.Require().NoError(poll.TryUntil(ctx2, &poll.Prober{
RetryInterval: time.Second,
-d, if present with older versions, can better pinpoint the formatting error.
I'm wondering if the other error is a local issue on your setup with that linter?
internal/resources/metrics.go:14:2: could not import k8s.io/apimachinery/pkg/types (-: could not load export data: cannot import "k8s.io/apimachinery/pkg/types" (unstable iexport format version 2, just rebuild compiler and std library), export data is newer version - update tool) (typecheck) "k8s.io/apimachinery/pkg/types" ^
I could see a few issues reported online, and it has to do with the installed versions of Go and golangci-lint. I encountered the above error on a system with the following versions installed:
# go version
go version go1.18.4 linux/amd64
# .bin/golangci-lint --version
golangci-lint has version v1.43.0 built from (unknown, mod sum: "h1:SLwZFEmDgopqZpfP495zCtV9REUf551JJlJ51Ql7NZA=") on (unknown)
Our install script restricts golangci-lint to v1.43.0 and GitHub CI runs on Go 1.16. In order to get rid of this new error I updated to the latest golangci-lint.
GitHub CI runs on Go 1.16
Oh, we should definitely update that at some point soonish.
/test centos-ci/sink-clustered/mini-k8s-1.24
LGTM. I was fine with it being a separate commit rather than a full-blown separate PR, but either way this now looks fine AFAIAC.
Ah.. my bad!
TestScaleoutClusterSuite fails more frequently with the following error:
The above check is to make sure that the number of replicas within the StatefulSet reflects the updated SmbShare.Spec.Scaling.MinClusterSize, but an immediate check on StatefulSet.Spec.Replicas might not always give us the desired (updated) value. Therefore we retry this check within a brief 3-second timeout to account for any delay in the field update. In addition, we at least wait for the existence of the extra pods corresponding to the updated replica count. Considering the increased overall test time, we further raise the timeout from 20m to 30m.
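As a rough illustration of that retry, here is a hedged sketch written directly against client-go rather than the suite's poll.TryUntil/poll.Prober utilities shown in the diff above; the namespace, StatefulSet name, label selector, and the waitForScaleOut helper are hypothetical, and the real test compares against SmbShare.Spec.Scaling.MinClusterSize instead of a caller-supplied count:

// Sketch of retrying the Spec.Replicas check within a 3-second window and
// then confirming the extra server pods exist.
package integrationsketch

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForScaleOut polls for up to 3 seconds until the StatefulSet's
// Spec.Replicas reflects the new minimum cluster size, then checks that at
// least that many matching server pods exist.
func waitForScaleOut(ctx context.Context, cs kubernetes.Interface, ns, name string, want int32) error {
	ctx2, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()

	for {
		ss, err := cs.AppsV1().StatefulSets(ns).Get(ctx2, name, metav1.GetOptions{})
		if err == nil && ss.Spec.Replicas != nil && *ss.Spec.Replicas == want {
			break // the controller has picked up the new replica count
		}
		select {
		case <-ctx2.Done():
			return fmt.Errorf("replicas never reached %d: %w", want, ctx2.Err())
		case <-time.After(time.Second):
		}
	}

	// At minimum, wait for the extra pods backing the new replicas to exist.
	pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
		LabelSelector: "app=samba-server", // hypothetical selector
	})
	if err != nil {
		return err
	}
	if int32(len(pods.Items)) < want {
		return fmt.Errorf("expected at least %d server pods, found %d", want, len(pods.Items))
	}
	return nil
}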