What happened?
Esti tests are flaky and often fail developers' pipelines.
Logs attached: logs_27863969862.zip, logs_27863969862 (1).zip
On 4/9/2024 I had a number of failures in a row. The last one had this log, i.e. the last test passed and then the Esti process itself failed: logs_27919722201.zip

Resource leak, 3/9/2024:
Looked at this with Guy: it seems that when the job that loads the sample data fails (batch/job/load-sample-data), the whole environment doesn't get cleaned up, and these leaked environments slowly take up all the K8s resources.
We manually cleaned these up, but we can expect them to keep leaking.
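For reference, a rough sketch of the kind of manual cleanup this takes. It assumes the leaked environments are per-run namespaces with a ci- prefix (as in the release name in the log below); the prefix is an assumption, not a confirmed convention:

```bash
# Sketch only: find CI namespaces and delete any left behind by failed runs.
# Assumes leaked environments live in namespaces prefixed "ci-" (assumed).
kubectl get namespaces --no-headers -o custom-columns=':metadata.name' \
  | grep '^ci-' \
  | while read -r ns; do
      # Deleting the namespace also removes the stuck
      # batch/job/load-sample-data job inside it.
      kubectl delete namespace "$ns" --wait=false
    done
```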
We should add a timeout to the sample-data shell script so a hung load fails fast instead of leaving the environment behind; a sketch follows.
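A minimal sketch of that timeout, assuming the load is driven by a script (load_sample_data.sh here is a hypothetical name) and that 10 minutes is an acceptable budget (also assumed):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sketch only: bound the sample-data load so a hung job fails fast and
# the environment can be torn down. load_sample_data.sh and the 10m
# budget are assumed placeholders, not the actual script or value.
if ! timeout 10m ./load_sample_data.sh; then
  echo "load-sample-data timed out or failed; failing the job so cleanup runs" >&2
  exit 1
fi
```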
Note: the destroy-controlplane job also fails sometimes, with helm uninstall hitting its 5m deadline:

wait.go:104: [debug] beginning wait for 33 resources to be deleted with timeout of 5m0s
uninstall.go:155: [debug] purge requested for ci-3711-2bd1b8-cloud-control-plane
Error: uninstallation completed with 1 error(s): context deadline exceeded
helm.go:84: [debug] uninstallation completed with 1 error(s): context deadline exceeded
helm.sh/helm/v3/pkg/action.(*Uninstall).Run
	helm.sh/helm/v3/pkg/action/uninstall.go:163
main.newUninstallCmd.func2
	helm.sh/helm/v3/cmd/helm/uninstall.go:60
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.8.0/command.go:1039
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:271
runtime.goexit
	runtime/asm_amd64.s:1695
Error: Process completed with exit code 1.
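If the uninstall is slow rather than permanently stuck, one possible mitigation is a longer deadline plus a retry. The release name below is copied from the log above; the namespace, the 15m deadline, and the retry count are assumptions:

```bash
#!/usr/bin/env bash
# Sketch only: retry helm uninstall with a longer deadline, since
# "context deadline exceeded" often just means deletion outlived the
# 5m wait. The namespace, 15m deadline, and retry count are assumed.
for attempt in 1 2; do
  if helm uninstall ci-3711-2bd1b8-cloud-control-plane \
      --namespace ci-3711-2bd1b8 --timeout 15m --wait; then
    exit 0
  fi
  echo "helm uninstall attempt ${attempt} timed out or failed" >&2
done
exit 1
```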
Expected behavior
No response
lakeFS version
No response
How lakeFS is installed
No response
Affected clients
No response
Relevant log output
No response
Contact details
nadav.steindler@treeverse.com