Closed andyuk1986 closed 2 months ago
Thanks for the PR @andyuk1986. Looking at the logs of your successful run, it seems that `rosa logs` is still returning a non-zero exit code due to a 404 for the cluster; however, your action passes because this exit code is swallowed by the `wait` command. From the docs:
If n is not given, all currently active child processes are waited for, and the return status is zero.
My issue with using `wait` for this purpose is that we now have the output of `destroy.sh` and `rosa logs` combined, instead of being output sequentially. I think a simpler solution is to ensure that `rosa logs` always returns a 0 exit code, e.g.
rosa logs uninstall --debug -c ${CLUSTER_NAME} > "$(custom_date)_delete-cluster.log" || true
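To illustrate the `wait` behavior quoted above, here is a minimal sketch. `failing_job` is a hypothetical stand-in for a background command that exits non-zero (such as a log command hitting a 404); it is not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a background command that fails.
failing_job() { sleep 0.1; return 3; }

# Case 1: bare `wait` waits for all children and itself returns 0,
# so the failure of the background job is not visible.
failing_job &
wait
bare_status=$?

# Case 2: `wait <pid>` returns the exit status of that specific job.
failing_job &
pid=$!
pid_status=0
wait "$pid" || pid_status=$?

echo "bare wait: $bare_status, wait on pid: $pid_status"
```

If the exit code of a background job matters, waiting on its PID is one way to surface it; the bare form hides it.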
@ryanemerson the thing is that with the simpler solution we will never get the uninstall logs, because once the cluster is uninstalled it is not there any more; that's why the error was thrown. When the actions are sequential, the cluster is deleted first, and then we try to get the uninstall logs from it and the command complains that the cluster doesn't exist.
That's why I made them run in parallel, so that while the cluster is uninstalling we record the logs to the file (I have added `--watch` there to follow the logs). When the cluster is uninstalled successfully, the logs command finishes with an Info message rather than an Error message. I tried that with the `gh-ryan-a` cluster deletion yesterday and got the following logs at the end:
time=2024-05-14T00:43:00+02:00 level=debug msg=Response body follows
time=2024-05-14T00:43:00+02:00 level=debug msg={
"kind": "Error",
"id": "404",
"href": "/api/clusters_mgmt/v1/errors/404",
"code": "CLUSTERS-MGMT-404",
"reason": "Cluster '2b7r83i86c62iskvidlm3kuuro0164qd' not found",
"operation_id": "e1c0996e-e621-4308-8683-d27bb44eeacf"
}
time=2024-05-14T00:43:00+02:00 level=debug msg=Bearer token expires in 1m45.262401399s
time=2024-05-14T00:43:00+02:00 level=debug msg=Got tokens on first attempt
time=2024-05-14T00:43:00+02:00 level=debug msg=Request method is GET
time=2024-05-14T00:43:00+02:00 level=debug msg=Request URL is 'https://api.openshift.com/api/clusters_mgmt/v1/clusters/2b7r83i86c62iskvidlm3kuuro0164qd/status'
time=2024-05-14T00:43:00+02:00 level=debug msg=Request header 'Accept' is 'application/json'
time=2024-05-14T00:43:00+02:00 level=debug msg=Request header 'Authorization' is omitted
time=2024-05-14T00:43:00+02:00 level=debug msg=Request header 'User-Agent' is 'ROSACLI/1.2.23 OCM-SDK/0.1.347'
Sorry, you're right @andyuk1986. I saw that we were still getting debug log output from the `rosa logs uninstall` command, but it's only the calls to AWS, not the actual uninstall information.
In that case +1 to Kamesh's timeout suggestion.
I would also suggest that you start watching the logs before you call `destroy.sh`, to make sure nothing is lost in the unlikely event that the default tail limit is reached before `rosa logs` is executed.
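A rough sketch of that ordering, under stated assumptions: the watcher loop and `destroy_cluster` are hypothetical stand-ins for `rosa logs uninstall ... --watch` and `destroy.sh` so the snippet runs without a cluster, `WATCH_TIMEOUT` is an illustrative value, and `timeout` is the GNU coreutils command. It is not the implementation from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for destroy.sh.
destroy_cluster() { sleep 0.5; echo "cluster destroyed"; }

LOG_FILE="$(mktemp)"
WATCH_TIMEOUT=1   # seconds; illustrative value only

# Start the log watcher first, in the background, bounded by a timeout.
# The inline loop is a stand-in for a log command that follows output.
# Its output goes to the file, so it does not interleave with the destroy output.
timeout "$WATCH_TIMEOUT" sh -c 'while true; do echo "uninstall log line"; sleep 0.2; done' \
  > "$LOG_FILE" 2>&1 &
watcher_pid=$!

# Then run the destroy step in the foreground and keep its exit code.
destroy_status=0
destroy_cluster || destroy_status=$?

# Wait on the watcher's PID; 124 means `timeout` had to stop it.
watcher_status=0
wait "$watcher_pid" || watcher_status=$?

echo "destroy exit: $destroy_status, watcher exit: $watcher_status"
```

In this sketch the stand-in watcher never ends on its own, so the timeout always fires. A real watcher that exits once the cluster is gone would return before the limit, and the timeout would act only as an upper bound.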
@ryanemerson thanks a lot for your comment. I have updated the PR with the timeout implementation suggested by Kamesh, and I will also start watching the logs before starting the cluster destroy process. The only thing I have just noticed is that `--debug` enables debug mode, but the debug logs are not saved in the file. I checked the logs for today's cluster creation and the file contains only 2 lines:
INFO: Loading cluster 'gh-keycloak-a'
INFO: Cluster 'gh-keycloak-a' has been successfully installed
So that needs to be fixed as well.
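The thread does not say why the debug lines are missing from the file. One common cause, stated here only as an assumption, is that a CLI writes debug output to stderr while `>` redirects stdout alone. The sketch below shows the difference using `noisy_cli`, a hypothetical stand-in that is not the real command.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in: INFO goes to stdout, debug goes to stderr.
noisy_cli() {
  echo "INFO: Loading cluster 'example'"
  echo "level=debug msg=Request method is GET" >&2
}

stdout_only="$(mktemp)"
both_streams="$(mktemp)"

# Only stdout is captured; the debug line goes to the terminal instead.
noisy_cli > "$stdout_only"

# stdout and stderr are both captured in the file.
noisy_cli > "$both_streams" 2>&1

# Count debug lines in each file (grep -c exits 1 on zero matches, hence `|| true`).
debug_in_stdout_only=$(grep -c 'level=debug' "$stdout_only" || true)
debug_in_both=$(grep -c 'level=debug' "$both_streams" || true)

echo "debug lines with '>': $debug_in_stdout_only, with '> file 2>&1': $debug_in_both"
```

Whether this applies to the log file discussed here would need to be checked against the actual command's behavior.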
@kami619 @ryanemerson the PR is ready for review.
Closes #810
Please find the successful run with these changes here: https://github.com/andyuk1986/keycloak-benchmark/actions/runs/9070651045