Open dataclouder opened 5 years ago
How to reproduce: Using an empty vCD, and having the above script filled with the right credentials, run this script:
#!/bin/bash
function check_exit_code {
exit_code=$?
if [ "$exit_code" != "0" ]
then
echo "ERROR"
exit $exit_code
fi
}
for N in $(seq 1 10)
do
echo "# $N - init"
terraform init
check_exit_code
echo "# $N - plan"
terraform plan
check_exit_code
echo "# $N - apply"
terraform apply -auto-approve
check_exit_code
echo "# $N - destroy"
terraform destroy -auto-approve
check_exit_code
rm -f terraform.tfstate*
done
Repeat until you see the error
We figure out and made change to fail when response is lost: https://github.com/vmware/go-vcloud-director/pull/231
@dataclouder I think it is safe to close?
No. We made a change that makes the task fail explicitly when previously it was just hanging, but the core issue of this problem is still bothering us. Until we find and fix the root cause, this issue should stay open
This is a bug difficult to reproduce, which seems to be a race condition. Using a script generated by full-env.tf, the catalog item upload, which usually completes in less than 3 minutes, sometimes continues indefinitely and needs to be stopped manually.
The script creates the following:
There are two operations that take most of the time and go on in parallel: the edge gateway generation and the upload of the catalog item.
Example script:
A successful operation looks like this:
When it fails, there is no conclusion. You will see lots of lines like
and if you check in the GUI, you'll see that the catalog item has been uploaded and can be used successfully to create vApps.