Open matttrach opened 4 months ago
It has a couple of warnings there:
Ensure that the OS ISO URL field contains the URL of a VMware ISO release for RancherOS (rancheros-vmware.iso). Note that this URL must be accessible from the nodes running your Rancher server installation
@abhishekhpatil10, would you mind confirming this information from the user?
Thanks Matt. Yes, the code has worked for the user in the past, and it worked now on the second run. They saw the error only during the first run. Their Rancher server cannot access that URL. It is not accessible from any node.
Sounds like this is covered in the docs then: the Rancher server installation needs to be able to access the URL. Glad to hear everything is working for them now. Please let me know if anything else is necessary.
The only question the user has is: is it mandatory to set creation_type = "template" in the code?
I am so sorry, I misinterpreted. Here is the code defining the structure for that argument: https://github.com/rancher/terraform-provider-rancher2/blob/v4.1.0/rancher2/schema_node_template_vsphere.go#L101-L107 You can see it is set as optional with a default value.
Optional: true,
Default: vmwarevsphereConfigCreationTypeDefault,
@abhishekhpatil10 please let me know if there is anything else I can help with. If not, I will close this issue one week from now on 3/20/2024.
Any idea why we had the failure only during the first run but it has been working with every further run and the change for the template? Regarding 100% airgapped - nothing changed.
And it seems the documentation might not be correct, as it has been working since the second run and this change.
Well, it could be a cache miss, a dropped connection, filesystem I/O errors, a random bug in vSphere, a timeout in the vSphere API, or many other things. I know companies often want in-depth post-mortems and RCAs, but that will need to be conducted by someone on their side with full access to the information involved.
I wish I could give more, sorry!
This could be the change that they are experiencing: https://github.com/rancher/terraform-provider-rancher2/commit/ed867957635a56042cad7658f0ef1c8220a7ec46 or it could be this one: https://github.com/rancher/terraform-provider-rancher2/commit/6f97f5e3499cd9b7b73340796a7cd3e4b8d0ee9b
Both of these are non-breaking additions to that template which enable changes which occurred in the Rancher API.
In our case the problem was that the provisioning job created by Rancher got stuck trying to download the ISO. This should not even have been attempted, as we are 100% air-gapped, and I am not sure whether the change in Terraform made the problem go away or whether something in Rancher behaves differently on a second run. I really believe we need QA to test 100% air-gapped scenarios properly.
Terraform providers should not alter the experience of the Rancher API, this tool enables programmatic control of the API around the context of an "object" (because an object is typically the context for multiple REST endpoints). The programmatic access specifically focuses on allowing a workflow which developers find most comfortable with version control and CI/CD (thus Config As Code).
It sounds to me like the user is unhappy with how Rancher behaved given the inputs that they gave and the environment that they are in. I agree that Rancher should not attempt to look for an image and have to wait for a timeout when in airgapped situations and the image is not available. This behavior could be occurring from how Rancher is configured or from a missing feature in Rancher, or from a bug in the Rancher code (if the behavior shouldn't be happening).
I see a few solutions to this:
Neither of these solutions is a change in how this repo is written, tested, or deployed. As a maintainer of this project and other Terraform projects I can look into the module if this is something the user would like, but as I said before it would take time and effort before something like this is available.
This was discussed further in an internal issue, and the resolution we came to is to change the default value in the provider to match the default set in Rancher. Since this is a breaking change for users who rely on the current default, we are going to defer this change to the next major version of the provider. This would be version 5.x of the provider.
This issue will track the progress of the change.
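For users who rely on the current default, one possible way to avoid being surprised by the 5.x change is to constrain the provider version until the configuration sets creation_type explicitly. This is a minimal sketch, assuming the provider is installed from the Terraform Registry as rancher/rancher2 and that the 4.1 line referenced in this thread is the one in use:

```hcl
terraform {
  required_providers {
    rancher2 = {
      source = "rancher/rancher2"
      # Assumption: stay on 4.x (4.1 or later) so the default for
      # creation_type does not change underneath existing configurations.
      version = "~> 4.1"
    }
  }
}
```

Once every node template sets creation_type explicitly, the constraint can be relaxed, since the configuration no longer depends on the provider default.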
Rancher Server Setup
Information about the Cluster
User Information
Provider Information
Describe the bug
When generating an air-gapped downstream vSphere cluster, provisioning gets stuck downloading an image from the internet. The image is specifically boot2docker from rancheros-vmware. The image is provided in the boot2docker_url attribute of the node template (rancher2_node_template). It seems that the value of the creation_type attribute must be "template" for this use case. The default value for this attribute is "legacy", so is creation_type now mandatory?
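For illustration, a node template that sets creation_type explicitly might look like the sketch below. The resource name, variable, and vSphere paths are hypothetical placeholders, and the other vsphere_config settings a real environment needs (datacenter, datastore, network, and so on) are omitted. It assumes the VM template named in clone_from already exists in vSphere, so no ISO has to be downloaded.

```hcl
# Hypothetical input: ID of an existing vSphere cloud credential in Rancher.
variable "cloud_credential_id" {
  type = string
}

resource "rancher2_node_template" "vsphere_airgap" {
  name                = "vsphere-airgap" # hypothetical name
  cloud_credential_id = var.cloud_credential_id

  vsphere_config {
    # Set explicitly instead of relying on the provider default ("legacy").
    creation_type = "template"

    # Hypothetical inventory path of the VM template to clone from.
    clone_from = "/dc1/vm/templates/node-base"
  }
}
```

Per the schema linked in this thread, creation_type is optional, so omitting it is valid and falls back to the provider default.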
To Reproduce
Generate vSphere cluster using rancher2_node_template.
/home/machine/.docker/machine/cache/boot2docker.iso
Actual Result
error downloading the image, failed to provision
Expected Result
no error, provisioning works with default options