Closed — NoOverflow closed this issue 5 months ago.
Comparing the Kops request with Terraform's in Fiddler, I noticed a field "os:scheduler_hints" that was not present in Terraform's requests.
Removing this field using breakpoints allowed me to provision all machines with no problems. I assume this flag is set by Kops to put the instance in a specific instance group?
Anyway, I think the issue mostly comes from an Openstack misconfiguration on my end, but is there a way for me to remove this flag using a command-line argument so I can use Kops while I figure out what's wrong?
Ok, I figured out why provisioning VMs from Horizon directly wasn't consuming my local hypervisor storage.
By default, Kops and Terraform create the root volume from the image directly and store it on the local storage:
{
  "server": {
    "availability_zone": "nova",
    "config_drive": false,
    "flavorRef": "07ed0b5e-1daa-47b8-882e-c2e2d1a6cdac",
    "imageRef": "e2330d75-9703-492d-89f8-0d2b186cb959",
    "key_name": "test-keypair",
    "name": "basic",
    "networks": [
      { "uuid": "ae7f60ff-6e97-4e1a-93e7-97807737f63f" }
    ],
    "security_groups": [
      { "name": "default" }
    ],
    "user_data": ""
  }
}
On the other hand, Horizon creates a new volume based on the image UUID and then mounts it on the VM, therefore not using the precious hypervisor local storage (note the block_device_mapping_v2 field):
{
  "availability_zone": "nova",
  "config_drive": false,
  "user_data": "",
  "default_user_data": "",
  "disk_config": "AUTO",
  "instance_count": 1,
  "name": "test5",
  "scheduler_hints": {},
  "security_groups": [
    "691575f1-84bc-4081aea7b6ebe01980e2"
  ],
  "create_volume_default": true,
  "hide_create_volume": false,
  "source_id": null,
  "block_device_mapping_v2": [
    {
      "source_type": "image",
      "destination_type": "volume",
      "delete_on_termination": false,
      "uuid": "e2330d75-9703-492d-89f8-0d2b186cb959",
      "boot_index": "0",
      "volume_size": 20
    }
  ],
  "flavor_id": "07ed0b5e-1daa-47b8-882e-c2e2d1a6cdac",
  "nics": [
    {
      "net-id": "ae7f60ff-6e97-4e1a-93e7-97807737f63f",
      "v4-fixed-ip": ""
    }
  ],
  "key_name": "test-keypair"
}
So the issue is not coming from Kops directly, but from my cluster instance configuration. I just need to figure out from the documentation how to get Kops to create instances with a volume mapping on Openstack (I did it once on AWS, so it shouldn't be that different).
Here's what a Terraform script that lets me use a Cinder-backed block device as a root volume looks like:
resource "openstack_compute_instance_v2" "test-server" {
availability_zone = "nova"
name = "basic"
flavor_id = "07ed0b5e-1daa-47b8-882e-c2e2d1a6cdac"
key_pair = "test-keypair"
security_groups = ["default"]
block_device {
uuid = "e2330d75-9703-492d-89f8-0d2b186cb959" // My Image UUID (debian)
source_type = "image"
destination_type = "volume"
boot_index = 0
volume_size = 20
delete_on_termination = true
}
network {
name = "test"
}
count = 2
}
I can't seem to find a way to modify InstanceRootVolumeSpec to achieve the same thing. Is it even possible in Kops, or should I open a feature request?
/kind support
(Moving this to kind/support as it's not a bug but a question; I don't know how to remove the kind/bug tag.)
@zetaab Any ideas?
You need to define annotations on the instance groups: https://github.com/kubernetes/kops/blob/a913d3c0dba757653761ee8d2f0b16bedab0d34a/pkg/model/openstackmodel/servergroup.go#L119-L125
For instance:
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  annotations:
    openstack.kops.io/osVolumeBoot: "true"
    openstack.kops.io/osVolumeSize: "10"
  name: nodes
spec:
  ...
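As a rough paraphrase of what the linked servergroup.go lines appear to do with these annotations (a Python sketch; boot_volume_settings is an assumed name for illustration, not actual kops code):

```python
def boot_volume_settings(annotations):
    """Derive (boot_from_volume, size_gb) from InstanceGroup annotations,
    mirroring the openstack.kops.io/osVolumeBoot and osVolumeSize keys.
    Sketch only; the real logic lives in Go inside kops.
    """
    boot = annotations.get("openstack.kops.io/osVolumeBoot", "") == "true"
    size = int(annotations.get("openstack.kops.io/osVolumeSize", "0") or "0")
    return boot, size
```

With the manifest above, this would yield a 10 GB Cinder boot volume for every instance in the "nodes" group.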
Hi, thanks for the response!
Unfortunately, I had already tried that, but I may have an idea why it still uses local hypervisor storage.
When sniffing the request Kops makes with Fiddler, I noticed that while it did provision a boot volume on Cinder correctly, it still sent the imageRef argument, which seems to confuse Openstack.
Trying to replicate this, it seems that Terraform does not send the imageRef field even if it is set manually, which creates the instances with no problem and without using the hypervisor local storage.
This feature was created in https://github.com/kubernetes/kops/pull/7652. However, I have never used it because I have no need for it, so I cannot say whether it is working correctly or not.
I mean, the feature does work correctly: it provisions a volume and boots from it, but it still uses another volume on the local storage (probably due to that imageRef field). A quick temporary fix would be to remove the imageRef field manually to match Terraform's behavior and see if it works. I can try that on my end.
Removing the imageRef field using this Fiddler script:
if (oSession.HTTPMethodIs("POST") && oSession.PathAndQuery.StartsWith("/v2.1/servers") && oSession.HostnameIs("192.168.1.190:8774")) {
    FiddlerObject.log("Detecting server creation request, setting imageRef to empty...");
    oSession.utilDecodeRequest();
    var requestBody = System.Text.Encoding.UTF8.GetString(oSession.requestBodyBytes);
    if (!requestBody.Contains("bastions")) {
        requestBody = requestBody.Replace("\"imageRef\":\"e2330d75-9703-492d-89f8-0d2b186cb959\"", "\"imageRef\":\"\"");
        FiddlerObject.log(requestBody);
        oSession.utilSetRequestBody(requestBody);
    }
}
Kops managed to create all VMs without problems and without using the hypervisor local storage.
I think the Kops default behaviour should be that if openstack.kops.io/osVolumeBoot is set to true, the imageRef field is left blank (for Openstack, that is).
I would be more than happy to dig into the Openstack API documentation to confirm that this behaviour is intended, and to open a PR to fix it in Kops. Even though I'm not really familiar with Go, it should only take an additional condition around here in instance.go.
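The proposed behaviour amounts to a small post-processing rule on the create-server body, which can be sketched in Python (strip_image_ref is a hypothetical helper illustrating the rule, not the actual Go change):

```python
import json

def strip_image_ref(raw_body):
    """If a create-server request already carries a block_device_mapping_v2
    entry (i.e. it boots from a Cinder volume), blank out imageRef, as the
    Fiddler workaround above does by hand. Sketch of the proposed fix only.
    """
    body = json.loads(raw_body)
    server = body.get("server", body)
    if server.get("block_device_mapping_v2"):
        server["imageRef"] = ""
    return json.dumps(body)
```

Requests without a block device mapping pass through untouched, so the image-backed local-storage path keeps working.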
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.27.0 (git-v1.27.0)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Client Version: v1.28.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
3. What cloud provider are you using?
Openstack 2023.1 (kolla-ansible)
4. What commands did you run? What is the simplest way to reproduce this issue?
5. What happened after the commands executed?
Hello, I am running an Openstack AIO with Cinder LVM as the default backend for volume storage.
When deploying a cluster, I encounter this error in the last steps (creation of VMs).
I searched for the root of the issue and figured out that Openstack refuses to schedule the instances because it is running low on disk space after creating a few of them.
I can create tens of instances using Terraform or the Horizon WebUI, but not using Kops, because for some reason it tries to provision VMs using the hypervisor local storage (which is limited to a few tens of GBs, as it's not really meant for my use case) rather than my Cinder LVM storage.
My guess is that I'm probably missing an argument to change this behaviour. This argument looks promising, but there is no documentation about it that I can find.
It could also be related to my Openstack setup.
6. What did you expect to happen?
Kops not to use the hypervisor local storage for VM data.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
log with -v 10
9. Anything else do we need to know?
I'll try to check how Kops creates VMs on Openstack tomorrow and compare it to how they are created on Terraform / Horizon. I think it'll help.
Thanks!