kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack
https://cluster-api-openstack.sigs.k8s.io/
Apache License 2.0

Ability to mount additional volumes #1286

Closed Goend closed 11 months ago

Goend commented 2 years ago

/kind feature

Describe the solution you'd like: The OpenStack template only has rootVolume, but I want to be able to attach my own additional volume.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

mdbooth commented 2 years ago

Is there any reason you wouldn't just use a bigger root volume?

Goend commented 2 years ago

Because etcd has high requirements on disk performance, and if everything shares one disk I am worried that other programs will affect it. In our environment we can give etcd a high-performance disk, but the system disk cannot use high-performance storage.

mdbooth commented 2 years ago

Is this something you've tested? Did it fix the etcd problem? How would you configure the install image to mount the disk? What volume configuration did you use for improved performance?

This would be quite an invasive change. I can see the potential, but I'd want to be sure it's fixing a real problem before even thinking about the design.

Goend commented 2 years ago

This is a feature request, so I haven't tested it. The etcd community documentation recommends using SSD disks. We only need to support attaching user disks; as for how users consume them, I think users can partition and initialize the disk in their cloud-init file.
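
For example, cloud-init can partition, format and mount the attached disk. This is only a sketch: the device name /dev/vdb and the mount point are assumptions about how the extra volume would show up in the guest.

```yaml
#cloud-config
# Sketch: prepare an additional attached volume (assumed to appear as /dev/vdb)
disk_setup:
  /dev/vdb:
    table_type: gpt
    layout: true
    overwrite: false

fs_setup:
  - device: /dev/vdb
    partition: auto
    filesystem: ext4
    label: data_disk

mounts:
  # fstab entry: device, mount point, fs type, options
  - [ LABEL=data_disk, /var/lib/etcd, ext4, "defaults,noatime" ]
```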

jichenjc commented 2 years ago

I know about the etcd requirement, e.g. SSD (OCP seems to have that as well).

Basically I think we need two things: 1) volume attach, including the other lifecycle actions, and 2) how to make the attached disk usable by Kubernetes.

Does Cluster API already have #2, e.g. via the kubeadm setup?

Goend commented 2 years ago

I think step 2 is too hard to achieve, so it should be implemented by users themselves

jichenjc commented 2 years ago

> I think step 2 is too hard to achieve, so it should be implemented by users themselves

I'm a little bit confused: you are talking about etcd, correct? Shouldn't this fit into the Kubernetes setup flow? CAPI/CAPO handle Kubernetes setup, and I think etcd is part of that lifecycle. I'm curious how a user would be able to set up etcd to use Cinder volumes once Kubernetes is already up and running.

Goend commented 2 years ago

Since we don't know how the user wants to initialize the disks, or how many disks there will be, we should probably be more general and not only consider etcd. Since I don't know much about this part, you can ignore it.

jichenjc commented 2 years ago

OK. Again, I think adding volumes still has value for some potential use cases, e.g. CSI local storage might be helpful: we create /dev/vdb so that the local storage operator can consume it.

jichenjc commented 2 years ago

I used the attached volume as the source for local persistent storage, which might be helpful (so that we don't need CSI in some cases). This might be a use case:

~# nova volume-attach 22964855-c4d0-4c06-8772-7f87b8d8a400 6cff6873-f224-4472-a03d-1d52f8d8b209
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| delete_on_termination | False                                |
| device                | /dev/vdc                             |
.....

root@jitest40:~# kubectl --kubeconfig=./capi-quickstart.kubeconfig get pvc
NAME             STATUS   VOLUME                  CAPACITY   ACCESS MODES   STORAGECLASS    AGE
local-pvc-name   Bound    example-pv-filesystem   1Gi        RWO            local-storage   19m
root@jitest40:~# kubectl --kubeconfig=./capi-quickstart.kubeconfig get pv
NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                    STORAGECLASS    REASON   AGE
example-pv-filesystem    1Gi        RWO            Delete           Bound       default/local-pvc-name   local-storage            23m

root@jitest40:~# cat pod.yaml
apiVersion: v1
.....
    volumeMounts:
    - name: localpvc
      mountPath: /data
  volumes:
  - name: localpvc
    persistentVolumeClaim:
      claimName: local-pvc-name
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
root@jitest40:~# kubectl --kubeconfig=./capi-quickstart.kubeconfig describe pv example-pv-filesystem
.....
Node Affinity:
  Required Terms:
    Term 0:        kubernetes.io/hostname in [capi-quickstart-md-0-748qj]
Message:
Source:
    Type:  LocalVolume (a persistent volume backed by local storage on a node)
    Path:  /dev/vdc
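
For reference, the example-pv-filesystem PV above was created by hand, roughly like this (a sketch reconstructed from the describe output; it uses the local volume type pinned to the node that has the Cinder volume attached):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv-filesystem
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage   # no-provisioner StorageClass
  local:
    path: /dev/vdc                  # the Cinder volume attached above
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - capi-quickstart-md-0-748qj
```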

Goend commented 2 years ago

Here are some ideas for etcd data disks in capi https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200423-etcd-data-disk.md
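
If I read it right, that proposal is what added the diskSetup and mounts fields to KubeadmConfigSpec. With an attached data disk they can be used roughly like this; the /dev/vdb device name is an assumption for how the Cinder volume would appear on OpenStack, and unrelated fields are omitted.

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: example-control-plane
spec:
  # ... machineTemplate, replicas, version, etc. omitted ...
  kubeadmConfigSpec:
    diskSetup:
      partitions:
        - device: /dev/vdb        # assumed device of the attached etcd volume
          layout: true
          overwrite: false
          tableType: gpt
      filesystems:
        - device: /dev/vdb1
          filesystem: ext4
          label: etcd_disk
    mounts:
      - - LABEL=etcd_disk
        - /var/lib/etcddisk
```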

jichenjc commented 2 years ago

Thanks for sharing. I read the proposal, and it seems CAPI already supports this in the infrastructure, so maybe on the CAPO side we need to honor that and provide the related support for such use cases. Thanks :)

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

mdbooth commented 1 year ago

/retitle Ability to mount additional volumes

mdbooth commented 1 year ago

@mnaser

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

javiku commented 1 year ago

As @jichenjc mentioned earlier, I believe this feature is necessary in order to use storage providers that manage full disks. In our cluster setups we use several OpenEBS storage providers that need to manage their own disks, so it becomes necessary to attach additional volumes when creating the machines. This should definitely be supported.

Once the disk is attached at the OpenStack level, I believe the user can be responsible for doing whatever is necessary to make the disk usable (via cloud-init commands).

We are currently blocked by this issue, since we have no way to provide and attach volumes with CAPO.

Could we reopen this?

jichenjc commented 1 year ago

Reopened just now. I personally don't have much time for this issue at the moment. I think it will be a change that needs an API spec update, but we have various samples to follow already.

@javiku, not sure whether you have time to work on this, FYI.

mdbooth commented 1 year ago

I'm really interested in this feature, and if nobody else does it I suspect I will get to it 'eventually'. That's probably more than 12 months down the line, though.

I would like to add a new controller for volumes. Volume creation and deletion in the machine controller is already buggy and far too complex. I want to move it out to a separate CRD with a separate controller. That way, the machine controller can simply create one of these objects and wait for it to be ready before attaching it. Deletion can also be handled much more robustly, without races.
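
For illustration only, such a CRD might look something like this. None of these names exist in CAPO today; the kind, group and fields are invented to sketch the idea.

```yaml
# Hypothetical resource managed by a dedicated volume controller.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: OpenStackVolume
metadata:
  name: my-machine-etcd
spec:
  size: 50                 # GiB
  volumeType: ssd          # Cinder volume type, e.g. a high-performance pool
  availabilityZone: nova
status:
  ready: false             # set by the controller once the Cinder volume exists
  volumeID: ""             # filled in by the controller; consumed by the machine controller
```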

With this in place, it becomes a much simpler matter to specify several volumes without exacerbating the complexity we already have with root volumes. Problems which would still remain:

dulek commented 1 year ago

I like the separate controller idea.

seanschneeweiss commented 1 year ago

I'm also surprised that CAPI already has support for this. That is how CAPZ moved etcd to a different data disk: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/661

mdbooth commented 1 year ago

/remove-lifecycle rotten

mkjpryor commented 1 year ago

@mdbooth

We also have a use case for this where customers only have a small SSD pool and want to use it only for etcd, not for other things, e.g. Docker images.

This means that, whatever the solution is, we would need to make sure the volume type can be specified.

mdbooth commented 1 year ago

Ping @EmilienM

mkjpryor commented 1 year ago

@mdbooth

FWIW, I think I prefer DataDisks on the machine resource, where the volumes take the machine name plus a suffix, like what the Azure provider implemented, rather than another CRD. Unless there is a very good reason for the additional complexity?
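
To make that concrete, an Azure-style shape on the OpenStack machine spec might look roughly like this. The dataDisks field and its sub-fields are hypothetical, not an existing CAPO API; note it also covers the volume-type requirement mentioned above.

```yaml
# Hypothetical sketch only; CAPO does not define these fields today.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
kind: OpenStackMachineTemplate
metadata:
  name: control-plane
spec:
  template:
    spec:
      flavor: m1.large
      dataDisks:               # hypothetical field
        - nameSuffix: etcd     # volume name would be <machine name>-etcd
          sizeGiB: 50
          volumeType: ssd      # Cinder volume type for the fast pool
```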

Is anyone actively looking at this? We might have a customer who needs this relatively soon and may be able to commit some resource to a patch - no commitment though! 😉

mdbooth commented 1 year ago

@mkjpryor @EmilienM is looking at a related aspect of this, which is why I pinged him. There could well be an opportunity to work together on this.

The related aspect is using the local disk as a data disk and booting from volume, specifically for etcd. From Nova's POV you boot a flavor with e.g. a 5G ephemeral disk. You specify a block device mapping which boots from a separately-created root volume and maps the local ephemeral disk second.
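
Expressed as a Nova block_device_mapping_v2 (shown here as YAML for readability; the UUID and sizes are placeholders), that arrangement is roughly:

```yaml
# Sketch of the server-create request: boot from a pre-created Cinder root
# volume, and map the flavor's local ephemeral disk as a second data disk.
block_device_mapping_v2:
  - boot_index: 0
    uuid: <root-volume-uuid>     # separately created root volume
    source_type: volume
    destination_type: volume
    delete_on_termination: false
  - boot_index: -1
    source_type: blank
    destination_type: local      # the flavor's ephemeral disk, e.g. 5G
    guest_format: ext4
```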

mkjpryor commented 1 year ago

@mdbooth

Thanks for the clarification - that sounds quirky! I can see why that wouldn't be so easily supported by a DataDisks option.

Another related thing that we are keen to look at is making better use of the ephemeral disk of flavors (as opposed to the root disk). I think all the primitives already exist to do this though - it's just a case of deciding where to mount it.

EmilienM commented 1 year ago

@mkjpryor Yes I'm going to work on that, very likely in the next months.

I'm sad that I didn't come across this issue before I kicked off this small prototype (it will be reworked): https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1640

I personally think that we should follow https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/661, as it's a strong reference, and follow https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200423-etcd-data-disk.md.

Now, for the controller separation question, I don't have enough context yet to provide an opinion. I'll spike on this in the coming weeks.

EmilienM commented 1 year ago

/assign EmilienM

mkjpryor commented 1 year ago

I've been starting to think about what the implementation of this looks like. The main thing I am pondering is this question:

Is there any reason to store volume IDs in the machine status for the provisioned volumes?

To me, storing volume IDs would just give us another place where volumes can be leaked, i.e. if we create a volume but the writing of the status fails. So we still need to check by name (or something else, see below) on delete and on a retried reconciliation anyway. We can pretty much rely on the machines to have a random component in their names as well, so we shouldn't encounter the problem of two machines with the same name very often.

My preferred solution is to put the Kubernetes UUID of the OpenStackMachine in the volume metadata so we can search by that later to identify which volumes belong to a particular machine. @mdbooth and I have spoken in the past about doing a similar thing with Neutron resources using a Neutron tag with the UUID of the OpenStack{Cluster,Machine} object, which I also still think is a good idea.

If we do that, I'm not sure there is any value to us in storing the volume IDs.

So the volume reconciliation logic is something like:

  1. Find all volumes with the OpenStackMachine UUID in their metadata
  2. Create any missing ones with the OpenStackMachine UUID and "volume purpose" in metadata
  3. Pass the volume IDs of the discovered/created volumes to instance create (but no need to store them)
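
As a rough sketch of steps 1 and 2 above using gophercloud (the metadata keys used here are assumptions for illustration, not an agreed convention):

```go
package machinevolumes

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/blockstorage/v3/volumes"
)

// machineVolumes lists the Cinder volumes whose metadata records the owning
// OpenStackMachine UID.
func machineVolumes(client *gophercloud.ServiceClient, machineUID string) ([]volumes.Volume, error) {
	pages, err := volumes.List(client, volumes.ListOpts{
		Metadata: map[string]string{"capo-machine-uid": machineUID},
	}).AllPages()
	if err != nil {
		return nil, err
	}
	return volumes.ExtractVolumes(pages)
}

// createMachineVolume creates a volume carrying the machine UID and its
// purpose (e.g. "etcd") in metadata so it can be found again on later
// reconciles and on delete.
func createMachineVolume(client *gophercloud.ServiceClient, machineUID, purpose string, sizeGiB int) (*volumes.Volume, error) {
	return volumes.Create(client, volumes.CreateOpts{
		Size: sizeGiB,
		Metadata: map[string]string{
			"capo-machine-uid": machineUID,
			"purpose":          purpose,
		},
	}).Extract()
}
```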

I am assuming that the machine spec is pretty much immutable, as the pattern in CAPI is to replace rather than modify in place - that means there is no need to check for volumes that exist but are no longer in the spec. In fact, we should make sure that the volumes associated with a machine are not permitted to change.

Then the volume delete reconciliation is literally just:

  1. Find all the volumes with the OpenStackMachine UUID in their metadata
  2. Delete them all

Does that sound sensible to people?

mdbooth commented 1 year ago

You're absolutely right that:

Storing state in the status would essentially be an optimisation because if the volumes are present in the status there's no need to check cinder at all during reconcile, and if we do check cinder it will be by volume id instead of name/tag, which is more efficient. As an optimisation, though, we don't need to implement it.

The most important things are:

I agree we don't need to consider the deletion of volumes from the spec.

I very much like the idea of tagging volumes. I'd probably tag them with both the machine and the cluster.

mkjpryor commented 1 year ago

Agreed. We can optimise later if we feel we need it. I think this keeps the patch nice and simple.

mkjpryor commented 1 year ago

Also, I managed to get /dev/disk/openstack/by-tag/<tag> working. Because of the limitations of the udev sandbox, it requires the config drive to be enabled.
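
Assuming that symlink is present on the node, a cloud-init fragment can then format and mount the tagged disk without caring which /dev/vdX it ended up as. The tag name and mount point below are just examples:

```yaml
#cloud-config
fs_setup:
  - device: /dev/disk/openstack/by-tag/etcd   # example device tag
    filesystem: ext4
    label: etcd_disk

mounts:
  - [ LABEL=etcd_disk, /var/lib/etcd, ext4, "defaults,noatime" ]
```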

This is the PR that applies the changes to our image builder - https://github.com/stackhpc/azimuth-images/pull/64.

Not sure what the longer-term solution is for this code - maybe integrating into https://github.com/kubernetes-sigs/image-builder?