hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Feature request: "volume" or "host_volume" to mount subdirectory of volume #6536

Open gabriel-v opened 5 years ago

gabriel-v commented 5 years ago

Neither volume nor volume_mount support mounting a subdirectory from inside the volume, just the volume root.

I would like to avoid restarting Nomad clients for host volume configuration changes, since the project I'm working on requires dynamically adding and removing volumes. I would like to create only one host volume and mount dynamically-named subdirectories from it. This is not possible (or at least not documented) in version 0.10.0; the only available option is to mount the whole volume.

f3l1x commented 4 years ago

I would appreciate it too.

4BitBen commented 4 years ago

I would like to have something similar. I would like to define a single host_volume in the client stanza, then use it in the volume stanza for a group and specify a sub-dir using Nomad variable interpolation.

I have a job that creates thousands of tasks in a group. Each task needs its own unique sub-directory of the volume mounted at the same location in the container, so that each instance can write out its stateful workload.

e.g.

// Client hcl
  host_volume "my-volume" {
    path = "/var/lib/my-volume"
  }
// job hcl
job "docs" {
  group "example" {
    count = 1000
    volume "example" {
      type = "host"
      source = "my-volume"
      sub_dir = "docs/${NOMAD_ALLOC_INDEX}"
    }

    task "example" {
      volume_mount {
        volume      = "example"
        destination = "/var/lib/example/.state"
      }
    }
  }
}

Each instance of the "example" task would write out the same file, example-cfg.json, to /var/lib/example/.state/example-cfg.json inside the container.

On the host file system, you would see:

/var/lib/my-volume/docs/0/example-cfg.json
/var/lib/my-volume/docs/1/example-cfg.json
/var/lib/my-volume/docs/999/example-cfg.json

I hope I am able to describe my use case properly. If mine should be in a separate issue, I can create a separate issue, but this issue created by @gabriel-v seemed similar.

ashishk1996 commented 4 years ago

I am not sure if this is relevant, but please comment with whatever you think. I feel a volume itself should be schedulable on any client (at least a read-only volume), irrespective of the client config "having" that volume. Restarting the client really seems like a bottleneck.

tgross commented 4 years ago

I feel a volume itself should be schedulable on any client (at least a read-only volume), irrespective of the client config "having" that volume. Restarting the client really seems like a bottleneck.

That's part of what CSI is intended to help with. I have had some thoughts kicking around about "dynamic host volumes" but that's not on the near-term roadmap at the moment.

benvanstaveren commented 4 years ago

CSI suffers from the same problem, since there is currently no way to dynamically list volumes from your cloud provider and have them available, or to dynamically provision volumes if they don't already exist. I have more or less the same use case as @4BitBen: even with CSI, there is still no way for a volume mount to specify a subdirectory of a mount (whether host or CSI) using variable interpolation.

tgross commented 4 years ago

Some more discussion and requests for this feature in https://github.com/hashicorp/nomad/issues/7110 and https://github.com/hashicorp/nomad/issues/7877

emhohensee commented 4 years ago

I would love to see this as well. My use case is running many WordPress sites across my Nomad cluster, with each Nomad client having one or more NFS shares mounted into, say, /opt/sites/<file_cluster_id> on the host. Each WordPress site gets its own job file that mounts a volume like /opt/sites/fs1/<site_id>, where site_id is a subdirectory of a host_volume created in the client config.

ex.

// client.hcl
client {
  host_volume "file-server-1" {
    path = "/opt/sites/fs1"
    read_only = false
  }
}
// job-site-1001.hcl
job "site-1001" {
  group "wordpress" {
    volume "fs" {
      type = "host"
      read_only = false
      source = "file-server-1"
    }

    task "web" {
      volume_mount {
        volume = "fs"
        // mounting full path /opt/sites/fs1/1001 here
        subdir = "/1001"
        destination = "/var/www/html"
      }
    }
  }
}

Without this, I would need to declare hundreds (potentially thousands) of host_volume entries on each of potentially dozens of Nomad clients. Every Nomad client would also need to be restarted every time a new site was created. This workflow is a non-starter without being able to access subdirectories.

tylermenezes commented 4 years ago

Without this, I would need to declare hundreds (potentially thousands) of host_volume entries on each of potentially dozens of Nomad clients.

If you're using Docker, you can actually do this with volumes in the config section of the docker driver. (Don't forget to turn on docker.volumes.enabled in the client settings.)

You can still control host affinity manually if needed.

It's kind of hacky, but it will work.
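A minimal sketch of what that looks like, reusing the paths from @4BitBen's example above (the image is a placeholder):

```hcl
job "docs" {
  group "example" {
    count = 1000

    task "example" {
      driver = "docker"

      config {
        // placeholder image for illustration
        image = "redis:3.2"

        // "host_path:container_path" -- interpolation selects a
        // per-allocation subdirectory of the shared host directory
        volumes = [
          "/var/lib/my-volume/docs/${NOMAD_ALLOC_INDEX}:/var/lib/example/.state"
        ]
      }
    }
  }
}
```

Since this bypasses host_volume entirely, the scheduler no longer knows about the volume, which is why host affinity has to be handled manually (e.g. with a constraint).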

mister2d commented 4 years ago

docker.volumes.enabled

What is the config syntax for this? The documentation is not clear.

tgross commented 4 years ago

The docker.volumes.enabled syntax is the older HCL syntax that we're encouraging people to move away from. From the example in https://www.nomadproject.io/docs/drivers/docker#client-requirements it should be:

plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}
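For reference, the older client options form being referred to looks like this (legacy syntax, shown here as a sketch):

```hcl
client {
  options {
    // legacy driver option; the plugin block above is the preferred form
    "docker.volumes.enabled" = "true"
  }
}
```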
makidoll commented 4 years ago

I've been using the Docker volumes solution, but I'd like to have subdirectories as well.

kriansa commented 3 years ago

@tgross I noticed this issue has been removed from the triage board. Does it mean it isn't going to be prioritized?

Thanks!

tgross commented 3 years ago

Hi @kriansa. For reasons that are silly and internal to the team, we have a bunch of different project boards and not all of them are public. We want to make the Nomad roadmap public but there's a bit of process work we need to do before we can do that. This issue got moved to an internal project board which makes its status invisible to the community. Sorry about that. I can't really speak to when this will land, except to say that it's not scheduled for the upcoming 1.1.0 release.

jaen commented 3 years ago

Is there even a workaround for that, or is CSI currently useless if you need a subpath?

I've tried a raw_exec sidecar with volume_mount, hoping to put the volumes somewhere I could mount them into Docker with volumes, but apparently CSI doesn't work with raw_exec. I also tried something similar with a Docker sidecar, making --bind mounts into the alloc directory, but that doesn't work either: apparently --bind mounts don't propagate outside of the container.

Is there anything I could do to emulate this feature, or do I have to wait for whenever it is implemented?

tgross commented 3 years ago

Is there even a workaround for that, or is CSI currently useless if you need a subpath?

Unfortunately not. I took a crack at a workaround, and even trying something very gross (😀) I wasn't able to come up with one. At the end of the day, CSI works at the level of mounts, not file system paths. It's likely we'd be able to implement subpaths for host volumes before we'd ever be able to do so for CSI.

What I tried was to have the CSI volume owned by a "pause" container and then bind-mount that from the source in the plugin directory on the host to the target in the application container.

To demonstrate this failed workaround, we'll need a volume with a subpath in it. I set up the hostpath CSI plugin and created a volume as per our demo, except with the following redis.nomad file:

jobspec

```hcl
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    volume "volume0" {
      type            = "csi"
      source          = "test-volume"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
      read_only       = false
      per_alloc       = true
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
      }

      volume_mount {
        volume      = "volume0"
        destination = "${NOMAD_ALLOC_DIR}/volume0"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```

After running that I make a subpath directory in the volume with a file in it:

$ nomad alloc exec 02b mkdir /alloc/volume0/subpath
$ nomad alloc exec 02b touch /alloc/volume0/subpath/helloworld

I'll stop that job and verify the volume claim is released, and then run the following job. Note that the very long source path for the redis task is the path that the CSI plugin uses to "stage" volumes. We need the init container to own the volume because otherwise the CSI plugin won't have created this directory for us to mount (and if we were using a more production-ready CSI plugin the volume would not have been mounted to the host).

jobspec

```hcl
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    volume "volume0" {
      type            = "csi"
      source          = "test-volume"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
      read_only       = false
      per_alloc       = true
    }

    task "init" {
      driver = "docker"

      lifecycle {
        hook    = "prestart"
        sidecar = true
      }

      config {
        image   = "gcr.io/google_containers/pause-amd64:3.1"
        command = "pause"
      }

      volume_mount {
        volume      = "volume0"
        destination = "${NOMAD_ALLOC_DIR}/volume0"
      }

      resources {
        cpu    = 10
        memory = 10
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        mount {
          type   = "bind"
          source = "/var/nomad/client/csi/monolith/hostpath-plugin0/per-alloc/${NOMAD_ALLOC_ID}/test-volume[0]/rw-file-system-single-node-writer/subpath"
          target = "${NOMAD_TASK_DIR}/target"
        }
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```

But when we run that job, we see that the subpath is not available at that path:

2021-06-17T13:14:58Z Driver Failure failed to create container: API error (400): invalid mount config for type "bind": bind source path does not exist: /var/nomad/client/csi/monolith/hostpath-plugin0/per-alloc/9d35b19a-8fdf-dde6-f987-2edb3579f576/test-volume[0]/rw-file-system-single-node-writer/subpath

And that's because the rw-file-system-single-node-writer is itself an overlay mount. If we check the mounts on the host:

$ mount | grep volume
overlay on /var/nomad/client/csi/monolith/hostpath-plugin0/per-alloc/981dabdb-23da-5c4f-1519-f3f39c19e830/test-volume[0]/rw-file-system-single-node-writer type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/PLKKBUII2JZZYYD2L5QKSKQNKH:/var/lib/docker/overlay2/l/RJHPNMMGTIMTBRI3LR6PBJHPG4:/var/lib/docker/overlay2/l/Z7PNZTJWCUH7I5TKMU2WT5VK6O:/var/lib/docker/overlay2/l/2K5K234FFGF3MH52S73AMDTGD4,upperdir=/var/lib/docker/overlay2/25683cb94ca4d32d0d99b886e1d6ccfe4a98834f4a26b72be78e444d50bcad07/diff,workdir=/var/lib/docker/overlay2/25683cb94ca4d32d0d99b886e1d6ccfe4a98834f4a26b72be78e444d50bcad07/work)

How CSI plugins stage their volumes isn't actually defined by the spec (and Nomad can't have an opinion on it, because it's owned by the plugins). So it's possible this failed workaround might work for whatever plugin you're using, but that'd be up to you to explore. Note that the mount path isn't part of our public API, and it's possible it might change.

jaen commented 3 years ago

Huh, what do you know – that actually worked!

I've been trying something similar, except by bind-mounting the CSI volume as seen inside what you called the init task into the task's /alloc directory, and then hoping to mount that using volumes in the other container. But the mount points came up empty outside the init container, resulting in a failure similar to what you describe (probably because, like you say, binds and overlays don't escape the container).

But it turns out that in my case the additional indirection was the reason it wasn't working. I was trying to use democratic-csi to mount NFS volumes, so at the CSI driver staging area path you suggested they are actually normal NFS mounts, and when I put them in Docker's volumes it Just Works™. Well, almost: there seems to be some kind of race condition when unmounting that can make the immediately following allocation fail to deploy, but it's something.
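Roughly, the working setup looks like this (a sketch only, with Docker volumes enabled as discussed above: the plugin and volume names are placeholders, and the staging path is an implementation detail that differs between plugins and may change between Nomad versions):

```hcl
task "builder" {
  driver = "docker"

  config {
    image = "alpine:3.14"

    // Bind-mount a subdirectory of the NFS volume straight from the CSI
    // staging area. "nfs-plugin0" and "test-volume" are placeholder names,
    // and the exact path layout depends on the plugin.
    volumes = [
      "/var/nomad/client/csi/monolith/nfs-plugin0/per-alloc/${NOMAD_ALLOC_ID}/test-volume[0]/rw-file-system-single-node-writer/subpath:/data"
    ]
  }
}
```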

It's certainly not the cleanest solution, but I guess it works well enough for my use case – making a CI builder on an M1 Mac, where you can't just mount things at /srv willy-nilly – and I can keep using normal system mounts on more sane OSes for now.

That said, I do hope this lands soon as a proper feature, as relying on implementation details such as this magic path makes me queasy.

Thanks again for suggesting to use that path to the CSI plugin staging area, I probably wouldn't have figured out that part myself.

SacDin commented 3 years ago

Eagerly awaiting an update on this. We just need a parameter in the volume or volume_mount stanza where we can specify a subpath.

natelandau commented 3 years ago

Looking forward to this coming at some point as well. Needing to add separate host_volumes to the Nomad config is driving me crazy. My use case: a single shared storage location with directories that are mounted into many different services. Due to reasons, I can't mount this location at a single path across all my clients and use the Docker driver's volume mapping, as I need to specify different paths to the root directory on different client types.

knorx commented 3 years ago

docker.volumes.enabled as a workaround is not really an option here, since it imposes severe security issues: anything on the host can then be mounted, and the whole namespace feature becomes meaningless. People hate me for introducing the static "can only mount the volume root" host volumes instead of arbitrary bind mounts, as they need to adjust their containers.

aossey commented 2 years ago

Also awaiting support for this, as enabling Docker volumes does not provide job isolation but is currently the only viable option for this subdirectory use case.

jaen commented 2 years ago

@tgross I've seen some mentions of a push towards CSI general availability, would there be any chance of fixing this issue as a part of that?

tgross commented 2 years ago

Hi @jaen. No, this isn't really specific to CSI and we're not considering it a blocker for GA.

rdgreis commented 1 year ago

This feature would be handy.

hajali-amine commented 10 months ago

This feature! 🙏🏽