hashicorp / nomad-driver-podman

A nomad task driver plugin for sandboxing workloads in podman containers
https://developer.hashicorp.com/nomad/plugins/drivers/podman
Mozilla Public License 2.0

Mismatch between API call and nomad UI #117

Closed: blmhemu closed this issue 11 months ago

blmhemu commented 3 years ago

The API says podman is rootless: true

[screenshot: API response showing rootless: true]

The Nomad UI says rootless: false

[screenshot: Nomad UI showing rootless: false]

There is only one socket, for the non-root user. There is NO socket for root.

The nomad-driver-podman version is 0.2.0 and the nomad version is 1.0.4.


As a separate bug, I am getting "dial unix /run/podman/podman.sock: connect: no such file or directory" when no socket is provided. The expected behavior is to scan BOTH /run/podman/podman.sock and /run/user/<id>/podman/podman.sock, but it looks like the driver only checks the first one and throws an error.

Here is my config.

plugin "nomad-driver-podman" {
    config {
        volumes {
            enabled = true
            selinuxlabel = "z"
        }
        recover_stopped = true
    }
}
towe75 commented 3 years ago

Hi @blmhemu. The API / rootless problem is very likely a duplicate of #92. Maybe you want to try a build from the main branch?

Regarding the socket path: yeah, we could check for uid!=0 and simply search for the user socket. Thank you for the hint. I will keep the issue open.

blmhemu commented 3 years ago

Thanks for the quick reply @towe75. I think the rootless bug is fixed in master and can be marked as resolved.

[screenshot: rootless now reported correctly with a main-branch build]


> Regarding the socket path: yeah, we could check for uid!=0 and simply search for the user socket. Thank you for the hint. I will keep the issue open.

Actually, we are already doing that, but the problem is that we are checking os.Getuid, which gets the uid of the user running this driver / program (in my case the owner of the binary is root). So it is only searching for the root podman.sock.

[screenshot: driver source for the os.Getuid-based socket lookup]

One solution could be: along with the uid check, also check whether podman.sock exists. If both fail, we can iterate over all /run/user/<id>/podman directories to find any podman.sock and set it as the default.
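Roughly something like this (just a sketch of the idea, not the driver's actual code; the helper name and lookup order are my own assumptions):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findPodmanSocket is a hypothetical helper sketching the proposed lookup:
// rootful socket first, then the socket of the uid this process runs as,
// then any per-user socket under /run/user as a last resort.
func findPodmanSocket() (string, error) {
	// 1. Rootful socket.
	if _, err := os.Stat("/run/podman/podman.sock"); err == nil {
		return "unix:///run/podman/podman.sock", nil
	}

	// 2. Socket of the uid this process runs as.
	uidSock := fmt.Sprintf("/run/user/%d/podman/podman.sock", os.Getuid())
	if _, err := os.Stat(uidSock); err == nil {
		return "unix://" + uidSock, nil
	}

	// 3. Last resort: scan all per-user runtime dirs for a podman socket.
	matches, err := filepath.Glob("/run/user/*/podman/podman.sock")
	if err == nil && len(matches) > 0 {
		return "unix://" + matches[0], nil
	}

	return "", fmt.Errorf("no podman socket found")
}

func main() {
	sock, err := findPodmanSocket()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("would default to:", sock)
}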

I can try to raise a PR if this sounds good. Let me know what you think.


Separately, I am getting the following error when trying to run the sample redis job with the podman driver:

[screenshot: 500 error from the redis job]

(Ubuntu 21.04 ARM64, podman 3.1.2) podman ps and podman run <image> work fine on the machine.

Interestingly, I can see that images are being pulled (by running podman images), but podman ps and podman ps -a show nothing.

Here is my ansible role for podman: https://github.com/blmhemu/selfhost/blob/main/roles/ansible_podman/tasks/main.yml

towe75 commented 3 years ago

> One solution could be: along with the uid check, also check whether podman.sock exists. If both fail, we can iterate over all /run/user/<id>/podman directories to find any podman.sock and set it as the default.

A nomad client can already be configured to use a specific socket if the uids of nomad and podman are different. I don't like the idea of rolling the dice here. It can end up in weird situations if root is running nomad and an arbitrary user is logged in (or not logged in) while nomad restarts.
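For example, something along these lines in the client's plugin stanza (assuming the driver's socket_path option; the uid 1000 is just a placeholder, check the docs for the exact URI form):

plugin "nomad-driver-podman" {
    config {
        # point the driver at one specific (here: rootless) podman socket
        socket_path = "unix://run/user/1000/podman/podman.sock"
    }
}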

I'm also thinking about a map of symbolic names to socket paths in the driver config. This way each task could point to a different user's podman socket (e.g. root, tenant1, tenant2).
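Roughly, such a config could look like this (purely a sketch of the idea, nothing of this exists yet; block and attribute names are made up):

plugin "nomad-driver-podman" {
    config {
        # hypothetical: one named socket per podman user
        socket {
            name        = "default"
            socket_path = "unix://run/podman/podman.sock"
        }
        socket {
            name        = "tenant1"
            socket_path = "unix://run/user/1001/podman/podman.sock"
        }
    }
}

Each task would then select a socket by name in its driver config.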

Regarding your 500 problem: I suggest raising the podman.service log level to info or debug, like so:

Environment=LOGGING="--log-level=debug"

Then, restart the podman service and run your job. Check your journal, maybe you spot something.
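For example (assuming your distro's podman.service passes $LOGGING to podman system service, as the stock unit does):

# add an override for podman.service, e.g. via: systemctl edit podman.service
[Service]
Environment=LOGGING="--log-level=debug"

# then restart the service and follow its journal while running the job
systemctl restart podman.service
journalctl -u podman.service -f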

blmhemu commented 3 years ago

The error on nomad is as follows:

2021-06-05T20:48:53.882+0530 [ERROR] client.rpc: error performing RPC to server: error="RPC Error:: 400,ACL support disabled" rpc=ACL.ResolveToken server=xx.xx.xx.xx:4647
2021-06-05T20:48:53.883+0530 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="RPC Error:: 400,ACL support disabled" rpc=ACL.ResolveToken server=xx.xx.xx.xx:4647

Will fetch podman logs soon.

towe75 commented 3 years ago

@blmhemu this seems to be rather an ACL issue between your nomad agents, unrelated to podman. Also, this issue is starting to become confusing because you started it with two problems, where one is resolved in the main branch and the other is basically a matter of correct configuration.

I can not correlate your latest comment to either of these former problems, nor to the "500" that you got with the redis job.
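On the ACL error specifically: judging from the "ACL support disabled" message, my guess (an assumption, I have not seen your configs) is that your client and server agents disagree about whether ACLs are enabled. The acl stanza should be consistent on all agents, e.g.:

# nomad agent config: keep this identical on servers and clients
acl {
    enabled = true
}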

towe75 commented 1 year ago

@blmhemu any news here? There have been no related reports for quite some time. I guess it's solved.

blmhemu commented 1 year ago

Hey! I am sorry, the setup is nuked and I am no longer using podman 😞