duckfullstop opened 3 years ago
Feel free to submit a PR for it. It would be great if Longhorn could work with NixOS.

For the nsenter path modification, you can make this hardcoded path configurable, then simply modify the YAML files to make Longhorn fit NixOS. Besides the line (csi/nfs) you mentioned in the post, I think you may also need to modify repo longhorn/go-iscsi-helper, which is a dependency of other Longhorn repos.

Here is the brief/possible solution:

1. Update the `go-iscsi-helper` dependency in `go.mod` in repo longhorn/longhorn-manager and repo longhorn/longhorn-engine, then run `go mod tidy && go mod vendor` in the root directory of each repo to update the dependency.
2. Run `make` to build new images for `longhorn-manager` and `longhorn-engine`.
3. Update the deployment YAML in `longhorn/longhorn-manager/deploy`.

I'll happily take a look into putting some PRs together! I should have some time to delve into it at the end of this week, otherwise hopefully next week.
Took five minutes to look into how I'm going to implement this: changing the container's $PATH is probably the least painful but least clean option. Otherwise, I could change the codepath of `go-iscsi-helper`'s `Execute()` function to make an additional set of calls to find the binaries in a given set of extra paths (basically a wrapper around `os/exec`'s `LookPath()`, which queries against $PATH), and then do `nsenter [args] /i/found/this/path/and/it/contains/mount`.

I'll keep thinking about it; I'm not sure which way to go, because $PATH is arguably the authority on the system path, and having an additional wrapper over the top of it feels rather janky. If I do the former, I could also just add an `EXTRA_PATH` envvar to the containers, and then have the container's bootstrap merge that into the system path (a bit like what my containers are doing right now) - I would still need to patch https://github.com/longhorn/longhorn-manager/blob/master/csi/nfs/nsenter.go#L80 as previously mentioned, though.
Working on this now; I should have some preliminary PRs out by today or tomorrow.
I note the existence of @joshimoo's https://github.com/longhorn/longhorn-manager/commit/b467f4fbb9f936b91830449a125cea04c9b0b6b6, which seems to be working in a similar direction to mine, except far more cleanly; perhaps they could chime in with their intentions?
Having dived a bit deeper into this, I think implementing this one is going to be a bit beyond my pay grade, at least without sticking random calls to `os.Getenv()` everywhere (which was making me throw up a bit). I started trying to add a new `ExtraPath` attribute to the spec of `longhorn.InstanceManager`, but there was so much cascading change required (to `longhorn.Node`, etc.) that I'm not really confident submitting a change of that scale to something I'm not intimately familiar with.
If you can provide some further guidance I'd be happy to assist, but for the time being I'd like to sit back and see what @joshimoo was working on with their commit.
I do have a preliminary changeset for `longhorn/longhorn-instance-manager` that tweaks the nsenter calls to use the system path instead of hardcoded paths - happy to push that if it would assist.
We currently do not have the capacity to test against non-standard operating systems; for a list of our recommended operating systems, have a look here: https://longhorn.io/docs/1.1.0/best-practices/#software
We do want to have another look at this issue once we are closer to the 1.2 release.
For folks using `helmfile`, you can use a patch to get it working. For example:
```yaml
repositories:
  - name: longhorn
    url: https://charts.longhorn.io
releases:
  - name: longhorn
    namespace: longhorn-system
    chart: longhorn/longhorn
    version: 1.3.2
    strategicMergePatches:
      - kind: DaemonSet
        apiVersion: apps/v1
        metadata:
          name: longhorn-manager
          namespace: longhorn-system
        spec:
          template:
            spec:
              containers:
                - name: longhorn-manager
                  env:
                    - name: PATH
                      value: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/run/wrappers/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
```
Also, if you are like me and you run the root filesystem out of a `tmpfs` mount, you'll need to create and mount persistent storage for the default location `/var/lib/longhorn`.
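As a sketch of that persistent mount, the NixOS configuration could look like the following; the partition label and filesystem type are assumptions, adjust them to your own disk layout:

```nix
# Hypothetical example: back /var/lib/longhorn with a real disk
# so Longhorn's data survives reboots on a tmpfs-root system.
fileSystems."/var/lib/longhorn" = {
  device = "/dev/disk/by-label/longhorn"; # assumed partition label
  fsType = "ext4";                        # assumed filesystem
};
```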
N.B. Ah, this still doesn't work due to the instance-manager being launched outside the helmfile...
If `createInstanceManagerPod` passed its PATH on to the containers it created, that should do it? `createGenericManagerPodSpec` seems like the place to do it.
@nigelgbanks a mutating admission webhook for pods in the longhorn-system namespace could be used as a workaround in the meantime, to modify the pod spec as part of the creation API request.
@joshimoo thanks for the recommendation, it was interesting reading. Shame `PodPreset` has been removed - that would have worked a treat. Writing a webhook for this seems a bit excessive, and a bit too much of a maintenance burden. I've decided to just maintain a custom image in the same way @duckfullstop has done.
Oh just found this https://artifacthub.io/packages/helm/kyverno/kyverno which might suffice.
Updated helm file example:
```yaml
repositories:
  - name: longhorn
    url: https://charts.longhorn.io
  - name: kyverno
    url: https://kyverno.github.io/kyverno
  - name: incubator
    url: https://charts.helm.sh/incubator
releases:
  - name: longhorn
    namespace: longhorn-system
    chart: longhorn/longhorn
    version: 1.3.2
  - name: kyverno
    namespace: kyverno
    chart: kyverno/kyverno
    version: 2.6.1
  - name: longhorn-admission-hooks
    namespace: longhorn-system
    chart: incubator/raw
    needs:
      - kyverno/kyverno
    values:
      - resources:
          - apiVersion: v1
            kind: ConfigMap
            metadata:
              name: longhorn-custom-path
              namespace: longhorn-system
            data:
              PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/run/wrappers/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
          - apiVersion: kyverno.io/v1
            kind: ClusterPolicy
            metadata:
              name: add-host-path-to-longhorn
              annotations:
                policies.kyverno.io/title: Add Environment Variables from ConfigMap
                policies.kyverno.io/subject: Pod
                policies.kyverno.io/category: Other
                policies.kyverno.io/description: >-
                  Longhorn invokes executables on the host system, and needs
                  to be aware of the host system's PATH. This modifies all
                  deployments such that the PATH is explicitly set to support
                  NixOS-based systems.
            spec:
              rules:
                - name: add-env-vars
                  match:
                    resources:
                      kinds:
                        - Pod
                      namespaces:
                        - longhorn-system
                  mutate:
                    patchStrategicMerge:
                      spec:
                        initContainers:
                          - (name): "*"
                            envFrom:
                              - configMapRef:
                                  name: longhorn-custom-path
                        containers:
                          - (name): "*"
                            envFrom:
                              - configMapRef:
                                  name: longhorn-custom-path
```
Will we receive a complete solution for this issue, or will the workaround (Helm) remain semi-permanent?
Does anyone have a working version of the above for kustomize? And of course, having this issue fixed instead of worked around would be a major improvement.
Instead of adding `/run/wrappers/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin` to PATH, it would be nice to be able to inject a custom binPath from the outside, and to actually get the Longhorn dependencies listed in the project README - instead of magic installers like: https://github.com/longhorn/longhorn/blob/master/deploy/prerequisite/longhorn-cifs-installation.yaml#L8
Failing to start iSCSI; I have open-iscsi installed:

```
Failed to discover" error="failed to execute: nsenter [--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net iscsiadm -m discovery -t sendtargets -p 10.42.0.25], output , stderr iscsiadm: can't open iscsid.startup configuration file etc/iscsi/iscsid.conf
iscsiadm: iscsid is not running. Could not start it up automatically using the startup command in the iscsid.conf iscsid.startup setting. Please check that the file exists or that your init scripts have started iscsid.
iscsiadm: can not connect to iSCSI daemon (111)!
iscsiadm: can't open iscsid.startup configuration file etc/iscsi/iscsid.conf
iscsiadm: iscsid is not running. Could not start it up automatically using the startup command in the iscsid.conf iscsid.startup setting. Please check that the file exists or that your init scripts have started iscsid.
iscsiadm: can not connect to iSCSI daemon (111)!
iscsiadm: Cannot perform discovery. Initiatorname required.
iscsiadm: Could not perform SendTargets discovery: could not connect to iscsid
: exit status 20"
```
I'm currently using this as a workaround:
```nix
systemd.tmpfiles.rules = [
  "L+ /usr/local/bin - - - - /run/current-system/sw/bin/"
];
```
Given that `/usr/local/bin` is not part of NixOS' default PATH, I reckon there are no side effects.
Do we know why the Longhorn driver needs to use nsenter at all? In general we have moved away from using nsenter in Kubernetes itself. nsenter was necessary when we didn't have mount propagation settings at the pod level; is it needed for `iscsiadm`?
For me it was two things for NixOS. The most important was @joaojacome's input, but I was also missing open-iscsi on the nodes. So every node in my cluster now has this additional configuration:
```nix
systemd.tmpfiles.rules = [
  "L+ /usr/local/bin - - - - /run/current-system/sw/bin/"
];
services.openiscsi = {
  enable = true;
  name = "<some-name>";
};
```
So far this seems to work; I will post updates if I discover more. This also seems to answer @gnufied's question: nsenter is used for iscsiadm, at least that is what my logs suggested as well.
> I'm currently using this as a workaround:
>
> ```nix
> systemd.tmpfiles.rules = [ "L+ /usr/local/bin - - - - /run/current-system/sw/bin/" ];
> ```
>
> Given that `/usr/local/bin` is not part of NixOS' default PATH, I reckon there are no side effects.
This worked well to get longhorn-manager up and running, which was great. However, I later noticed that no volume could get mounted by any workload if the worker node was a NixOS node. I'm guessing this also has to do with the non-standard paths in NixOS.
The driver-deployer also has this issue; all the other Longhorn pods are fine:

```
failed to try resolving symlinks in path "/var/log/pods/longhorn-system_longhorn-driver-deployer-84b7958d5b-gdkd8_a811ed14-4e0d-47f4-bff7-2e1a26d0df0c/longhorn-driver-deployer/231.log": lstat /var/log/pods/longhorn-system_longhorn-driver-deployer-84b7958d5b-gdkd8_a811ed14-4e0d-47f4-bff7-2e1a26d0df0c/longhorn-driver-deployer/231.log: no such file or directory
```
Has anyone else seen this?
@bananflugan have you tried the Kyverno YAML included here?
@joshuacox Yeah, I installed Kyverno into its own namespace and then installed the policy resource into the longhorn-system namespace. Sadly it did nothing for me, but I don't know if I did something wrong or am missing something.
@bananflugan ensure you are awaiting readiness; I had to add these waits in before mine would work:
```sh
#!/bin/sh
kubectl_native_wait () {
  TARGET_POD=$1
  TARGET_NAMESPACE=$2
  kubectl wait --namespace $TARGET_NAMESPACE --for=condition=ready pod $TARGET_POD --timeout=120s
}

THIS_CWD=$(pwd)
set -eux
kubectl create -f https://github.com/kyverno/kyverno/releases/download/v1.12.0/install.yaml
kubectl create ns longhorn-system
#kubectl label namespace longhorn-system istio-injection=enabled --overwrite
kubectl apply -f longhornfixnix.yml
kubectl_native_wait $(kubectl get po -n kyverno|grep kyverno-admission-controller|cut -f1 -d ' ') kyverno
kubectl_native_wait $(kubectl get po -n kyverno|grep kyverno-background-controller|cut -f1 -d ' ') kyverno
kubectl_native_wait $(kubectl get po -n kyverno|grep kyverno-cleanup-controller|cut -f1 -d ' ') kyverno
kubectl_native_wait $(kubectl get po -n kyverno|grep kyverno-reports-controller|cut -f1 -d ' ') kyverno
sleep 5
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
helm install longhorn \
  longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --version 1.6.2 \
  -f longhorn-values.yml
```
where that fixnix file is just the stuff from that wiki:
```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-nixos-path
  namespace: longhorn-system
data:
  PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/run/wrappers/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: longhorn-add-nixos-path
  annotations:
    policies.kyverno.io/title: Add Environment Variables from ConfigMap
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/category: Other
    policies.kyverno.io/description: >-
      Longhorn invokes executables on the host system, and needs
      to be aware of the host system's PATH. This modifies all
      deployments such that the PATH is explicitly set to support
      NixOS-based systems.
spec:
  rules:
    - name: add-env-vars
      match:
        resources:
          kinds:
            - Pod
          namespaces:
            - longhorn-system
      mutate:
        patchStrategicMerge:
          spec:
            initContainers:
              - (name): "*"
                envFrom:
                  - configMapRef:
                      name: longhorn-nixos-path
            containers:
              - (name): "*"
                envFrom:
                  - configMapRef:
                      name: longhorn-nixos-path
---
```
and I most assuredly can bind a volume:
```
k get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
csi-longhorn   Bound    pvc-b230f9d8-fe57-4cde-ae6a-aee942bdab46   4Gi        RWO            longhorn       4s
```
with this PVC:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: csi-longhorn
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
```
and test pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-local-csi-longhorn-pod
spec:
  volumes:
    - name: local-storage
      persistentVolumeClaim:
        claimName: csi-longhorn
  containers:
    - name: hello-container
      image: busybox
      command:
        - sh
        - -c
        - 'while true; do echo "`date` [`hostname`] Hello from OpenEBS Local PV." >> /mnt/store/greet.txt; sleep $(($RANDOM % 5 + 300)); done'
      volumeMounts:
        - mountPath: /mnt/store
          name: local-storage
```
@joshuacox Thank you for taking the time to help me with this. I have now gotten everything to work in our sandbox cluster, and the volume problem seems to have been connected with the other issue I posted further up in this thread.
I first had symlink issues with logging in the driver-deployer deployment, and later I noticed every single deployment in the whole cluster had this problem. I figured out that the Docker install on NixOS logs to journald by default, not to json-file as the Debian and Ubuntu installs do.
After adding this to configuration.nix, my volume issues were gone as well:
```nix
virtualisation.docker.logDriver = "json-file";
```
I recreated the longhorn-system namespace, installed the Kyverno policy resource again, and installed Longhorn, and everything was fine.
Thanks again Josh! <3
@Lindenk I created a helmfile that uses kustomize instead of kyverno to add the PATH environment variable:
**Describe the bug**

Linux distributions such as NixOS use a very non-standard path (e.g. `/run/wrappers/bin:/root/.nix-profile/bin:/etc/profiles/per-user/root/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bi`) - in NixOS's case, this is to allow for use of the Nix package manager. On these systems, because Longhorn uses nsenter to enter the host namespace, and nsenter uses the path of the namespace that called it, basic system utilities such as `mount`, as well as `iscsiadm`, can't be found or called (producing `Failed to execute: nsenter [--mount=/host/proc/*/ns/mnt mount], output nsenter: failed to execute mount: No such file or directory` and `Failed environment check, please make sure you have iscsiadm/open-iscsi installed on the host` during initial startup of the longhorn-manager pod).

I have built some extremely scrappy derived images based on the current stable release of Longhorn at https://github.com/duckfullstop/nixos-longhorn - these simply append the necessary NixOS-specific path entries to the container's path, such that `nsenter` can find the requisite binaries properly. This is not a complete fix - there is, for some reason, a hardcoded lookup path in `longhorn-manager/csi/nfs/nsenter.go` that means the share-manager pod cannot properly instantiate `ReadWriteMany` Persistent Volumes. However, even without patching that, the containers with the updated path work perfectly, and Longhorn works flawlessly (excepting the `ReadWriteMany` issue).

The affected Longhorn components are `longhorn-manager` and `longhorn-instance-manager`, as these are the only two images that make nsenter calls. I'm not 100% sure what the best and cleanest way to resolve this is, so I'm opening this as an issue instead of a load of PRs.
**To Reproduce**

`openiscsi` is available in the host environment with `config.environment.systemPackages = [ pkgs.openiscsi ];`
**Expected behavior**

longhorn-manager proceeds with initialisation.
**Log**

During startup, if not using containers with patched $PATH (log from longhorn-manager):

When attempting to use a ReadWriteMany Persistent Volume Claim with patched $PATH (log from longhorn-csi-plugin/longhorn-csi-plugin):
**Environment:**
**Additional context**

The cleanest way that I can personally think of to resolve this one is to add a variable to `longhorn-manager` and `longhorn-instance-manager` that allows for appending to the system path, and to update `csi/nfs/nsenter.go` to check against this path instead of the hardcoded choices.