Closed ctappy closed 1 year ago
Hi,
Such broken mounts can be unmounted with a regular umount
(after all processes using the mount have exited); they mean that GeeseFS crashed or was killed with SIGKILL. I've tried to fix this issue on the csi-s3 side several times, but it seems the fixes still aren't enough because k8s performs some additional checks on the mountpoint first... There also seems to be a timeout-related issue, but as I understand it, it reproduces about 1 time in 50, so diagnosing it is hard :)
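As a rough illustration of that cleanup, here is a minimal sketch of detecting and lazily unmounting such a dead mount. `clean_stale_mount` is a hypothetical helper name, not part of csi-s3 or GeeseFS:

```shell
#!/bin/sh
# clean_stale_mount: hypothetical helper; checks whether a path is still
# reachable and, if not, detaches the dead FUSE mount.
clean_stale_mount() {
    mnt="$1"
    if stat "$mnt" >/dev/null 2>&1; then
        echo "healthy: $mnt"
        return 0
    fi
    # stat() on a mount whose FUSE daemon died fails with ENOTCONN
    # ("Transport endpoint is not connected").
    # Lazy unmount (-l) detaches it even if some paths are still open;
    # fall back to fusermount for unprivileged mounts.
    umount -l "$mnt" 2>/dev/null || fusermount -u "$mnt"
}

clean_stale_mount /tmp
```

On a healthy directory the helper does nothing; only a path whose stat() fails gets detached.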
I also tried to add -o auto_unmount
support to the FUSE binding of GeeseFS (the same way it's implemented in libfuse), but I haven't succeeded yet. I'll make another attempt at some point, I think :)
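For comparison, this is how the option already behaves with libfuse-based filesystems: with auto_unmount, fusermount drops the mount automatically when the FUSE daemon exits, even after SIGKILL. The example below uses sshfs as a stand-in; the host and paths are hypothetical:

```shell
# auto_unmount asks fusermount to remove the mount as soon as the
# daemon's /dev/fuse descriptor closes, so a killed daemon never leaves
# a "Transport endpoint is not connected" mountpoint behind.
sshfs -o auto_unmount user@example.com:/data /mnt/data
```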
And I can of course add an option to disable mount sharing, but I implemented it on purpose; I thought it was a feature, not a bug :))
I really like the share feature; with GeeseFS's memory usage and what I am trying to do, it's helpful. 1 in 50 sounds right. When I was using another OS (I don't recall which), it happened more often, which could have been an older libfuse or k8s setup.
I was thinking about a symlink workaround in csi-s3: create a randomly/incrementally named symlink/directory on mount, and on unmount remove it, ignoring errors (this issue).
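A minimal sketch of that workaround, assuming hypothetical helper names (`next_mount_link`, `drop_mount_link`) and a `mnt-N` naming scheme that csi-s3 does not actually use:

```shell
#!/bin/sh
# On mount: point a fresh, incrementally numbered symlink at the real
# FUSE mountpoint and print the link path.
next_mount_link() {
    base="$1"   # directory holding the numbered links
    real="$2"   # actual FUSE mountpoint
    n=0
    # -L catches dangling symlinks too, i.e. links whose mount died.
    while [ -e "$base/mnt-$n" ] || [ -L "$base/mnt-$n" ]; do
        n=$((n + 1))
    done
    ln -s "$real" "$base/mnt-$n"
    echo "$base/mnt-$n"
}

# On unmount: remove the link, ignoring errors (including the
# "Transport endpoint is not connected" case).
drop_mount_link() {
    rm -f "$1" 2>/dev/null || true
}
```

Because each mount gets a never-reused name, a broken leftover link never blocks the next mount; cleanup can happen lazily.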
@vitalif if I were to start working on an incremental directory setup with mount-issue checks, creating a new incremental directory each time, where would be a good place to start? I may start working on this in a month or two, maybe more, but any advice would be great. Thanks, and awesome project!
I updated from 0.34 to the latest csi, and saw the systemd service changes that could potentially resolve this. For now I am closing. Thanks!
After an unmount in k8s with GeeseFS, the globalmount directory shows:

```
Transport endpoint is not connected
```

Once I see this issue, the node/server cannot be used and needs to be removed. I'm not sure if this is something that can be resolved in GeeseFS, because it could be a bad unmount by k8s and a system problem.
Another issue is that some containers share the mount, so we can't use pod lifecycle hooks to unmount, because the other containers would be disrupted. Only when the mount is no longer used by any container is the geesefs process killed, and that occasionally causes this issue.
Any thoughts?