kubernetes-retired / external-storage

[EOL] External storage plugins, provisioners, and helper libraries
Apache License 2.0

[nfs] question: surviving pod restarts, nfs3 vs fsid_device=false #1274

Closed. naseemkullah closed this issue 4 years ago.

naseemkullah commented 4 years ago

Hi @wongma7, thanks for this great project.

Going to NFS3 as per @kvaps' https://github.com/kubernetes-incubator/external-storage/pull/1241 and making fsid_device settable to false as per @thirdeyenick's https://github.com/kubernetes-incubator/external-storage/issues/1212 are both meant to solve issues caused by pod restarts, if I understand correctly.

Is the above statement correct? If so, could you please describe the difference between these two approaches? Which is recommended?

@kvaps @thirdeyenick please chime in with your thoughts as well if you have a moment, thanks!
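
(For context, the NFSv3 switch from #1241 is applied on the client side via mount options; a minimal sketch on a StorageClass, with made-up class and provisioner names:)

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-v3                 # hypothetical name
provisioner: example.com/nfs   # must match your provisioner's name
mountOptions:
  - vers=3                     # force NFSv3 on client mounts
```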

kvaps commented 4 years ago

@naseemkullah there is also an unpleasant bug caused by IPVS graceful termination in kube-proxy: https://github.com/kubernetes/kubernetes/issues/84322 and, earlier, https://github.com/kubernetes/kubernetes/issues/81775

Until graceful termination is fixed, nfs-provisioner pod restarts will cause hung clients.

As a workaround I switched from kube-proxy to kube-router, which makes graceful termination configurable and disables it by default; that is working fine for me.
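
For reference, the relevant kube-router setting looks roughly like this; the flag names should be verified against `kube-router --help` for your version, and the image line is an assumption:

```yaml
# Fragment of a kube-router DaemonSet spec (sketch only).
containers:
  - name: kube-router
    image: cloudnativelabs/kube-router      # assumed upstream image
    args:
      - --run-service-proxy=true            # let kube-router replace kube-proxy for Services
      - --ipvs-graceful-termination=false   # graceful termination off (kube-router's default)
```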

kvaps commented 4 years ago

As for stale file handles: this error can happen if you run the nfs-server on two different filesystems with the same file structure, because by design NFS uses the parent filesystem's inodes and hands them "as is" to its clients. If I understood it correctly, the fsid_device option lets the NFS-Ganesha server remember the inodes and always present the same ones to clients even after the underlying filesystem has changed. Please correct me if I'm wrong.

> The Export ID and FSID are used to uniquely identify handles to Ganesha. In NFS, the client gets an opaque handle to things (files, directories, block devices, etc.) and uses these opaque handles to reference those objects on the server. Ganesha breaks this up into 2 parts: the global part specifies a version, an export ID, and a length; and the per-FSAL part is opaque, and controlled by the FSAL that owns the export.
>
> VFS stores the FSID of the filesystem owning the object, and then the actual kernel handle (as passed to the *_at calls, open_by_handle_at(2) for example). The reason for the FSID is that Linux handles are only guaranteed to be unique within a single filesystem.
>
> If an export is removed, and another one is added, but it has the same system major/minor (which is the primary FSID on Linux), the handles that the client previously had open on the old export will try to be used on the new export, since, as far as Ganesha knows, they're valid for that export.
>
> In general, an export ID/FSID combo should never be re-used for the lifetime of a Ganesha server instance. This isn't a problem for filesystems based on block devices, since the FSID is based on the block device, and so will be unique, but can be a problem for FUSE, which generates its FSID.
>
> One way around this would be to create the new FUSE FS before you take down the old one. That way it will get a new FSID. Or you can just script configuration in Ganesha with unique FSIDs. Ganesha has the ability to load config snippets (such as exports) from files with the %include directive. You can try using that with generated exports.
>
> Daniel

from here: https://lists.nfs-ganesha.org/archives/list/devel@lists.nfs-ganesha.org/message/LDR3XQ6CSYRFFEY55P6545TXZGLOKO2U/
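
Putting Daniel's advice into config form, a hedged sketch of a generated export snippet with explicit, unique IDs (paths and numbers are made up; `Filesystem_Id` and `%include` are documented Ganesha directives):

```
# /etc/ganesha/exports.d/pvc-101.conf -- one generated snippet per export (sketch)
EXPORT {
    Export_Id = 101;               # must stay unique for the server's lifetime
    Path = /export/pvc-101;
    Pseudo = /export/pvc-101;
    Filesystem_Id = 101.101;       # overrides the device-derived FSID
    FSAL { Name = VFS; }
}
```

with the main `ganesha.conf` pulling it in via:

```
%include /etc/ganesha/exports.d/pvc-101.conf
```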

cedricve commented 4 years ago

> @naseemkullah there is also an unpleasant bug caused by IPVS graceful termination in kube-proxy: kubernetes/kubernetes#84322 and, earlier, kubernetes/kubernetes#81775
>
> Until graceful termination is fixed, nfs-provisioner pod restarts will cause hung clients.
>
> As a workaround I switched from kube-proxy to kube-router, which makes graceful termination configurable and disables it by default; that is working fine for me.

Interesting, do you have any documentation for the latter? We run the latest 2.3.0 helm chart, but sometimes the clients are indeed still hanging.

kvaps commented 4 years ago

@cedricve you can read this https://github.com/kubernetes/kubernetes/issues/84322#issuecomment-546293485 and try removing the real server manually on the client's node after it hangs.
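
Concretely, the manual cleanup would look something like this on the affected node (addresses are placeholders; `ipvsadm` must be installed there):

```sh
# List IPVS virtual services and their real servers to find the stale entry.
ipvsadm -Ln
# Delete the dead nfs-provisioner pod's real-server entry from the Service VIP.
ipvsadm -d -t 10.96.0.50:2049 -r 10.244.1.7:2049   # placeholder VIP / pod IP
```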

cedricve commented 4 years ago

> @cedricve you can read this kubernetes/kubernetes#84322 (comment) and try removing the real server manually on the client's node after it hangs.

Thanks, but that is a manual fix, applied only after noticing a client is hanging?

kvaps commented 4 years ago

It is; there is nothing else for now. As a workaround you can switch the kube-proxy mode from IPVS to iptables, or use kube-router instead of kube-proxy as the service proxy.
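
Switching the mode is a one-line change in the kube-proxy configuration (a minimal sketch; the remaining cluster-specific fields are omitted):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"   # was "ipvs"; sidesteps the IPVS graceful-termination bug
```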

cedricve commented 4 years ago

Thanks, great thoughts. This makes me think, and it scares me for everyone in the Kubernetes world. What other best practices could we follow for sharing volumes between deployments?

kvaps commented 4 years ago

> Thanks, great thoughts. This makes me think, and it scares me for everyone in the Kubernetes world. What other best practices could we follow for sharing volumes between deployments?

Well, the best practice is to use object storage instead, e.g. S3, but that isn't always possible; since we have a ton of legacy, we need POSIX semantics and a working nfs-server for that.

cedricve commented 4 years ago

I was about to create a central media server on a separate VM (POST/GET of files), because I'm just so desperate. Currently I'm still wondering when my clients might break. Sigh.

kvaps commented 4 years ago

@cedricve, if you have the opportunity to control the behavior of your application, try using minio; it is the easiest way to set up reliable object storage.
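
For a quick trial, a single-node minio can be started with one command (single node only, not HA; the credentials and host path are placeholders, and newer releases use MINIO_ROOT_USER/MINIO_ROOT_PASSWORD instead):

```sh
docker run -p 9000:9000 \
  -e MINIO_ACCESS_KEY=minio -e MINIO_SECRET_KEY=minio123 \
  -v /mnt/data:/data \
  minio/minio server /data   # serves the S3 API on :9000
```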

cedricve commented 4 years ago

Thanks, I also found rook.io; not sure if they have any fixes for the stale handle issues? Have you looked at that one already, @kvaps?

naseemkullah commented 4 years ago

Is there a benefit in setting fsid_device to false when using NFS3? @kvaps

kvaps commented 4 years ago

I'm not sure, but I would guess so. This option tells the nfs-server how to represent the data; it does not affect the transfer method.

naseemkullah commented 4 years ago

Do you think it would be a good idea to set fsid_device to false as the default in the helm chart?

kvaps commented 4 years ago

I'm fine with this change; the other question is what potential problems it could bring. I think @wongma7 is more competent to answer that.
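
If it does become a chart default, the override would presumably live in values.yaml; the key below is hypothetical, standing in for whatever the chart actually wires through to the provisioner's fsid_device behavior:

```yaml
# values.yaml sketch -- "useDeviceBasedFsids" is a made-up key name;
# check the chart's values.yaml for the real knob.
nfs:
  useDeviceBasedFsids: false
```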

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-incubator/external-storage/issues/1274#issuecomment-653913490):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.