openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

Glusterfs with network encryption fails mounting when single node is down #12985

Closed · aliscott closed this issue 6 years ago

aliscott commented 7 years ago

Glusterfs volume fails to mount when client/server network encryption is enabled and a single gluster node is unavailable

Version
openshift v1.4.1
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
Steps To Reproduce
  1. Create a persistent-volume-claim that uses glusterfs dynamic provisioning.
  2. Stop the volume, enable client.ssl and server.ssl, and restart the volume (see the sketch after this list).
  3. Mount the volume to a pod.
  4. Test reading and writing files to the volume. Everything works fine at this stage.
  5. Shut down a single glusterfs node.
  6. Test reading and writing files to the volume.
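
For context, steps 1–2 look roughly like the following. This is a minimal sketch, not the reporter's exact setup: the volume name `mypd` echoes the mount errors quoted below, but the storage class name and the gluster volume name are illustrative placeholders, and a glusterfs StorageClass is assumed to already exist.

```sh
# Rough sketch of steps 1-2; storage class and gluster volume names are
# illustrative placeholders.

# 1. Create a PVC backed by glusterfs dynamic provisioning.
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypd
  annotations:
    volume.beta.kubernetes.io/storage-class: glusterfs-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

# 2. On one of the gluster nodes: stop the provisioned volume, enable
#    client/server TLS, and start it again ("vol_xxxx" stands in for the
#    dynamically provisioned volume name).
gluster volume stop vol_xxxx
gluster volume set vol_xxxx client.ssl on
gluster volume set vol_xxxx server.ssl on
gluster volume start vol_xxxx
```
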
Expected Result

I should be able to mount the volume and it should still be readable and writeable, since 2/3 of the gluster nodes are still running.

Current Result

Neither reading nor writing to the volume works. When recreating the pod, the volume fails to mount:

8s  8s  1 {kubelet node-fbcb9413635c}   Warning  FailedMount Unable to mount volumes for pod "nginx_test1(867619c0-dcc3-11e6-9ab1-06ff73935133)": timeout expired waiting for volumes to attach/mount for pod "nginx"/"test1". list of unattached/unmounted volumes=[mypd]
8s  8s  1 {kubelet node-fbcb9413635c}   Warning  FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nginx"/"test1". list of unattached/unmounted volumes=[mypd]

The volume works again when I bring up the unavailable node or if I disable client and server SSL.

Additional Information

glusterd.log.txt

wenskys commented 7 years ago

I'm also having this issue.

aliscott commented 7 years ago

I'm also having this problem when only management encryption is enabled.

obnoxxx commented 7 years ago

looking...

@raghavendra-talur , @ramkrsna, @humblec, @MohamedAshiqrh

raghavendra-talur commented 7 years ago

@aliscott I am not sure if enabling only management encryption is supported. I refer to https://kshlm.in/post/network-encryption-in-glusterfs/ for Gluster SSL setup.
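
For reference, the management-encryption setup that post describes boils down to roughly the following (a minimal sketch; paths assume the default glusterd working directory and the standard `/etc/ssl/glusterfs.*` certificate locations):

```sh
# Minimal sketch of enabling management (glusterd) encryption, following the
# post linked above; run on every gluster server and every client node.

# TLS material is expected at:
#   /etc/ssl/glusterfs.key  - this node's private key
#   /etc/ssl/glusterfs.pem  - this node's certificate
#   /etc/ssl/glusterfs.ca   - concatenation of all trusted certificates

# The presence of this (empty) file tells glusterd and the clients to use
# TLS for management connections:
touch /var/lib/glusterd/secure-access

# Restart glusterd on the servers for the change to take effect.
systemctl restart glusterd
```
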

raghavendra-talur commented 7 years ago

@aliscott On the main issue:

  1. how did you generate certs?
  2. self signed or common CA
  3. Does this happen when any of the 3 nodes are down or is it only one special node?

obnoxxx commented 7 years ago

@aliscott any update on this?

(github should introduce a "needinfo" flag ...)

aliscott commented 7 years ago

Sorry, I missed this.

  1. how did you generate certs?

I followed the guide here: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/chap-Network_Encryption.html

  2. self signed or common CA

Self-signed

  3. Does this happen when any of the 3 nodes are down or is it only one special node?

Any of the nodes
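
For reference, the self-signed setup in that guide amounts to roughly the following on each gluster node (a minimal sketch; the CN, validity period, and the `node*.pem` file names are illustrative assumptions):

```sh
# Minimal sketch of the self-signed certificate setup from the guide linked
# above; run on each gluster node.

# Generate a private key for this node.
openssl genrsa -out /etc/ssl/glusterfs.key 2048

# Create a self-signed certificate for this node (CN is typically the
# hostname or address the other peers use to reach it).
openssl req -new -x509 -key /etc/ssl/glusterfs.key \
    -subj "/CN=$(hostname)" -days 365 -out /etc/ssl/glusterfs.pem

# Concatenate the glusterfs.pem files from all servers and clients into a
# shared CA bundle and distribute it to every node as /etc/ssl/glusterfs.ca.
cat node1.pem node2.pem node3.pem > /etc/ssl/glusterfs.ca
```
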

openshift-bot commented 6 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

alikhajeh1 commented 6 years ago

/remove-lifecycle stale

Any update on this?

JasonGiedymin commented 6 years ago

I'm going to link my update to an encryption-related ticket, only because there is so little documentation on in-transit and at-rest encryption: https://github.com/openshift/origin/issues/13013

openshift-bot commented 6 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 6 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 6 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 6 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/origin/issues/12985#issuecomment-425861705):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`.
> Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
> Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.