Closed bszeti closed 3 years ago
Hi @bszeti! Thanks for filing this bug. I'll take a look at it today. That one could be tricky, since we have some use cases where RollingUpdate is a better fit. @LCaparelli any issues regarding the update scenario?
Hey @bszeti, thanks for raising the issue. As @ricardozanini mentioned, the RollingUpdate option is a better fit for the way automatic updates are handled, so that your current deployment does not become unavailable while the update is still in progress.
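For context, with the Kubernetes defaults a RollingUpdate strategy on the Deployment looks roughly like this (a sketch of the defaults, not necessarily the exact values the operator sets):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # rounds down to 0 with replicas: 1, so the old Pod is kept running
      maxSurge: 25%         # rounds up to 1 with replicas: 1, so the new Pod starts alongside it

So with a single replica the new Pod is created while the old one is still running, which is exactly the situation that hits the data-directory lock once persistence is involved.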
@bszeti can you please tell us how you installed the operator and what version you're running? Along with that, please also share the output of:
$ oc describe pod
Be sure to run it in the project where the failing pod is.
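For example, assuming the operator and Nexus live in a project called nexus (adjust the names to your environment):

$ oc project nexus                     # switch to the project with the failing pod
$ oc get pods                          # find the pod stuck in CrashLoopBackOff
$ oc describe pod <failing-pod-name>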
I'll try to replicate the issue, but so far here's what I have, installing v0.4.0 via OLM:
@LCaparelli I believe he also has a PV already populated with some data. In that case, Nexus will lock the data directory, preventing the rolling update.
If this is the case, we must change to "Recreate", since even for updating we won't be able to do it with the data folder locked. Or, at the very least, signal to the server to unlock the data directory before performing the update.
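For reference, that would mean the generated Deployment carries something like this instead (a sketch of the proposed change, not the operator's current output):

spec:
  strategy:
    type: Recreate   # terminate the old Pod first, so /nexus-data/lock is released before the new Pod starts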
@ricardozanini So if I enable persistence I should run into this issue, right? Let me give that a try.
Ah yes, indeed. I have reproduced the same issue. Simply using Recreate when persistence is enabled would do the trick, but it would bring availability issues for automatic updates with persistence. I'll give it some further thought; perhaps there's a way to deal with this without negative outcomes.
At the moment no action from you is requested @bszeti, thanks again for reporting it. :-)
Hi, Thanks for looking into this.
Yes, of course, the issue only shows up if you use persistence. Nexus has a lock file, so the new Pod can't start while the old one is still running. (By the way, isn't this a problem if the number of Nexus replicas is greater than one?)
Install operator:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nexus
spec:
  targetNamespaces:
    - nexus
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nexus-operator-m88i
spec:
  channel: alpha
  name: nexus-operator-m88i
  source: community-operators
  sourceNamespace: openshift-marketplace
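For completeness, assuming the manifests above are saved as operator.yaml (the file name is just a placeholder) and the target project is nexus:

$ oc new-project nexus                 # create the project if it doesn't exist yet
$ oc apply -f operator.yaml -n nexus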
Install Nexus:
apiVersion: apps.m88i.io/v1alpha1
kind: Nexus
metadata:
  name: nexus3
spec:
  resources:
    limits:
      cpu: '2'
      memory: 2Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  useRedHatImage: true
  serverOperations:
    disableOperatorUserCreation: false
  imagePullPolicy: Always
  networking:
    expose: true
    exposeAs: Route
    tls:
      mandatory: true
  replicas: 1
  persistence:
    persistent: true
    volumeSize: 10Gi
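After editing the Nexus resource you can check the strategy on the generated Deployment and watch the rollout (assuming the Deployment is named after the Nexus resource, nexus3):

$ oc get deployment nexus3 -o jsonpath='{.spec.strategy.type}'
$ oc get pods -w    # the new Pod ends up in CrashLoopBackOff while the old one still holds the lock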
Hi @bszeti, yes, it's a problem. Only horizontal scaling is supported at this time, until we implement #61. I'll also take a look at the Nexus documentation to see if I can figure out another workaround for this problem.
Describe the bug
Modifying the Nexus resource triggers a new Deployment, but the new Pod can't start (CrashLoopBackOff) because the previous one is still holding the /nexus-data/lock. The problem is probably caused by the Deployment using spec.strategy=RollingUpdate. Using "Recreate" may help, so the previous Nexus instance is shut down before the new one is created.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The Deployment is successfully rolled out.
Environment
OpenShift 4.6.5
Client Version: 4.4.30
Server Version: 4.6.5
Kubernetes Version: v1.19.0+9f84db3