Closed: egegunes closed this pull request 5 months ago.
Test name | Status |
---|---|
arbiter | passed |
balancer | passed |
custom-replset-name | passed |
cross-site-sharded | passed |
data-at-rest-encryption | passed |
data-sharded | passed |
demand-backup | passed |
demand-backup-eks-credentials | passed |
demand-backup-physical | passed |
demand-backup-physical-sharded | passed |
demand-backup-sharded | passed |
expose-sharded | passed |
ignore-labels-annotations | passed |
init-deploy | passed |
finalizer | passed |
ldap | passed |
ldap-tls | passed |
limits | passed |
liveness | passed |
mongod-major-upgrade | passed |
mongod-major-upgrade-sharded | passed |
monitoring-2-0 | passed |
multi-cluster-service | passed |
non-voting | passed |
one-pod | passed |
operator-self-healing-chaos | passed |
pitr | passed |
pitr-sharded | passed |
pitr-physical | passed |
pvc-resize | passed |
recover-no-primary | passed |
rs-shard-migration | passed |
scaling | passed |
scheduled-backup | passed |
security-context | passed |
self-healing-chaos | passed |
service-per-pod | passed |
serviceless-external-nodes | passed |
smart-update | passed |
split-horizon | passed |
storage | passed |
tls-issue-cert-manager | passed |
upgrade | passed |
upgrade-consistency | passed |
upgrade-consistency-sharded-tls | passed |
upgrade-sharded | passed |
users | passed |
version-service | passed |
We ran 48 out of 48 tests.
commit: https://github.com/percona/percona-server-mongodb-operator/pull/1540/commits/1d9c93792f7264d46c30b78bcbb1b947d0951de9
image: perconalab/percona-server-mongodb-operator:PR-1540-1d9c9379
I believe this PR has broken running instances on CR 1.15.0 once the branch is pulled down - has this been tested?
> I believe this PR has broken running instances once the branch is pulled down - has this been tested?
It was tested by our e2e tests. After the merge, our QA team will perform further testing. Please do not use the main branch for production needs; it can be unstable.
@kantorcodes could you please provide your CR and we will test your case as well.
On CR 1.16.0, cfg0-3 start; however, mongos-0 reports "Host failed in replica set" and "Error connecting to XX.XX.XX".

On CR 1.15.0, cfg-0 reports `/opt/percona/ps-entry.sh: line 522: exec: numactl --interleave=all: not found` and mongos-0 does not start at all.
Do you have a recommended setup for running without TLS for the following variables? (See the sketch below for where these fields live.)

- `spec.image` in `cr.yaml`
- `upgradeOptions.apply` in `cr.yaml`
- `CR` in `cr.yaml`
- `spec.containers.image` in `bundle.yaml`
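For orientation, a minimal sketch of where those four settings live (a hedged illustration; the operator image tag is an assumption matching the 1.15.0 release, and the database tag is the one discussed later in this thread):

```yaml
# cr.yaml -- the PerconaServerMongoDB custom resource
spec:
  crVersion: 1.15.0                               # the CR version field
  image: percona/percona-server-mongodb:6.0.9-7   # spec.image (database image)
  upgradeOptions:
    apply: disabled                               # upgradeOptions.apply (pin the image, no auto-upgrade)
---
# bundle.yaml -- the operator Deployment; spec.containers.image is nested here
spec:
  template:
    spec:
      containers:
        - name: percona-server-mongodb-operator
          image: percona/percona-server-mongodb-operator:1.15.0
```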
> On CR 1.15.0, cfg-0 reports `/opt/percona/ps-entry.sh: line 522: exec: numactl --interleave=all: not found` and mongos-0 does not start at all.
As you can see from the release notes, the PSMDB 1.15 operator was tested with MongoDB 4.4.24, 5.0.20, and 6.0.9, and numactl was added to the Docker files for those versions:
https://docs.percona.com/percona-operator-for-mongodb/RN/Kubernetes-Operator-for-PSMONGODB-RN1.15.0.html#supported-platforms:~:text=MongoDB%204.4.24%2C%205.0.20%2C%20and%206.0.9
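One quick way to verify that a given image actually ships numactl before rolling it out; a sketch that assumes Docker is available locally and that the image contains bash (which the ps-entry.sh error above suggests):

```bash
# Prints the numactl path (e.g. /usr/bin/numactl) and exits 0 if found;
# exits non-zero if the binary is missing from the image.
docker run --rm --entrypoint bash percona/percona-server-mongodb:6.0.9-7 \
  -c 'command -v numactl'
```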
How would we force version 6.0.9 when specifying `spec.image` in `cr.yaml`, and how do we ensure the operator code in `bundle.yaml` is using 1.15.0?
> How would we force version 6.0.9 when specifying `spec.image` in `cr.yaml`?
You can set it via the option at https://github.com/percona/percona-server-mongodb-operator/blob/v1.15.0/deploy/cr.yaml#L15
I mean, what would be the correct value?
Using this link, you can get the correct value as well :)
> On CR 1.16.0, cfg0-3 start; however, mongos-0 reports "Host failed in replica set" and "Error connecting to XX.XX.XX".
Did you use the default CR? I can't reproduce it :(
Utilizing these combinations with TLS disabled, I am getting the following error:

```json
{"t":{"$date":"2024-05-05T15:51:53.046Z"},"s":"F", "c":"CONTROL", "id":20574, "ctx":"-","msg":"Error during global initialization","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"need to enable TLS via the sslMode/tlsMode flag when using TLS configuration parameters"}}}
```

Happy to hop on a video call if you're willing to dissect this together further. Would that be helpful? Note, `unsafeFlags` and `tls` were added after this PR went up; simply specifying `allowUnsafeConfigurations` worked previously. I suspect a smart update cascaded issues here.
```yaml
spec:
  # platform: openshift
  # clusterServiceDNSSuffix: svc.cluster.local
  clusterServiceDNSMode: "External"
  # pause: true
  # unmanaged: false
  crVersion: 1.15.0
  image: percona/percona-server-mongodb:6.0.9-7
  imagePullPolicy: Always
  unsafeFlags:
    tls: true
  tls:
    allowInvalidCertificates: true
    mode: disabled
    # enabled: false
    # # 90 days in hours
    # certValidityDuration: 2160h
  # imagePullSecrets:
  # - name: private-registry-credentials
  # initImage: perconalab/percona-server-mongodb-operator:main
  # initContainerSecurityContext: {}
  allowUnsafeConfigurations: true
  updateStrategy: SmartUpdate
```
Would specifying `initImage` be helpful with this new edge case?
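For context, `initImage` selects the image whose init container copies the entrypoint scripts (ps-entry.sh among them) into the database pods, so pinning it to the released operator image, rather than a main-branch build, is one thing to try here. A hedged sketch, not a confirmed fix:

```yaml
spec:
  crVersion: 1.15.0
  # keep the injected entrypoint scripts in lockstep with the released operator
  initImage: percona/percona-server-mongodb-operator:1.15.0
```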
OK, thanks for the CR. We will check it tomorrow morning.
Would you have any ideas for a stopgap solution in production? I believe a smart update could affect other servers running with a similar setup and bring them down.
Please do not use the main branch for production; it has not been tested by the QA team. We run all needed tests before each release. Use only officially released versions of our operators.
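For example, a minimal sketch of deploying from an official release tag instead of main (v1.15.0 shown to match this thread; adjust to the latest release):

```bash
# Check out the released tag so deploy/bundle.yaml and deploy/cr.yaml
# match the published CRDs and operator image.
git clone -b v1.15.0 https://github.com/percona/percona-server-mongodb-operator.git
cd percona-server-mongodb-operator
kubectl apply --server-side -f deploy/bundle.yaml   # CRDs, RBAC, operator Deployment
kubectl apply -f deploy/cr.yaml                     # the PerconaServerMongoDB custom resource
```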
Understood; however, despite switching off the `main` branch, the issue still persists, and it looks like it can be replicated on a fresh setup as well.
It was merged into the main branch only; it can't affect any versions that were released before. The v1.15.0 CRDs do not have the new options, and the old operator does not have the code that was added in main. Before the official release we will test these new options very carefully to be sure the new operator can work with old CR versions.
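To make that concrete, a hedged sketch of the two configuration styles seen in this thread (field names taken from the CR posted above and from this PR; mixing the two in one CR, as above, gives a released operator options its CRDs do not know about):

```yaml
# Style the released v1.15.0 CRDs/operator understand
# (per the report above, this alone previously allowed running without TLS):
spec:
  crVersion: 1.15.0
  allowUnsafeConfigurations: true
---
# Style introduced on main by this PR (needs the new CRDs and crVersion 1.16.0):
spec:
  crVersion: 1.16.0
  unsafeFlags:
    tls: true        # explicitly allow a TLS-less cluster
  tls:
    mode: disabled   # turn TLS off
```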
CHANGE DESCRIPTION
Problem: Short explanation of the problem.
Cause: Short explanation of the root cause of the issue if applicable.
Solution: Short explanation of the solution we are providing with this PR.
CHECKLIST

Jira
- Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?

Tests
- Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability