helm / charts

⚠️(OBSOLETE) Curated applications for Kubernetes
Apache License 2.0

[stable/mongodb-replicaset] Mongodb replicaset not forming #1591

Closed ggn06awu closed 7 years ago

ggn06awu commented 7 years ago

Is this a request for help?: Yes


Is this a BUG REPORT or FEATURE REQUEST? (choose one): Bug Report

When running the helm install (see command below), Kubernetes spins up 3 pods successfully, but the replicaSet isn't forming. The behaviour is erratic: sometimes 2 of the 3 nodes join the replicaSet, sometimes none join, and sometimes I get two distinct primaries and one member reporting "Does not have a valid replica set config" (as is the case in this report):

Install Command

helm install -f values.yaml --name hub-dev-mongo stable/mongodb-replicaset

values.yaml

port: 27017

auth:
  enabled: false
  # adminUser:
  # adminPassword:
  # key:
  # existingKeySecret:
  # existingAdminSecret:

# Specs for the Docker image for the init container that establishes the replica set
installImage:
  name: gcr.io/google-containers/mongodb-install
  tag: 0.4
  pullPolicy: IfNotPresent

# Specs for the MongoDB image
image:
  name: mongo
  tag: 3.4
  pullPolicy: IfNotPresent

# Annotations to be added to MongoDB pods
podAnnotations: {}

resources: {}
# limits:
#   cpu: 100m
#   memory: 512Mi
# requests:
#   cpu: 100m
#   memory: 512Mi

persistentVolume:
  enabled: true
  volume.beta.kubernetes.io/storage-class: managed-nfs-storage
  ## Default: volume.alpha.kubernetes.io/storage-class: default
  ##
  storageClass: managed-nfs-storage
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  annotations: {volume.beta.kubernetes.io/storage-class: "managed-nfs-storage"}

# Annotations to be added to the service
serviceAnnotations: {}

# Entries for the MongoDB config file
configmap:
  storage:
    dbPath: /data/db
  net:
    port: 27017
  replication:
    replSetName: rs0
# security:
#   authorization: enabled
#   keyFile: /keydir/key.txt

Mongo isMaster output

$ for i in 0 1 2; do kubectl exec $RELEASE_NAME-mongodb-replicaset-$i -- sh -c 'mongo --eval="printjson(rs.isMaster())"'; done

MongoDB shell version v3.4.6
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.6
{
    "hosts" : [
        "hub-dev-mongo-mongodb-replicaset-0.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017"
    ],
    "setName" : "rs0",
    "ismaster" : false,
    "secondary" : false,
    "info" : "Does not have a valid replica set config",
    "isreplicaset" : true,
    "maxBsonObjectSize" : 16777216,
    "maxMessageSizeBytes" : 48000000,
    "maxWriteBatchSize" : 1000,
    "localTime" : ISODate("2017-08-01T09:07:21.462Z"),
    "maxWireVersion" : 5,
    "minWireVersion" : 0,
    "readOnly" : false,
    "ok" : 1
}
MongoDB shell version v3.4.6
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.6
{
    "hosts" : [
        "hub-dev-mongo-mongodb-replicaset-1.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017"
    ],
    "setName" : "rs0",
    "setVersion" : 1,
    "ismaster" : true,
    "secondary" : false,
    "primary" : "hub-dev-mongo-mongodb-replicaset-1.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017",
    "me" : "hub-dev-mongo-mongodb-replicaset-1.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017",
    "electionId" : ObjectId("7fffffff0000000000000002"),
    "lastWrite" : {
        "opTime" : {
            "ts" : Timestamp(1501578436, 1),
            "t" : NumberLong(2)
        },
        "lastWriteDate" : ISODate("2017-08-01T09:07:16Z")
    },
    "maxBsonObjectSize" : 16777216,
    "maxMessageSizeBytes" : 48000000,
    "maxWriteBatchSize" : 1000,
    "localTime" : ISODate("2017-08-01T09:07:21.874Z"),
    "maxWireVersion" : 5,
    "minWireVersion" : 0,
    "readOnly" : false,
    "ok" : 1
}
MongoDB shell version v3.4.6
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.6
{
    "hosts" : [
        "hub-dev-mongo-mongodb-replicaset-2.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017"
    ],
    "setName" : "rs0",
    "setVersion" : 1,
    "ismaster" : true,
    "secondary" : false,
    "primary" : "hub-dev-mongo-mongodb-replicaset-2.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017",
    "me" : "hub-dev-mongo-mongodb-replicaset-2.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017",
    "electionId" : ObjectId("7fffffff0000000000000002"),
    "lastWrite" : {
        "opTime" : {
            "ts" : Timestamp(1501578435, 1),
            "t" : NumberLong(2)
        },
        "lastWriteDate" : ISODate("2017-08-01T09:07:15Z")
    },
    "maxBsonObjectSize" : 16777216,
    "maxMessageSizeBytes" : 48000000,
    "maxWriteBatchSize" : 1000,
    "localTime" : ISODate("2017-08-01T09:07:22.322Z"),
    "maxWireVersion" : 5,
    "minWireVersion" : 0,
    "readOnly" : false,
    "ok" : 1
}

Version of Helm and Kubernetes:

$ helm version

Client: &version.Version{SemVer:"v2.5.0", GitCommit:"012cb0ac1a1b2f888144ef5a67b8dab6c2d45be6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.5.0", GitCommit:"012cb0ac1a1b2f888144ef5a67b8dab6c2d45be6", GitTreeState:"clean"}
$ kubectl version

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T23:15:59Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2+coreos.0", GitCommit:"c6574824e296e68a20d36f00e71fa01a81132b66", GitTreeState:"clean", BuildDate:"2017-07-24T23:28:22Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Which chart:

stable/mongodb-replicaset

What happened:

Installed the chart; three pods are created and deployed to three members of the cluster. Not all members join the replica set: sometimes two will, usually none at all (observed by running sequential installs followed by deletes + purges). It feels like race-condition behaviour.

Here is the rs.conf() on each:
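(Collected analogously to the isMaster loop above, e.g.:)

$ for i in 0 1 2; do kubectl exec $RELEASE_NAME-mongodb-replicaset-$i -- sh -c 'mongo --eval="printjson(rs.conf())"'; done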

MongoDB server version: 3.4.6
{
    "_id" : "rs0",
    "version" : 1,
    "protocolVersion" : NumberLong(1),
    "members" : [
        {
            "_id" : 0,
            "host" : "hub-dev-mongo-mongodb-replicaset-0.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : 60000,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("597f7ef5e67cf022d5102008")
    }
}
MongoDB server version: 3.4.6
{
    "_id" : "rs0",
    "version" : 1,
    "protocolVersion" : NumberLong(1),
    "members" : [
        {
            "_id" : 0,
            "host" : "hub-dev-mongo-mongodb-replicaset-1.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : 60000,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("597f808198cf2eb1cb4745ad")
    }
}
MongoDB server version: 3.4.6
{
    "_id" : "rs0",
    "version" : 1,
    "protocolVersion" : NumberLong(1),
    "members" : [
        {
            "_id" : 0,
            "host" : "hub-dev-mongo-mongodb-replicaset-2.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : 60000,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("597f80b0d50c877690939ef0")
    }
}

Deleting a pod doesn't seem to help.

What you expected to happen:

All members, once started, join the nominated replicaSet.

How to reproduce it (as minimally and precisely as possible): Run the install command; sometimes 2 members join the replicaSet, usually none do.

Anything else we need to know: This is a bare-metal install of Kubernetes on CoreOS. The only noteworthy aspect of this setup is that I'm using the nfs-client provisioner to satisfy PVC requests from an external NAS (https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client). Perhaps a race condition around that and its performance (it seems fast enough...)?

Not sure if that's relevant; volume claims do work fine for other applications.
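(For completeness, a quick check that the claims actually bound, e.g.:)

$ kubectl get pvc | grep hub-dev-mongo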

unguiculus commented 7 years ago

Hm, sounds weird. I've been using this for months without such issues.

@foxish Any ideas?

ggn06awu commented 7 years ago

@unguiculus it turned out two of the 5 worker nodes appeared healthy but actually had a Docker bridge issue (not picking up the correct subnet from flannel's configuration). DNS resolution was consequently timing out, so the replicaSet wasn't able to resolve the other expected members. Closing accordingly, thanks for looking.
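(For anyone hitting the same symptoms, a minimal DNS sanity check from inside each pod, assuming getent is available in the mongo image, looks something like this:)

$ for i in 0 1 2; do kubectl exec hub-dev-mongo-mongodb-replicaset-$i -- sh -c 'getent hosts hub-dev-mongo-mongodb-replicaset-0.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local || echo "DNS lookup failed"'; done

If this fails or hangs on any node, the members cannot discover each other and the replica set will never form.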

gvenki commented 6 years ago

Hi @ggn06awu, I am having a similar issue. Can you help me with what needs to be corrected? Thanks in advance.

dreamalarm commented 6 years ago

I am also facing the same issue! In my case, all 3 replica set members are created, but without a valid replica set config:

    "ismaster" : false,
    "secondary" : false,
    "info" : "Does not have a valid replica set config",
    "isreplicaset" : true,

stefanthorpe commented 6 years ago

Having the same issue.

Simply running:

cfg = rs.config()
rs.reconfig(cfg, {force:true})

Gets the cluster back up again, but clearly this isn't a fix.

I'm also able to reproduce the error if I do a series of db.shutdownServer() calls across my cluster.
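(A hedged alternative when only one member made it into the config: from the member that still holds a valid config, the others can be re-added explicitly with rs.add(); the hostnames below are the ones from this report:)

$ for i in 1 2; do kubectl exec hub-dev-mongo-mongodb-replicaset-0 -- mongo --eval "rs.add('hub-dev-mongo-mongodb-replicaset-$i.hub-dev-mongo-mongodb-replicaset.default.svc.cluster.local:27017')"; done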

gvenki commented 6 years ago

I changed the network driver from flannel to Calico (on CentOS) and it worked fine for me.
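(Swapping the CNI plugin is cluster-specific and usually means removing the flannel configuration from every node first; as a rough sketch of the Calico side only, with a manifest URL that may have moved since:)

$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml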

jeff-r-koyaltech commented 3 years ago

I changed the network driver from flannel to Calico (on CentOS) and it worked fine for me.

Thanks, this was my problem in Azure AKS v1.19.9. Switching (from Azure CNI) to Calico while creating a new AKS cluster resolved my replica set config problem:

rs.status() had been returning: errmsg: "no replset config has been received"
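(For reference, a minimal sketch of creating such a cluster with the az CLI; the resource group and cluster names are placeholders, and the flags should be verified against the current az aks create documentation:)

$ az aks create --resource-group myResourceGroup --name myAKSCluster --network-plugin kubenet --network-policy calico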