alexwine36 closed this issue 5 years ago.
this was the answer I got from the chart owner for more or less the same question, but to be honest, I was a little overwhelmed by that answer :)
quote: The quick solution is to wrap an operator that knows to switch the service tags based on which pod is the current master. If you're using the other members of the mongodb-repset as read slaves, then it makes sense to have a single service over all replicas for reads, and another service to handle writes via the master. That's one way to handle this use-case.
The chart has a governing (headless) service for the StatefulSet. If you want to expose the replica set externally, you need additional services, one per replica, which could be of type NodePort or LoadBalancer. In your connect string you would then add all the service URLs. You don't connect to one individual replica; let the client figure that out.
https://docs.mongodb.com/manual/reference/connection-string/
/cc @foxish
would it be feasible to have a setting for that on the helm chart?
something like replicaServiceType=NodePort
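if such a flag existed, usage might look like this (note: replicaServiceType is hypothetical here, not a current chart value):
# hypothetical flag, shown only to illustrate the idea
$ helm install stable/mongodb-replicaset --set replicaServiceType=NodePort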
Care to create a PR?
After creating a NodePort service I am getting errors that the cluster-local DNS names cannot be resolved. For example:
MongoError: failed to connect to server [mongodb-mongodb-replicaset-1.mongodb-mongodb-replicaset.default.svc.cluster.local:27017] on first connect [MongoError: getaddrinfo ENOTFOUND mongodb-mongodb-replicaset-1.mongodb-mongodb-replicaset.default.svc.cluster.local mongodb-mongodb-replicaset-1.mongodb-mongodb-replicaset.default.svc.cluster.local:27017] @unguiculus @adrianliechti
@alexwine36
$ helm install stable/mongodb-replicaset
==> name wise-owl
after a while, the pods start coming up:
$ kubectl get pods
wise-owl-mongodb-replicaset-0 1/1 Running 0 1m
wise-owl-mongodb-replicaset-1 1/1 Running 0 42s
wise-owl-mongodb-replicaset-2 0/1 Running 0 20s
now create a service for each of the pods:
$ kubectl expose pod wise-owl-mongodb-replicaset-0 --type=NodePort
$ kubectl expose pod wise-owl-mongodb-replicaset-1 --type=NodePort
$ kubectl expose pod wise-owl-mongodb-replicaset-2 --type=NodePort
if you list your services, you now see the individual ports
wise-owl-mongodb-replicaset-0 10.0.0.46 <nodes> 27017:32151/TCP 30s
wise-owl-mongodb-replicaset-1 10.0.0.158 <nodes> 27017:30616/TCP 25s
wise-owl-mongodb-replicaset-2 10.0.0.11 <nodes> 27017:30167/TCP 22s
reading this: it maps the internal port 27017 of each pod to ports 32151, 30616, and 30167 on the physical node addresses
now the connection string would look like this:
mongodb://{IP-OF-THE-KUBE-NODE-1}:32151,{IP-OF-THE-KUBE-NODE-2}:30616,{IP-OF-THE-KUBE-NODE-3}:30167/
in the case of minikube, you get the node IP with:
$ minikube ip
192.168.99.100
and on a single node system, a connection string would be
mongodb://192.168.99.100:32151,192.168.99.100:30616,192.168.99.100:30167/
feel free to distribute the IPs as you want in a cluster with multiple nodes, since the node port is available on every node and gets routed by k8s
@unguiculus
yeah, we are willing. but creating multiple services for a dynamic number of replica pods seems a little tricky at first sight. I will read deeper into the helm template engine... :)
in our particular case, we built a service broker to interact between a PaaS and a detached k8s cluster (using helm and kubectl) - and it might be easier to invoke kubectl directly for this case...
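for anyone digging into the templates: a rough sketch of rendering one NodePort service per replica with a range loop. the fullname helper and the replicas value are assumptions and may be named differently in the actual chart:
{{- range $i := until (int .Values.replicas) }}
---
apiVersion: v1
kind: Service
metadata:
  name: {{ template "mongodb-replicaset.fullname" $ }}-{{ $i }}-external
spec:
  type: NodePort
  selector:
    # the StatefulSet controller adds this label to every pod it creates
    statefulset.kubernetes.io/pod-name: {{ template "mongodb-replicaset.fullname" $ }}-{{ $i }}
  ports:
    - port: 27017
      targetPort: 27017
{{- end }}
each rendered service then targets exactly one replica pod.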
@adrianliechti After the pods are exposed as NodePort there is still an issue connecting through a MongoDB URI. All of the individual MongoDB instances that I connect to show that the replica set's config has an incorrect hostname. I have tried to rewrite these values, but something seems to override them back to the original values.
{ "_id" : "rs0", "version" : 3, "protocolVersion" : NumberLong(1), "members" : [ { "_id" : 0, "host" : "replica-mongodb-replicaset-0.replica-mongodb-replicaset.default.svc.cluster.local:27017", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 1, "tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "replica-mongodb-replicaset-1.replica-mongodb-replicaset.default.svc.cluster.local:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "replica-mongodb-replicaset-2.replica-mongodb-replicaset.default.svc.cluster.local:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"catchUpTimeoutMillis" : 60000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("59839e61e5f61909903aabfb")
}
}
@alexwine36
at least we're getting closer :)
the hosts within the mongodb replica set config seem okay to me. those are the endpoints of the members in the container network - and it's good if the master<->slave traffic stays in that network. the mongodb cluster should not need to know about the port forwarding/external network stuff
which client or sdk do you use to connect to the replica set? and what does your connection string look like? does it contain all the single nodes, and is the replica set "rs0" specified? mongodb://..../?replicaSet=rs0
Yes, that would be the connect string with all the hosts in it.
@adrianliechti @unguiculus As far as clients go, I use two different ones.
The first is using Studio 3T
The Data URI is
mongodb://35.197.108.52:30039,35.197.85.94:30087,35.197.92.136:32085/?readPreference=nearest&replicaSet=rs0
The error that it gives me is this
Connection failed.
SERVER [replica-mongodb-replicaset-1.replica-mongodb-replicaset.default.svc.cluster.local:27017] (Type: UNKNOWN) |_/ Connection error (MongoSocketOpenException): Exception opening socket |____/ Unknown host: replica-mongodb-replicaset-1.replica-mongodb-replicaset.default.svc.cluster.local
SERVER [replica-mongodb-replicaset-2.replica-mongodb-replicaset.default.svc.cluster.local:27017] (Type: UNKNOWN) |_/ Connection error (MongoSocketOpenException): Exception opening socket |____/ Unknown host: replica-mongodb-replicaset-2.replica-mongodb-replicaset.default.svc.cluster.local
Details: Timed out after 30000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, servers=[{address=replica-mongodb-replicaset-1.replica-mongodb-replicaset.default.svc.cluster.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.UnknownHostException: replica-mongodb-replicaset-1.replica-mongodb-replicaset.default.svc.cluster.local}}, {address=replica-mongodb-replicaset-2.replica-mongodb-replicaset.default.svc.cluster.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.UnknownHostException: replica-mongodb-replicaset-2.replica-mongodb-replicaset.default.svc.cluster.local}}]
The other connection I have tried is through Meteor.js, which gives me an error like the one below.
For this one I also tried this URI
mongodb://35.197.108.52:30039,35.197.85.94:30087,35.197.92.136:32085/den?replicaSet=rs0
and
mongodb://35.197.108.52:30039,35.197.85.94:30087,35.197.92.136:32085/?readPreference=nearest&replicaSet=rs0
W20170814-11:35:13.829(-6)? (STDERR) MongoError: failed to connect to server [replica-mongodb-replicaset-1.replica-mongodb-replicaset.default.svc.cluster.local:27017] on first connect [MongoError: getaddrinfo ENOTFOUND replica-mongodb-replicaset-1.replica-mongodb-replicaset.default.svc.cluster.local replica-mongodb-replicaset-1.replica-mongodb-replicaset.default.svc.cluster.local:27017]
@alexwine36 please take a look at #1743, I think it will solve the issue you are facing.
I am still getting these frequent errors
2017-08-15T15:41:31.008+0000 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] getaddrinfo("messy-hydra-mongodb-replicaset-1.messy-hydra-mongodb-replicaset.default.svc.cluster.local") failed: Name or service not known
2017-08-15T15:42:01.011+0000 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] getaddrinfo("messy-hydra-mongodb-replicaset-1.messy-hydra-mongodb-replicaset.default.svc.cluster.local") failed: Name or service not known
2017-08-15T15:42:01.013+0000 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] getaddrinfo("messy-hydra-mongodb-replicaset-2.messy-hydra-mongodb-replicaset.default.svc.cluster.local") failed: Name or service not known
But I think they are related to Kubernetes DNS not to the Helm chart.
Hey @alexwine36, I ran into the same issue as you. Did you find any resolution?
I can reach the cluster from my mongo client, but once I'm connected, it appears to be trying to use member hosts that I cannot resolve: names internal to the cluster.
Mongo docs say:
Always use resolvable hostnames for the value of the members[n].host field in the replica set configuration to avoid confusion and complexity.
https://docs.mongodb.com/v3.2/tutorial/change-hostnames-in-a-replica-set/
... and I might even be able to make it work with custom DNS entries, but it's also using the default port 27017.
So if I read this correctly, you guys are saying the connection URL should give my client all the replica set nodes? This feels dirty; it would make scaling impossible without editing the config of all my clients.
auto_mongodb_url=mongodb://${CI_ENVIRONMENT_SLUG}-mongodb-replicaset:27017/${MONGODB_DB}?replicaSet=rs0
would become
auto_mongodb_url=mongodb://${CI_ENVIRONMENT_SLUG}-mongodb-replicaset-0.${CI_ENVIRONMENT_SLUG}-mongodb-replicaset.${KUBE_NAMESPACE}.svc.cluster.local:27017,${CI_ENVIRONMENT_SLUG}-mongodb-replicaset-1.${CI_ENVIRONMENT_SLUG}-mongodb-replicaset.${KUBE_NAMESPACE}.svc.cluster.local:27017,${CI_ENVIRONMENT_SLUG}-mongodb-replicaset-2.${CI_ENVIRONMENT_SLUG}-mongodb-replicaset.${KUBE_NAMESPACE}.svc.cluster.local:27017/${MONGODB_DB}?replicaSet=rs0
Secondly, I can't pass this along to Helm easily:
Error: failed parsing --set data: key "local:27017" has no value (cannot end with ,)
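Two ways around the --set parsing (the release and chart names below are placeholders): escape each comma in the value with a backslash so Helm does not split it into a list, or move the URL into a values file.
# escape the commas inside the value
$ helm upgrade my-release ./my-chart \
    --set auto_mongodb_url="mongodb://host-0:27017\,host-1:27017\,host-2:27017/mydb?replicaSet=rs0"
# or keep the URL in a values file, where no escaping is needed
$ helm upgrade my-release ./my-chart -f mongodb-values.yaml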
apiVersion: v1
kind: Service
metadata:
name: mongodb
namespace: mongodb
labels:
release: mongodb
annotations: {}
spec:
ports:
- protocol: TCP
port: 27017
targetPort: 27017
selector:
release: mongodb
type: LoadBalancer
Using labels, this can be handled automatically.
EDIT: has been working great for us!
Hi @carldanley, this sounds good :)
What does your external client's URL look like when you try to connect to the MongoDB replicaset on the k8s cluster?
Thanks!
Hey! Same issue, any solution? :(
@carldanley That is what we tried, but there is no way to force the load balancer to the master; it will randomly select servers. The load balancer only seems to work if you have a single MongoDB instance. Example, where 10.21.67.240 is the IP given by our load balancer endpoint:
awilhelm@MBP ~$ mongo --host 10.21.67.240 --port 27017
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.67.240:27017/
MongoDB server version: 3.6.3
MainRepSet:PRIMARY> ^C
bye
awilhelm@MBP ~$ mongo --host 10.21.67.240 --port 27017
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.67.240:27017/
MongoDB server version: 3.6.3
MainRepSet:SECONDARY> ^C
bye
awilhelm@MBP ~$ mongo --host 10.21.67.240 --port 27017
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.67.240:27017/
MongoDB server version: 3.6.3
MainRepSet:PRIMARY> ^C
bye
awilhelm@MBP ~$ mongo --host 10.21.67.240 --port 27017
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.67.240:27017/
MongoDB server version: 3.6.3
MainRepSet:SECONDARY>
Also, exposing via NodePort doesn't work on a distributed cluster, because the client and server try to communicate using the k8s internal DNS names, which are not routable.
I expose the StatefulSet pods:
$ kubectl expose pod mongod-0 --type=NodePort
service "mongod-0" exposed
$ kubectl expose pod mongod-1 --type=NodePort
service "mongod-1" exposed
$ kubectl expose pod mongod-2 --type=NodePort
service "mongod-2" exposed
I then attempt to connect from my laptop using the IP of a node in my cluster. The connection happens, but then it tries to switch over to using internal DNS names and my connection dies:
$ mongo mongodb://10.21.67.1:30446,10.21.67.1:32234,10.21.67.1:31456/?replicaSet=MainRepSet
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.67.1:30446,10.21.67.1:32234,10.21.67.1:31456/?replicaSet=MainRepSet
2018-03-07T14:26:12.753-0500 I NETWORK [thread1] Starting new replica set monitor for MainRepSet/10.21.67.1:30446,10.21.67.1:32234,10.21.67.1:31456
2018-03-07T14:26:12.903-0500 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 10.21.67.1:30446 (1 connections now open to 10.21.67.1:30446 with a 5 second timeout)
2018-03-07T14:26:12.912-0500 I NETWORK [thread1] Successfully connected to 10.21.67.1:31456 (1 connections now open to 10.21.67.1:31456 with a 5 second timeout)
2018-03-07T14:26:13.160-0500 I NETWORK [thread1] Successfully connected to 10.21.67.1:32234 (1 connections now open to 10.21.67.1:32234 with a 5 second timeout)
2018-03-07T14:26:13.239-0500 I NETWORK [thread1] changing hosts to MainRepSet/mongod-0.mongodb-service.default.svc.cluster.local:27017,mongod-1.mongodb-service.default.svc.cluster.local:27017,mongod-2.mongodb-service.default.svc.cluster.local:27017 from MainRepSet/10.21.67.1:30446,10.21.67.1:31456,10.21.67.1:32234
2018-03-07T14:26:17.979-0500 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] getaddrinfo("mongod-1.mongodb-service.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-03-07T14:26:18.239-0500 I NETWORK [thread1] getaddrinfo("mongod-2.mongodb-service.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-03-07T14:26:22.982-0500 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] getaddrinfo("mongod-0.mongodb-service.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-03-07T14:26:28.486-0500 I NETWORK [thread1] getaddrinfo("mongod-2.mongodb-service.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-03-07T14:26:33.490-0500 I NETWORK [thread1] getaddrinfo("mongod-1.mongodb-service.default.svc.cluster.local") failed: nodename nor servname provided, or not known
^C2018-03-07T14:26:37.292-0500 E - [main] Error saving history file: FileOpenFailed: Unable to open() file : No such file or directory
2018-03-07T14:26:37.292-0500 I CONTROL [main] shutting down with code:0
This works fine from inside a pod in the k8s cluster:
$ kubectl exec -it mongod-0 bash
root@mongod-0:/# mongo mongodb://10.21.67.1:30446,10.21.67.1:32234,10.21.67.1:31456/?replicaSet=MainRepSet
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.67.1:30446,10.21.67.1:32234,10.21.67.1:31456/?replicaSet=MainRepSet
2018-03-07T19:29:42.722+0000 I NETWORK [thread1] Starting new replica set monitor for MainRepSet/10.21.67.1:30446,10.21.67.1:32234,10.21.67.1:31456
2018-03-07T19:29:42.724+0000 I NETWORK [thread1] Successfully connected to 10.21.67.1:31456 (1 connections now open to 10.21.67.1:31456 with a 5 second timeout)
2018-03-07T19:29:42.725+0000 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 10.21.67.1:30446 (1 connections now open to 10.21.67.1:30446 with a 5 second timeout)
2018-03-07T19:29:42.727+0000 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 10.21.67.1:32234 (1 connections now open to 10.21.67.1:32234 with a 5 second timeout)
2018-03-07T19:29:42.727+0000 I NETWORK [thread1] Successfully connected to mongod-1.mongodb-service.default.svc.cluster.local:27017 (1 connections now open to mongod-1.mongodb-service.default.svc.cluster.local:27017 with a 5 second timeout)
2018-03-07T19:29:42.728+0000 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] changing hosts to MainRepSet/mongod-0.mongodb-service.default.svc.cluster.local:27017,mongod-1.mongodb-service.default.svc.cluster.local:27017,mongod-2.mongodb-service.default.svc.cluster.local:27017 from MainRepSet/10.21.67.1:30446,10.21.67.1:31456,10.21.67.1:32234
2018-03-07T19:29:42.730+0000 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to mongod-2.mongodb-service.default.svc.cluster.local:27017 (1 connections now open to mongod-2.mongodb-service.default.svc.cluster.local:27017 with a 5 second timeout)
2018-03-07T19:29:42.731+0000 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to mongod-0.mongodb-service.default.svc.cluster.local:27017 (1 connections now open to mongod-0.mongodb-service.default.svc.cluster.local:27017 with a 5 second timeout)
MongoDB server version: 3.6.3
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
MainRepSet:PRIMARY>
Any help here would be great!
What I tend to do, if I need access to the db, is port-forward to the pod:
kubectl port-forward mongodb-0 27017
Then you can connect via localhost:27017
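With this chart's pod naming that would be something like (the release name is illustrative):
$ kubectl port-forward wise-owl-mongodb-replicaset-0 27017:27017
$ mongo mongodb://localhost:27017
Note that this pins you to a single member, so you may land on a secondary unless you forward to the current primary.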
Yeah, that works for temporary things and testing, but it isn't going to work for a more robust production system.
We ended up implementing shards + mongos, and this actually solved the problem, as we can create an LB endpoint to mongos and connect through that externally to the sharded backend.
I went through the following guide to help prove this concept:
http://pauldone.blogspot.com/2017/07/sharded-mongodb-kubernetes.html
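For reference, the LoadBalancer in front of mongos can be as small as the sketch below; the app: mongos selector is an assumption and may differ from the labels used in that guide. Because mongos is stateless and does the routing to the shards itself, any instance behind the LB works, unlike individual replica-set members.
apiVersion: v1
kind: Service
metadata:
  name: mongos-external
spec:
  type: LoadBalancer
  selector:
    app: mongos
  ports:
    - protocol: TCP
      port: 27017
      targetPort: 27017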
Instead of exposing each replica individually, you can also just create a second (congruent) public service using kube-proxy as usual (if you need it):
{
"kind": "Service",
"apiVersion": "v1",
"metadata": {
"name": "mongodb-mongodb-replicaset-public",
"namespace": "default",
"labels": {
"app": "mongodb-replicaset",
"chart": "mongodb-replicaset-3.1.0",
"heritage": "Tiller",
"release": "mongodb"
},
"annotations": {
"service.alpha.kubernetes.io/tolerate-unready-endpoints": "true"
}
},
"spec": {
"ports": [
{
"name": "peer",
"protocol": "TCP",
"port": 27017,
"targetPort": 27017
}
],
"selector": {
"app": "mongodb-replicaset",
"release": "mongodb"
},
"nodePort": "30300",
"type": "NodePort",
"sessionAffinity": "None"
}
}
The Keycloak chart does it like this, for instance: one headless service and optionally a ClusterIP or NodePort service for public access if needed.
@Flowkap This doesn't solve the problem. Because of the way you need to access a replica set in MongoDB to determine the master, you can't connect through a NodePort-exposed service. This is because once you connect, the connection gets switched over to using internal Kubernetes DNS names:
example
$ mongo mongodb://10.21.61.1:30080/?replicaSet=rs0
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.61.1:30080/?replicaSet=rs0
2018-04-27T13:47:28.009-0400 I NETWORK [thread1] Starting new replica set monitor for rs0/10.21.61.1:30080
2018-04-27T13:47:28.152-0400 I NETWORK [thread1] Successfully connected to 10.21.61.1:30080 (1 connections now open to 10.21.61.1:30080 with a 5 second timeout)
2018-04-27T13:47:33.227-0400 I NETWORK [thread1] getaddrinfo("sonar-mongodb-replicaset-0.sonar-mongodb-replicaset.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-04-27T13:47:38.233-0400 I NETWORK [thread1] getaddrinfo("sonar-mongodb-replicaset-2.sonar-mongodb-replicaset.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-04-27T13:47:38.233-0400 I NETWORK [ReplicaSetMonitor-TaskExecutor-0] getaddrinfo("sonar-mongodb-replicaset-1.sonar-mongodb-replicaset.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-04-27T13:47:38.233-0400 W NETWORK [ReplicaSetMonitor-TaskExecutor-0] Unable to reach primary for set rs0
2018-04-27T13:47:43.318-0400 I NETWORK [thread1] getaddrinfo("sonar-mongodb-replicaset-0.sonar-mongodb-replicaset.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-04-27T13:47:48.321-0400 I NETWORK [thread1] getaddrinfo("sonar-mongodb-replicaset-2.sonar-mongodb-replicaset.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-04-27T13:47:53.327-0400 I NETWORK [thread1] getaddrinfo("sonar-mongodb-replicaset-1.sonar-mongodb-replicaset.default.svc.cluster.local") failed: nodename nor servname provided, or not known
2018-04-27T13:47:53.327-0400 W NETWORK [thread1] Unable to reach primary for set rs0
2018-04-27T13:47:53.327-0400 E QUERY [thread1] Error: Could not find host matching read preference { mode: "primary", tags: [ {} ] } for set rs0 :
connect@src/mongo/shell/mongo.js:251:13
@(connect):1:6
exception: connect failed
If you connect to MongoDB without the replica set in the URL, you get put on a random MongoDB instance and cannot be sure you are on the master:
example:
awilhelm@MBP ~ $ mongo --host 10.21.61.1 --port 30080
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.61.1:30080/
MongoDB server version: 3.6.4
rs0:SECONDARY> ^C
bye
awilhelm@MBP ~ $ mongo --host 10.21.61.1 --port 30080
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.61.1:30080/
MongoDB server version: 3.6.4
rs0:PRIMARY> ^C
bye
awilhelm@MBP ~ $ mongo --host 10.21.61.1 --port 30080
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.61.1:30080/
MongoDB server version: 3.6.4
rs0:SECONDARY> ^C
bye
awilhelm@MBP ~ $ mongo --host 10.21.61.1 --port 30080
MongoDB shell version v3.6.3
connecting to: mongodb://10.21.61.1:30080/
MongoDB server version: 3.6.4
rs0:PRIMARY> ^C
bye
I am facing the same issue here trying to expose my replica set. Has anyone gotten any better results, or at least some auto IP discovery?
We did a workaround to be able to connect Studio 3T to the replica set in our network.
I also use the external-dns Helm chart to update the DNS zone (hosted in CloudFlare) to bind to the right node. It should not change, but if we recreate all the pods and the containers start on another node, they will get updated in DNS. It would be great to develop something to update the exposed port too, but it's a great workaround for now.
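For anyone reproducing this, external-dns can pick the hostname up from a Service annotation; a rough sketch, with the hostname and selector as placeholders, and assuming your external-dns version publishes node addresses for NodePort services:
apiVersion: v1
kind: Service
metadata:
  name: mongodb-0-external
  annotations:
    # hostname is illustrative; external-dns keeps the record pointed at the node
    external-dns.alpha.kubernetes.io/hostname: mongo-0.example.com
spec:
  type: NodePort
  selector:
    statefulset.kubernetes.io/pod-name: mongodb-mongodb-replicaset-0
  ports:
    - port: 27017
      targetPort: 27017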
@adrianliechti How would you get the external IP for nodes? Thanks in advance
@PavelYarysh If you are asking about my mongos solution above and are running on a hardware-based system: we use https://metallb.universe.tf/ to give out external IP addresses. MetalLB is easy to set up and works like a charm!
I had an excellent suggestion from a Jody (can't see how to link slack name to github name for direct @) on the kubernetes slack channel.
The suggestion is to fix this issue when deploying replica sets in Kubernetes by deploying a sidecar system along with the replica set that will query the MongoDB replicas to see which is primary, take that information, and update the primary pod with a label denoting it as such, subsequently cleaning the label off the old pod.
In tandem, we also create a primary-specific service that points only to pods carrying that primary label (a rough sketch follows the requirements list below).
This is a bit of a tall order but it does seem to give an elegant solution to the problem.
I do not have the time to attempt a PR but I will hopefully be able to set some aside in the coming weeks to investigate this solution.
Requirements:
Flag to enable external master service management (in helm)
Define service requirements (how to expose externally either NodePort or External LB)
Simple query to identify Primary node
API syntax to update and remove labels in kubernetes
A script to run the primary node query and execute required label updates
Edit: I mixed up what gets updated; fixed. Also added the requirements list.
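Rough sketches of the two pieces; every name here (label key, service name, variables) is hypothetical. The probe asks any reachable member who the primary is and labels that pod:
# ANY_MEMBER is any reachable replica-set member; the label key is made up
PRIMARY_HOST=$(mongo --quiet --host "$ANY_MEMBER" --eval 'rs.isMaster().primary')
PRIMARY_POD=${PRIMARY_HOST%%.*}   # the pod name is the first DNS label of the host
kubectl label pod "$PRIMARY_POD" mongodb-role=primary --overwrite
# remove the label from the previous primary: kubectl label pod <old-pod> mongodb-role-
The primary-specific service then selects only that label:
apiVersion: v1
kind: Service
metadata:
  name: mongodb-primary-external
spec:
  type: NodePort
  selector:
    app: mongodb-replicaset
    mongodb-role: primary   # maintained by the sidecar/script above
  ports:
    - port: 27017
      targetPort: 27017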
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
A possible solution that I have been visualizing would be to use a Network Plugin with BGP support (for example, Calico, kube-router) to ensure external connectivity to each node in the Replica Set.
So, is there any elegant solution to this?
@paltaa No, not yet. Just wait until someone manages to create a PR.
The suggestion from @aba182 would be quite exciting if implemented.
Here's how we made this work, without making any changes to the mongodb-replicaset chart. This should also work for the mongodb chart.
The relevant issue is clearly stated here:
Thus, replica set members must be reachable from the client by the hostnames listed in the replica set config.
There is no ready way that hostnames from one Kubernetes cluster will be resolvable from outside that cluster, not least from another cluster. Since this is a DNS resolution issue, we used DNS to solve it. YMMV and there may be other solutions that are better suited to your situation.
We run apps in one cluster and MongoDb in another cluster, so our client apps need to connect to a MongoDb replica set running a different cluster.
MongoDb cluster steps:
1. A Service for each Pod in the MongoDb StatefulSet.
2. An Ingress, with an external hostname for each internal Pod Service. The Ingress Controller must support multiple TCP services. We used Voyager, since it has this capability. However, each MongoDb Service needs to listen on a different port. We used 27017, 27018, and 27019 with our 3-node replica set.
We wrote some helm charts to generate all these manifests, including the appropriate hostnames and port allocations.
Client cluster steps:
You could use an ExternalName Service to simplify this. We use CoreDNS for cluster DNS, which provides an easy way to do this using the rewrite plugin.
Here is an example of a modified CoreDNS configuration for the client cluster:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: EnsureExists
data:
Corefile: |-
.:53 {
errors
log
health
rewrite stop {
name regex mongodb.svc.cluster.local mongodb.example.com
answer name mongodb.example.com mongodb.svc.cluster.local
}
kubernetes cluster.local. in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
proxy . /etc/resolv.conf
cache 30
reload
}
Please note that mongodb.svc.cluster.local and mongodb.example.com are not real DNS names. In reality, these are regular expressions that will match all possible internal hostnames for Pods in the MongoDb StatefulSet and use capture groups to generate the relevant request and answer sections of the DNS query. See the rewrite plugin docs for plenty of examples.
From a Pod running in the client cluster, using nslookup or dig on the internal hostnames shows that these hostnames actually map to IP addresses associated with the external hostnames used in the Ingress resources created earlier, thus allowing clients to connect (assuming the two external networks are peered in some way). The rewrite rule is applied before any local cluster DNS resolution takes place. This approach works great for us!
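A quick way to check the rewrite from the client cluster (the image and hostname are illustrative):
$ kubectl run -it --rm dns-check --image=busybox:1.28 --restart=Never -- \
    nslookup mongodb-0.mongodb.svc.cluster.local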
If your clients are not running in a Kubernetes cluster, you should be able to use whatever DNS resolver mechanism at hand to do something similar.
I had an excellent suggestion from a Jody (can't see how to link slack name to github name for direct @) on the kubernetes slack channel.
The suggestion is to fix this issue when deploying replica sets in Kubernetes by deploying a sidecar system along with the replica set that will query the MongoDB replicas to see which is primary, take that information, and update the primary pod with a label denoting it as such, subsequently cleaning the label off the old pod.
I have followed this nice suggestion and written a simple app in Go to do the job. You can find it here: https://github.com/combor/k8s-mongo-labeler-sidecar
The Docker container can be found here: https://hub.docker.com/r/combor/k8s-mongo-labeler-sidecar
Contributions to improve it are very welcome ;)
The best solution I found was to use a Network Plugin with BGP support (for example, kube-router) to ensure external connectivity to each node in the Replica Set.
We can solve this if we make one small change to the helm chart and create a custom service. First, the helm chart just needs to set the 0th pod to have a higher priority in on-start.sh so it is always elected as the primary. You can also do this manually by connecting to MongoDB and using rs.reconfig() (a mongo shell sketch for the manual reconfig follows the example below). Second, a custom service can be created to target just this primary pod, like so:
apiVersion: v1
kind: Service
metadata:
name: mongodb-primary
spec:
type: LoadBalancer
externalTrafficPolicy: Local
selector:
app: mongodb-replicaset
release: my
statefulset.kubernetes.io/pod-name: my-mongodb-replicaset-0
ports:
- protocol: TCP
port: 27017
The externalTrafficPolicy=Local avoids an extra hop and skips nodes that don't have the pod. The statefulset.kubernetes.io/pod-name selector will target a specific pod in the StatefulSet by its ordinal number. Does this sound reasonable, @unguiculus?
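For the manual reconfig mentioned above, a minimal mongo shell sketch; the member index and priority value are illustrative, and it must run against the current primary:
$ mongo --host "$PRIMARY_HOST" --eval '
    cfg = rs.conf();
    cfg.members[0].priority = 2;   // prefer the 0th member in elections
    rs.reconfig(cfg);
  '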
This doesn't sound like a viable solution. Even if you give pod 0 a higher priority, your master may go down due to a node outage, a network partition, or simply a cluster update. Another replica will become the leader and you will lose access because the service won't work anymore. You should not target a specific replica only.
@unguiculus It's not ideal and I wouldn't recommend using it to connect clients in production. It's more for debugging externally using Robomongo and other tools that don't understand the mongo protocol or only connect to a single member. It's true that it won't work when the primary is down, but that is temporary and with the higher priority pod 0 will be re-elected as primary as soon as possible. This is similar to how the stable/mongodb chart handles the primary, except it actually has a separate service and statefulset for the primary.
For connecting with Robomongo you can just do kubectl port-forward to the master.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.
@selmison Did you get kube-router to work with this particular mongodb-replicaset helm chart? If so, I was curious as to how exactly.
Hi, @kwill4026
I did it by advertising the Node's Pod CIDR to the BGP peers with the args:
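(these appear to be the BGP flags from the DaemonSet further down; the values are site-specific)
- --cluster-asn=64512
- --peer-router-asns=64513
- --peer-router-ips=10.x.X.X
- --run-router=true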
With this I have access to the replica set members' IPs.
@selmison Can you provide me with your email address if you don't mind? I wanted to ask you some other questions to make this easier.
@selmison Which daemonset yaml file are you using in the daemonset directory to configure kube-router?
Hi, @kwill4026
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
labels:
k8s-app: kube-router
tier: node
name: kube-router
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: kube-router
tier: node
template:
metadata:
labels:
k8s-app: kube-router
tier: node
spec:
containers:
- args:
- --cluster-asn=64512
- --peer-router-asns=64513
- --peer-router-ips=10.x.X.X
- --run-router=true
- --run-firewall=true
- --run-service-proxy=false
env:
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: cloudnativelabs/kube-router:v0.2.5
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 20244
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
name: kube-router
resources:
requests:
cpu: 250m
memory: 250Mi
securityContext:
capabilities: {}
privileged: true
procMount: Default
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /etc/cni/net.d
name: cni-conf-dir
- mountPath: /var/lib/kube-router/kubeconfig
name: kubeconfig
readOnly: true
dnsPolicy: ClusterFirst
hostNetwork: true
initContainers:
- command:
- /bin/sh
- -c
- set -e -x; if [ ! -f /etc/cni/net.d/10-kuberouter.conf ]; then TMP=/etc/cni/net.d/.tmp-kuberouter-cfg;
cp /etc/kube-router/cni-conf.json ${TMP}; mv ${TMP} /etc/cni/net.d/10-kuberouter.conf;
fi
image: busybox
imagePullPolicy: Always
name: install-cni
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/cni/net.d
name: cni-conf-dir
- mountPath: /etc/kube-router
name: kube-router-cfg
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: kube-router
serviceAccountName: kube-router
terminationGracePeriodSeconds: 30
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/not-ready
operator: Exists
volumes:
- hostPath:
path: /lib/modules
type: ""
name: lib-modules
- hostPath:
path: /etc/cni/net.d
type: ""
name: cni-conf-dir
- configMap:
defaultMode: 420
name: kube-router-cfg
name: kube-router-cfg
- hostPath:
path: /var/lib/kube-router/kubeconfig
type: ""
name: kubeconfig
updateStrategy:
type: OnDelete
thanks @selmison. how did you determine the values for those args?
I know that peer-router-ips is the IP address of the external router, which in my case would be the k8s service I set up as a NodePort or LoadBalancer.
Hi, @kwill4026,
Each node's kube-router can be configured with an external BGP router. More details:
https://cloudnativelabs.github.io/post/2017-05-22-kube-pod-networking/ https://github.com/cloudnativelabs/kube-router/blob/master/docs/bgp.md
Question
I have only recently started playing with kubernetes and helm and I'm having trouble with the mongodb-replicaset chart.
My goal was to create this cluster and then be able to access it on other servers that I have that need MongoDB.
I have tried to expose the replica set with a LoadBalancer, but it doesn't always return the primary node, and I have tried using ingresses, but that also got me nowhere.
I feel like there has to be something I am missing. I am just trying to create a MongoDB replica set that is externally available to the other sites I run and that I can connect to with a Mongo URI.
Thanks in advance for the help!