neo4j-contrib / kubernetes-neo4j

(RETIRED) Kubernetes experiments with Neo4j. See updated Helm Repo
https://github.com/neo4j-contrib/neo4j-helm

Cores stuck in connecting to other members #15

Closed hoptical closed 4 years ago

hoptical commented 6 years ago

Hi, I followed the README line by line to deploy the Neo4j core pods for clustering, but the pod gets stuck while connecting to the other members and never starts:

$ kubectl logs neo4j-core-0

++ hostname -f
+ export NEO4J_dbms_connectors_default__advertised__address=neo4j-core-0.neo4j.default.svc.cluster.local
+ NEO4J_dbms_connectors_default__advertised__address=neo4j-core-0.neo4j.default.svc.cluster.local
++ hostname -f
+ export NEO4J_causal__clustering_discovery__advertised__address=neo4j-core-0.neo4j.default.svc.cluster.local:5000
+ NEO4J_causal__clustering_discovery__advertised__address=neo4j-core-0.neo4j.default.svc.cluster.local:5000
++ hostname -f
+ export NEO4J_causal__clustering_transaction__advertised__address=neo4j-core-0.neo4j.default.svc.cluster.local:6000
+ NEO4J_causal__clustering_transaction__advertised__address=neo4j-core-0.neo4j.default.svc.cluster.local:6000
++ hostname -f
+ export NEO4J_causal__clustering_raft__advertised__address=neo4j-core-0.neo4j.default.svc.cluster.local:7000
+ NEO4J_causal__clustering_raft__advertised__address=neo4j-core-0.neo4j.default.svc.cluster.local:7000
+ exec /docker-entrypoint.sh neo4j
Active database: graph.db
Directories in use:
  home:         /var/lib/neo4j
  config:       /var/lib/neo4j/conf
  logs:         /var/lib/neo4j/logs
  plugins:      /var/lib/neo4j/plugins
  import:       /var/lib/neo4j/import
  data:         /var/lib/neo4j/data
  certificates: /var/lib/neo4j/certificates
  run:          /var/lib/neo4j/run
Starting Neo4j.
2018-10-29 14:38:01.268+0000 INFO  ======== Neo4j 3.3.2 ========
2018-10-29 14:38:01.359+0000 INFO  Starting...
2018-10-29 14:38:03.680+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2018-10-29 14:38:03.715+0000 INFO  Initiating metrics...
2018-10-29 14:38:04.021+0000 INFO  Resolved initial host 'neo4j.neo4j.svc.cluster.local:5000' to []
2018-10-29 14:38:04.095+0000 INFO  My connection info: [
    Discovery:   listen=0.0.0.0:5000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:5000,
    Transaction: listen=0.0.0.0:6000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:6000, 
    Raft:        listen=0.0.0.0:7000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:7000, 
    Client Connector Addresses: bolt://neo4j-core-0.neo4j.default.svc.cluster.local:7687,http://neo4j-core-0.neo4j.default.svc.cluster.local:7474,https://neo4j-core-0.neo4j.default.svc.cluster.local:7473
]
2018-10-29 14:38:04.095+0000 INFO  Discovering cluster with initial members: [neo4j.neo4j.svc.cluster.local:5000]
2018-10-29 14:38:04.095+0000 INFO  Attempting to connect to the other cluster members before continuing...

It stays on the last line and never continues, and it doesn't even throw an error. What should I do? Thanks.

SuhaMaj commented 5 years ago

I am facing the same issue.

JannikZed commented 5 years ago

same for me in kubernetes

kobyepistema commented 5 years ago

I'm facing the same thing... (It's not always stuck)

JannikZed commented 5 years ago

So I solved the issue as follows: use this Helm chart: https://github.com/neo-technology/neo4j-google-k8s-marketplace/tree/3.5/chart . Forget about this chart here ...
I downloaded the google-marketplace chart, edited the values to match my environment, and successfully started a causal cluster in the neo4j namespace. We also used our own Docker images that had some plugins installed.

majidrajabi commented 5 years ago

The problem you are facing is related to DNS. The cores try to find each other via neo4j.neo4j.svc.cluster.local, which assumes the cluster is deployed in the neo4j namespace, and that lookup is failing. You can track down the issue with the steps below:

  1. Check whether you can resolve the kubernetes service:
       $ kubectl exec -ti busybox -- nslookup kubernetes.default
     If DNS is working correctly, the command returns a response like the following example:
       Server:    10.96.0.10
       Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
       Name:      kubernetes.default
       Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
     If DNS does not work correctly, this link can be helpful: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
  2. If DNS is OK and the problem persists, check the health of the network plugin pods with kubectl get pod -n kube-system and kubectl logs -n kube-system <the name of the network plugin> . You will probably see some issue there.
  3. If everything, including DNS and networking, is OK, try deploying the neo4j cluster in a namespace other than default and change the line with the advertised host name to: neo4j..svc.cluster.local
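
The namespace point in step 3 can be checked directly against the log in the original report: the pod advertises itself under one namespace while discovery looks up the service under another. The following is only a minimal sketch of that comparison. The helper name ns_of_fqdn is hypothetical, and it assumes the usual <name>.<namespace>.svc.cluster.local layout; the two addresses are copied from the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any chart): print the namespace label
# of a <name>.<namespace>.svc.cluster.local address, with or without :port.
ns_of_fqdn() {
  host="${1%%:*}"                    # drop an optional :port
  base="${host%.svc.cluster.local}"  # drop the cluster suffix
  printf '%s\n' "${base##*.}"        # last remaining label is the namespace
}

# Both values are taken from the log in the original report.
advertised="neo4j-core-0.neo4j.default.svc.cluster.local:5000"
initial="neo4j.neo4j.svc.cluster.local:5000"

adv_ns="$(ns_of_fqdn "$advertised")"
ini_ns="$(ns_of_fqdn "$initial")"

if [ "$adv_ns" != "$ini_ns" ]; then
  echo "namespace mismatch: pod advertises '$adv_ns', discovery looks in '$ini_ns'"
fi

# To confirm from inside the cluster, the busybox pod from step 1 can be reused
# (assumes that pod exists in the current namespace):
#   kubectl exec -ti busybox -- nslookup neo4j.neo4j.svc.cluster.local
```

For the addresses in the report, the two namespaces differ (default versus neo4j), which would be consistent with the log line showing the initial host resolving to an empty list.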

moxious commented 4 years ago

Folks, if you're arriving at this repo, please be aware that it is unmaintained and won't be maintained again. Check the message on the front page README of the repo.

Neo4j offers two options for running on kubernetes -- the public helm chart and the GKE marketplace option.

Public Helm Chart (should be suitable for most kubernetes, but may require tweaking) - https://github.com/helm/charts/tree/master/stable/neo4j

Google Kubernetes Marketplace - https://github.com/neo-technology/neo4j-google-k8s-marketplace

Questions can also be put on community.neo4j.com.