graphfoundation / ongdb

ONgDB is an independent fork of Neo4j® Enterprise Edition version 3.4.0.rc02 licensed under AGPLv3 and/or Community Edition licensed under GPLv3.
https://www.graphfoundation.org/projects/ongdb/

I encountered a problem with 3.6.0.M1: the two nodes could not start properly, but 3.5.16 is OK. #28

Closed: crazyyanchao closed this issue 4 years ago

crazyyanchao commented 4 years ago

I encountered a problem with 3.6.0.M1: the two nodes could not start properly, but 3.5.16 is OK.

node-1 neo4j.conf

dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=node-1
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=:7687
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7474
dbms.connector.https.enabled=true
dbms.connector.https.listen_address=:7473
dbms.mode=CORE
causal_clustering.minimum_core_cluster_size_at_formation=2
causal_clustering.minimum_core_cluster_size_at_runtime=2
causal_clustering.initial_discovery_members=node-1:5000,node-2:5001
causal_clustering.discovery_listen_address=:5000
dbms.jvm.additional=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

node-2 neo4j.conf

dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=node-2
dbms.connector.bolt.enabled=true
dbms.connector.bolt.listen_address=:7687
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7474
dbms.connector.https.enabled=true
dbms.connector.https.listen_address=:7473
dbms.mode=CORE
causal_clustering.minimum_core_cluster_size_at_formation=2
causal_clustering.minimum_core_cluster_size_at_runtime=2
causal_clustering.initial_discovery_members=node-1:5000,node-2:5001
causal_clustering.discovery_listen_address=:5001
dbms.jvm.additional=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
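
Both configurations can be cross-checked offline before looking elsewhere. The Java sketch below is a hypothetical helper (DiscoveryConfigCheck is not part of ONgDB): it parses the two neo4j.conf fragments and checks that both nodes list the same causal_clustering.initial_discovery_members, that each node's own host and discovery port appear in that list, and that causal_clustering.minimum_core_cluster_size_at_formation does not exceed the number of listed members. It assumes each node advertises its discovery address as dbms.connectors.default_advertised_address plus the port of causal_clustering.discovery_listen_address, and that the fallbacks used for missing keys (3 and :5000) match the Neo4j 3.5 defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of ONgDB): cross-checks the discovery settings of
// several core members offline. It reads config text only and contacts no server.
class DiscoveryConfigCheck {

    // Parses neo4j.conf-style "key=value" lines, splitting on the FIRST '=' so values
    // such as "-agentlib:jdwp=transport=..." stay intact. Blank lines and # comments are skipped.
    static Map<String, String> parse(String conf) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String raw : conf.split("\\R")) {
            String line = raw.trim();
            int eq = line.indexOf('=');
            if (line.isEmpty() || line.startsWith("#") || eq < 0) {
                continue;
            }
            settings.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return settings;
    }

    // Returns human-readable problems; an empty list means none of these checks failed.
    static List<String> check(List<Map<String, String>> nodes) {
        List<String> problems = new ArrayList<>();
        String members = nodes.get(0).getOrDefault("causal_clustering.initial_discovery_members", "");
        List<String> memberList = Arrays.asList(members.split("\\s*,\\s*"));
        for (Map<String, String> node : nodes) {
            String host = node.getOrDefault("dbms.connectors.default_advertised_address", "?");
            // 1. Every core should list the same initial discovery members.
            if (!members.equals(node.getOrDefault("causal_clustering.initial_discovery_members", ""))) {
                problems.add(host + ": initial_discovery_members differs from the first node");
            }
            // 2. Assumption: the node advertises <default_advertised_address>:<discovery listen port>;
            //    that pair should appear in the member list. Fallback ":5000" is an assumed default.
            String listen = node.getOrDefault("causal_clustering.discovery_listen_address", ":5000");
            String self = host + ":" + listen.substring(listen.lastIndexOf(':') + 1);
            if (!memberList.contains(self)) {
                problems.add(host + ": " + self + " is not listed in initial_discovery_members");
            }
            // 3. The formation threshold cannot exceed the number of listed members.
            //    Fallback "3" is an assumed default.
            int formation = Integer.parseInt(node.getOrDefault(
                    "causal_clustering.minimum_core_cluster_size_at_formation", "3"));
            if (formation > memberList.size()) {
                problems.add(host + ": formation size " + formation + " > "
                        + memberList.size() + " listed members");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String common = "causal_clustering.minimum_core_cluster_size_at_formation=2\n"
                + "causal_clustering.initial_discovery_members=node-1:5000,node-2:5001\n";
        Map<String, String> node1 = parse("dbms.connectors.default_advertised_address=node-1\n"
                + common + "causal_clustering.discovery_listen_address=:5000\n");
        Map<String, String> node2 = parse("dbms.connectors.default_advertised_address=node-2\n"
                + common + "causal_clustering.discovery_listen_address=:5001\n");
        List<String> problems = check(Arrays.asList(node1, node2));
        System.out.println(problems.isEmpty() ? "no discovery problems found" : problems.toString());
    }
}
```

Run against the discovery settings above, it prints `no discovery problems found`, which suggests (without proving) that these settings alone are not what keeps the two nodes from forming a cluster.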

error.log


2020-04-05 10:17:43.964+0000 INFO  Waiting for a total of 2 core members...
2020-04-05 10:17:56.855+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@6569dded' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=true, coreMembers={MemberId{1b03f597}=CoreServerInfo{raftServer=node-2:7000, catchupServer=node-2:6000, clientConnectorAddresses=bolt://node-2:7687,http://node-2:7474,https://node-2:7473, groups=[], database=default, refuseToBeLeader=false}}}. Another member should have published a clusterId but none was detected. Please restart the cluster.". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@6569dded' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=true, coreMembers={MemberId{1b03f597}=CoreServerInfo{raftServer=node-2:7000, catchupServer=node-2:6000, clientConnectorAddresses=bolt://node-2:7687,http://node-2:7474,https://node-2:7473, groups=[], database=default, refuseToBeLeader=false}}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@6569dded' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=true, coreMembers={MemberId{1b03f597}=CoreServerInfo{raftServer=node-2:7000, catchupServer=node-2:6000, clientConnectorAddresses=bolt://node-2:7687,http://node-2:7474,https://node-2:7473, groups=[], database=default, refuseToBeLeader=false}}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:124)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:91)
at org.neo4j.server.enterprise.EnterpriseEntryPoint.main(EnterpriseEntryPoint.java:41)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@6569dded' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=true, coreMembers={MemberId{1b03f597}=CoreServerInfo{raftServer=node-2:7000, catchupServer=node-2:6000, clientConnectorAddresses=bolt://node-2:7687,http://node-2:7474,https://node-2:7473, groups=[], database=default, refuseToBeLeader=false}}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:180)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory, /home/ongdb/Desktop/ongdb-node-1/data/databases
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:232)
at org.neo4j.causalclustering.core.OpenEnterpriseCoreGraphDatabase.<init>(OpenEnterpriseCoreGraphDatabase.java:52)
at org.neo4j.server.enterprise.OpenEnterpriseGraphFactory.newGraphDatabase(OpenEnterpriseGraphFactory.java:43)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:90)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.causalclustering.core.state.CoreLife@53667cbe' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=true, coreMembers={MemberId{1b03f597}=CoreServerInfo{raftServer=node-2:7000, catchupServer=node-2:6000, clientConnectorAddresses=bolt://node-2:7687,http://node-2:7474,https://node-2:7473, groups=[], database=default, refuseToBeLeader=false}}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:227)
... 9 more
Caused by: java.util.concurrent.TimeoutException: Failed to join a cluster with members {clusterId=null, bootstrappable=true, coreMembers={MemberId{1b03f597}=CoreServerInfo{raftServer=node-2:7000, catchupServer=node-2:6000, clientConnectorAddresses=bolt://node-2:7687,http://node-2:7474,https://node-2:7473, groups=[], database=default, refuseToBeLeader=false}}}. Another member should have published a clusterId but none was detected. Please restart the cluster.
at org.neo4j.causalclustering.identity.ClusterBinder.bindToCluster(ClusterBinder.java:177)
at org.neo4j.causalclustering.core.state.CoreLife.start0(CoreLife.java:74)
at org.neo4j.kernel.lifecycle.SafeLifecycle.transition(SafeLifecycle.java:124)
at org.neo4j.kernel.lifecycle.SafeLifecycle.start(SafeLifecycle.java:138)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 11 more
2020-04-05 10:17:56.858+0000 INFO  Neo4j Server shutdown initiated by request
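
The decisive part of this trace is the innermost cause: a TimeoutException thrown from ClusterBinder.bindToCluster, reporting clusterId=null and a coreMembers map with a single entry (MemberId{1b03f597}, advertising raftServer=node-2:7000), while both configurations require minimum_core_cluster_size_at_formation=2. The hypothetical Java helper below (JoinFailureSummary is not an ONgDB API) extracts those fields from the message so the visible member count can be compared with the formation size; its regular expressions assume the message keeps exactly the format printed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic (not an ONgDB API): pulls the clusterId and the core members
// (memberId@raftServer) out of a "Failed to join a cluster" message in the format above.
class JoinFailureSummary {
    private static final Pattern CLUSTER_ID = Pattern.compile("clusterId=([^,}]+)");
    private static final Pattern MEMBER =
            Pattern.compile("MemberId\\{([0-9a-f]+)\\}=CoreServerInfo\\{raftServer=([^,]+)");

    final String clusterId;                          // "null" when no cluster ID was published
    final List<String> members = new ArrayList<>();  // one "memberId@raftServer" per core seen

    JoinFailureSummary(String message) {
        Matcher id = CLUSTER_ID.matcher(message);
        clusterId = id.find() ? id.group(1) : "<absent>";
        Matcher m = MEMBER.matcher(message);
        while (m.find()) {
            members.add(m.group(1) + "@" + m.group(2));
        }
    }

    public static void main(String[] args) {
        String msg = "Failed to join a cluster with members {clusterId=null, bootstrappable=true, "
                + "coreMembers={MemberId{1b03f597}=CoreServerInfo{raftServer=node-2:7000, "
                + "catchupServer=node-2:6000, clientConnectorAddresses=bolt://node-2:7687,"
                + "http://node-2:7474,https://node-2:7473, groups=[], database=default, "
                + "refuseToBeLeader=false}}}. Another member should have published a clusterId "
                + "but none was detected. Please restart the cluster.";
        JoinFailureSummary s = new JoinFailureSummary(msg);
        int required = 2; // causal_clustering.minimum_core_cluster_size_at_formation in both configs
        System.out.println("clusterId=" + s.clusterId + ", visible cores=" + s.members
                + " (" + s.members.size() + " of " + required + " required)");
    }
}
```

Applied to the message above, it prints `clusterId=null, visible cores=[1b03f597@node-2:7000] (1 of 2 required)`. This is in line with the maintainer's follow-up question about copying the upgraded node and clearing cluster state, although the log alone does not confirm that cause.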

bradnussbaum commented 4 years ago

@crazyyanchao Did you perform an upgrade on one of the nodes and then copy that upgraded node's database to the other nodes in the cluster?

crazyyanchao commented 4 years ago

@bradnussbaum Yes, I did this. But 3.5.16 is OK! Why does this happen?

bradnussbaum commented 4 years ago

@crazyyanchao Does it happen from a clean startup of 3.6.0.M1? After you upgraded from 3.5.16 to 3.6.0.M1 in SINGLE mode, did you allow the node to come fully online? After the node came online and you shut it down cleanly, did you run neo4j-admin unbind to clear the cluster state and then neo4j-admin load to load the upgraded graph into each node?
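
The procedure described in this comment can be written out as an ordered list of commands. The Java sketch below is hypothetical (ReseedPlan is not an ONgDB tool) and only builds command lines; it runs nothing. It assumes that ONgDB 3.6 keeps the Neo4j 3.5 neo4j-admin syntax, that the database has the default name graph.db, and that the upgraded graph is moved between hosts as an archive created with neo4j-admin dump, a step the comment does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical planner: lists, in order, the neo4j-admin commands for seeding every core
// member from one upgraded store. It only builds strings; nothing is executed.
// Assumptions: Neo4j 3.5 neo4j-admin syntax, database name graph.db, transfer via dump/load.
class ReseedPlan {

    // Returns "host: command" lines in the order they should run.
    static List<String> steps(String sourceHost, String dumpFile, List<String> coreHosts) {
        List<String> steps = new ArrayList<>();
        // 1. On the upgraded instance, after it came fully online in SINGLE mode and was
        //    shut down cleanly: archive the upgraded store.
        steps.add(sourceHost + ": neo4j-admin dump --database=graph.db --to=" + dumpFile);
        for (String host : coreHosts) {
            // 2. On each stopped core member: clear its cluster state...
            steps.add(host + ": neo4j-admin unbind --database=graph.db");
            // 3. ...then replace its store with the upgraded graph (--force overwrites the old store).
            steps.add(host + ": neo4j-admin load --from=" + dumpFile + " --database=graph.db --force");
        }
        return steps;
    }

    public static void main(String[] args) {
        // Hypothetical archive path; the file must be copied to every host before loading.
        steps("node-1", "/tmp/graph-3.6.0.M1.dump", Arrays.asList("node-1", "node-2"))
                .forEach(System.out::println);
    }
}
```

The archive path is a placeholder, and every core member should be stopped while these commands run; the cluster is started again only after each node has been unbound and loaded.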

crazyyanchao commented 4 years ago

@bradnussbaum :) Thanks for your reply. Yes, it happens from a clean startup of 3.6.0.M1. I haven't tried running unbind and load yet, but I will try that. Is that a necessary step for 3.6.*? For now I have switched back to 3.5.16, which works. I will continue testing...