tidwall / summitdb

In-memory NoSQL database with ACID transactions, Raft consensus, and Redis API

Unable to join cluster #18

Closed quayilab closed 7 years ago

quayilab commented 7 years ago

Hi, I find this project really interesting. However, when I try to create a cluster of two SummitDB servers, the first server starts fine, but when I start the second one to join the first, I can't get it to work:

X:\>summitdb -p 7482 -join localhost:7481
9896:M 18 Mar 02:51:44.764 * SummitDB 0.4.0
9896:N 18 Mar 02:51:44.834 * Node at :7482 [Follower] entering Follower state (Leader: "")
9896:N 18 Mar 02:51:45.880 # Heartbeat timeout from "" reached, starting election
9896:N 18 Mar 02:51:45.880 * Node at :7482 [Candidate] entering Candidate state
9896:N 18 Mar 02:51:45.927 # Failed to make RequestVote RPC to :7481: dial tcp :7481: connectex: The requested address is not valid in its context.
9896:N 18 Mar 02:51:47.449 # Election timeout reached, restarting election
... and so on

Can someone point out what I might be doing wrong? Thank you.

tidwall commented 7 years ago

Hi @quayi,

Try this,

In terminal 1:

$ summitdb-server -p 7481 -dir data1

Then in terminal 2:

$ summitdb-server -p 7482 -dir data2 -join localhost:7481

... and the same process for other nodes.

image

quayilab commented 7 years ago

Thank you @tidwall for the fast response. I tried your suggestion, but the second server still can't join the first. Here's a screenshot of what happens: (screenshot: summitdb)

FYI, I'm using Windows 10 on a Core i5 with 4 GB of RAM, in case you need that.

tidwall commented 7 years ago

The second screenshot shows X:\> as the last line. Did the application quit automatically or did you close it?

quayilab commented 7 years ago

No, it was the application. It quit on its own after printing those messages.

quayilab commented 7 years ago

This is a capture of what happens if I start the second server without an IP address in the -join argument.

X:\>summitdb-server -p 7482 -dir data2 -join :7481
6160:M 18 Mar 20:46:30.341 * SummitDB 0.4.0
6160:N 18 Mar 20:46:30.397 * Node at :7482 [Follower] entering Follower state (Leader: "")
6160:N 18 Mar 20:46:30.397 # failed to join node at :7481: dial tcp :7481: connectex: The requested address is not valid in its context.

It also quit right after printing the message. Here is the capture from the first server:

X:\>summitdb-server -p 7481 -dir data1
5268:M 18 Mar 20:45:54.689 * SummitDB 0.4.0
5268:N 18 Mar 20:45:54.765 * Enable single node
5268:N 18 Mar 20:45:54.831 * Node at :7481 [Follower] entering Follower state (Leader: "")
5268:N 18 Mar 20:45:56.667 # Heartbeat timeout from "" reached, starting election
5268:N 18 Mar 20:45:56.667 * Node at :7481 [Candidate] entering Candidate state
5268:N 18 Mar 20:45:56.799 * Election won. Tally: 1
5268:N 18 Mar 20:45:56.799 * Node at :7481 [Leader] entering Leader state

Nothing happened on the first server, as if it didn't even know that the second server had tried to join it.

tidwall commented 7 years ago

I've never seen the "connectex: The requested address is not valid in its context." error before. After some research, it looks like this might be related to this issue. I need to do some more investigating to make sure.
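Note that the logs show the dial target as `:7481`, with no host, even when `-join localhost:7481` was given. A plausible reading, stated here as an assumption, is that Windows rejects outbound connections to an address with an empty host. The Go sketch below shows one way such an address could be normalized before dialing. `normalizeJoinAddr` is a hypothetical helper and the choice of `127.0.0.1` as the fallback host is an assumption; this is not taken from SummitDB's source.

```go
package main

import (
	"fmt"
	"net"
)

// normalizeJoinAddr is a hypothetical helper (not part of SummitDB).
// If addr has an empty host part, e.g. ":7481", it fills in the
// loopback address so the result can be dialed explicitly.
func normalizeJoinAddr(addr string) (string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	if host == "" {
		host = "127.0.0.1" // assumption: the peer runs on the same machine
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	for _, addr := range []string{":7481", "localhost:7481"} {
		fixed, err := normalizeJoinAddr(addr)
		if err != nil {
			fmt.Println("bad address:", err)
			continue
		}
		fmt.Printf("%q -> %q\n", addr, fixed)
	}
}
```

Addresses that already carry a host, such as `localhost:7481`, pass through unchanged.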

tidwall commented 7 years ago

I just pushed an update to the master branch that hopefully fixes this issue.

Make sure to delete the data1 and data2 directories.

Terminal 1:

$ summitdb-server -p 7481 -dir data1

Terminal 2:

$ summitdb-server -p 7482 -dir data2 -join localhost:7481

Let me know if this works.

quayilab commented 7 years ago

Hi @tidwall, your fix works great. Here's a capture: (screenshot: summitdb-1)

Thank you very much for taking the time to solve the problem. I'll play around with the fix and get back to you soon.

tidwall commented 7 years ago

@quayi That's great to hear. You're welcome.