influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Clustering does not work / does nothing #517

Closed Crapworks closed 10 years ago

Crapworks commented 10 years ago

Hi all,

I found InfluxDB while looking for an alternative to Carbon, and I was especially interested in the clustering functionality. However, I can't get it to work.

I'm using the packaged version 0.6.1 on Debian Wheezy.

I have three nodes, all running with the standard configuration. The only thing I've changed is:

replication-factor = 2
seed-servers = ["node1:8090", "node2:8090", "node3:8090"]

After restarting the instances, the web interface on each node shows only the local server, and pushing data to one node has no effect on the other nodes.

Is there something I've missed in my setup?

Thanks for your help and time! Regards, Christian

jvshahid commented 10 years ago

Hey @Crapworks, thanks for giving InfluxDB a try. The mailing list is better suited to this kind of question; our user base is growing and it's usually more responsive. Anyway, to set up a cluster you need to do the following:

  1. Bring up a node with no seed-servers
  2. Start more nodes with seed-servers set to the first node (or previous nodes that you started up)

seed-servers is only relevant the first time you start a node. Please give it a spin and let us know if you're having any trouble.
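As a rough sketch of the two steps above, assuming the node names and port from the original post (node1, node2, node3, port 8090): the first node is started with no seeds, and each later node points at a node that is already running. Only the seed-servers key comes from this thread; keep it wherever it already appears in your packaged config file, since its position in the file is not shown here.

```toml
# node1 (started first): leave seed-servers unset so it does not try to join anyone
# seed-servers = ["node1:8090", "node2:8090", "node3:8090"]
```

```toml
# node2 and node3 (started afterwards): seed from a node that is already up
seed-servers = ["node1:8090"]
```

Because seed-servers is only read on a node's first start, changing it on a node that has already started once has no effect.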

Crapworks commented 10 years ago

Hi,

Thank you for the quick answer! Since the service starts automatically when the Debian package is installed, that would be impossible to achieve :) Can I shut down the service and delete ${MAGICFILE} so that I get a chance to alter the configuration file for clustering?

Regards, Christian

jvshahid commented 10 years ago

Yes, you can delete /opt/influxdb/shared/data. The directory will be recreated as soon as you start InfluxDB again. There's a GitHub issue about not starting the service automatically on install: #460.

Crapworks commented 10 years ago

Ah, I see! That got me a step further! Thanks! Started up the first node without a seeder and deleted the data directory, then started up the second node with the first node as the seeder. Now the second node appears in the web GUI! The last message is:

[2014/05/08 16:49:44 CEST] [INFO] (server.(*Server).ListenAndServe:87) Waiting for local server to be added

I hope that means everything is fine. Anyway, now that I'm bringing up the third node with the first configured as the seeder, I get tons of these messages:

[2014/05/08 16:53:10 CEST] [EROR] (coordinator.(*RaftServer).Join:583) Post http://ceph-mon-bs01.infra.server.lan:8090/join: net/http: timeout awaiting response headers
[2014/05/08 16:53:10 CEST] [WARN] (coordinator.(*RaftServer).startRaft:409) Couldn't join any of the seeds, sleeping and retrying...
[2014/05/08 16:53:10 CEST] [INFO] (coordinator.(*RaftServer).startRaft:401) (raft:e5e4b29) Attempting to join leader: ceph-mon-bs01.infra.server.lan:8090

Any ideas? Regards, Christian

jvshahid commented 10 years ago

Make sure hostname in each node's config file is set to a name that is reachable from the other nodes. Also make sure seed-servers uses a hostname that's reachable.

Crapworks commented 10 years ago

They are all reachable; the hostname configured as the seeder on the second node is the same as on the third node. Anyway, the whole cluster seems to hang now: no response from the web GUI, and no log entries when sending data to the first node.

jvshahid commented 10 years ago

Did you set hostname:


# If hostname (on the OS) doesn't return a name that can be resolved by the other
# systems in the cluster, you'll have to set the hostname to an IP or something
# that can be resolved here.
# hostname = ""

on both nodes?
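For illustration, a minimal sketch of that setting once uncommented. The name and address below are hypothetical placeholders, not values from this thread; use whatever the other cluster members can actually resolve and reach.

```toml
# Hypothetical value for one node; every node needs its own name or IP
# that the other nodes in the cluster can resolve.
hostname = "node2.example.lan"   # an IP such as "192.0.2.12" would also work
```

Each node gets its own value here, while seed-servers on the joining nodes should use the same resolvable names.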

Crapworks commented 10 years ago

Damn, I had totally missed this setting... Thanks! The cluster is up and working now with three nodes! Regards, Christian