influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

How to set up an HA InfluxDB? #882

Closed. toni-moreno closed this issue 10 years ago.

toni-moreno commented 10 years ago

We have been testing an InfluxDB 3-node cluster with replication-factor=3 and node1 as the seed, following this guide:

http://crapworks.de/blog/2014/05/18/influxdb-clustering/

It seems to work fine when all nodes are running: we can query data from any node, even when node2 or node3 is stopped or failing. I assume this behavior is due to the replication factor.

But when node1 (the seed) is stopped or failing, we cannot query any data on the other two nodes. Queries fail with the following error:

ERROR: No servers up to query shard 1
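For example, a simple query against one of the surviving nodes returns that error (a minimal sketch; node2's address and the series name are assumptions, the "graphite" database comes from our config below):

curl -G 'http://192.168.150.121:8086/db/graphite/series' \
  --data-urlencode 'u=root' --data-urlencode 'p=root' \
  --data-urlencode 'q=select * from some_series'
# => ERROR: No servers up to query shard 1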

How can we configure a fully HA infrastructure with InfluxDB?

toni-moreno commented 10 years ago

I've been reviewing the data, and we have found files only on node1:


root@influxdb1:/opt/influxdb/shared/data# find db
db
db/shard_db_v2
db/shard_db_v2/00001
db/shard_db_v2/00001/IDENTITY
db/shard_db_v2/00001/LOG.old.1409215976228192
db/shard_db_v2/00001/000004.sst
db/shard_db_v2/00001/LOCK
db/shard_db_v2/00001/MANIFEST-000005
db/shard_db_v2/00001/000006.log
db/shard_db_v2/00001/type
db/shard_db_v2/00001/CURRENT
db/shard_db_v2/00001/LOG

root@influxdb2:/opt/influxdb/shared/data# find db
db
db/shard_db_v2

[ NO DATA ] !!!

root@influxdb3:/opt/influxdb/shared/data# find db
db
db/shard_db_v2

[ NO DATA ] !!!

It seems that no data replication is happening between the nodes.

We are now working with the following configuration on node1; the only difference on node2 and node3 is the added seed-servers = ["192.168.150.120:8090"] line in the [cluster] section (see the sketch after the config).

How can I fix this problem?

hostname = "192.168.150.120"
bind-address = "0.0.0.0"
reporting-disabled = false

[logging]
level  = "info"
file   = "/opt/influxdb/shared/log.txt"        

[admin]
port   = 8083              
assets = "/opt/influxdb/current/admin"

[api]
port     = 8086    
[input_plugins]

  [input_plugins.graphite]
  enabled = true
  port = 2003
  database = "graphite"  

  [input_plugins.udp]
  enabled = false

  [[input_plugins.udp_servers]] 
  enabled = false

[raft]
port = 8090
dir  = "/opt/influxdb/shared/data/raft"

[storage]

dir = "/opt/influxdb/shared/data/db"
write-buffer-size = 10000
default-engine = "rocksdb"
max-open-shards = 0
point-batch-size = 100
write-batch-size = 5000000
retention-sweep-period = "10m"

[storage.engines.leveldb]

max-open-files = 1000
lru-cache-size = "200m"

[storage.engines.rocksdb]

max-open-files = 1000
lru-cache-size = "200m"

[storage.engines.hyperleveldb]
max-open-files = 1000
lru-cache-size = "200m"

[storage.engines.lmdb]

map-size = "100g"

[cluster]

protobuf_port = 8099
protobuf_timeout = "2s" 
protobuf_heartbeat = "200ms"
protobuf_min_backoff = "1s" 
protobuf_max_backoff = "10s" 

write-buffer-size = 1000
max-response-buffer-size = 100

concurrent-shard-query-limit = 10

[leveldb]

max-open-files = 40

lru-cache-size = "200m"
max-open-shards = 0
point-batch-size = 100

[sharding]

  replication-factor = 3

  [sharding.short-term]
  duration = "7d"
  split = 1

  [sharding.long-term]
  duration = "30d"
  split = 1

[wal]

dir   = "/opt/influxdb/shared/data/wal"
flush-after = 1000
bookmark-after = 1000 
requests-per-logfile = 10000
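For completeness, here is roughly what the [cluster] section looks like on node2 and node3 (a sketch; everything is identical to node1's config above except for the added seed-servers line, which points at node1's [raft] port):

# [cluster] section on node2 and node3; node1's is the same minus seed-servers
[cluster]
seed-servers = ["192.168.150.120:8090"]   # node1's address and raft port
protobuf_port = 8099
protobuf_timeout = "2s"
protobuf_heartbeat = "200ms"
protobuf_min_backoff = "1s"
protobuf_max_backoff = "10s"
write-buffer-size = 1000
max-response-buffer-size = 100
concurrent-shard-query-limit = 10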

toddboom commented 10 years ago

@toni-moreno What version are you using when testing this? What you're describing sounds correct. My only advice would be to remove any old data on the nodes before bringing them online.

toni-moreno commented 10 years ago

We are running InfluxDB 0.8 RC5. We will remove the old data and test the replication factor again on a test cluster.

But are you saying that InfluxDB doesn't support replication-factor changes on a previously installed and running cluster? What happens if I have one running in production and I want to change the replication factor? Should we remove the production data?

oliveagle commented 10 years ago

Correct me if I'm wrong.

As of 0.8, sharding is controlled by shard spaces; the [sharding] section in the TOML config file no longer works.

https://github.com/influxdb/influxdb/blob/v0.8.0-rc.5/cluster/cluster_configuration.go:862   

If you didn't create a shard space yourself, a default one will be created, and it has a default replicationFactor of 1.

https://github.com/influxdb/influxdb/blob/v0.8.0-rc.5/cluster/shard_space.go:31

You can use Postman to query or create shard spaces:

GET  http://localhost:8086/cluster/shard_spaces?u=root&p=root

POST http://localhost:8086/cluster/shard_spaces/database_name?u=root&p=root
raw JSON body:
    {
        "Name": "shard space name",
        "Database": "database_name",
        "retentionPolicy": "inf",
        "shardDuration": "1m",
        "regex": "/.*/",
        "replicationFactor": 3,
        "split": 1
    }
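The same POST as a curl one-liner would look roughly like this (a sketch; "database_name" and the field values are the same placeholders as above):

curl -v -XPOST 'http://localhost:8086/cluster/shard_spaces/database_name?u=root&p=root' \
  --data-binary '{
      "Name": "shard space name",
      "Database": "database_name",
      "retentionPolicy": "inf",
      "shardDuration": "1m",
      "regex": "/.*/",
      "replicationFactor": 3,
      "split": 1
  }'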

toni-moreno commented 10 years ago

Hi,

@toddboom: we did the cleaning process with the same result (you can see the config and logs in this gist: https://gist.github.com/ricardmestre/5b1d42ddb29402795024).

@oliveagle: we will try that ASAP, but if you're right, then the install process described at http://crapworks.de/blog/2014/05/18/influxdb-clustering/ is wrong, isn't it?

oliveagle commented 10 years ago

Quoting that blog post:

"InfluxDB is a quite new time-series database. I had a look at it during my search for alternatives to Graphite's carbon/whisper backend. It looks pretty promising, but right now it needs some effort to get up and running, especially if you want to build a cluster (one of the reasons I was searching for an alternative to carbon).

Using version 0.6.5, I'm going to describe what you have to do to set up a 3-node cluster with a replication level of two. Primarily for me as a reminder, but maybe someone will find this useful. I assume you use the Debian package provided on the InfluxDB website."

Its sharding configuration is wrong if you are trying 0.8.

cboggs commented 10 years ago

I ran into some headaches with replicated shard spaces in 0.8.0. Upgrading to 0.8.2 made life much better, as #886 was fixed in that build. Might be worth a shot, @oliveagle!

oliveagle commented 10 years ago

Great, thanks @cboggs.

cboggs commented 10 years ago

@oliveagle, I just confirmed that on 0.8.2, the following creates a correctly replicated shard space named "default":

curl -v -XPOST 'http://influxdb1:8086/cluster/database_configs/test?u=root&p=root' --data-binary '{
  "spaces": [
    {
      "name": "default",
      "retentionPolicy": "inf",
      "shardDuration": "7d",
      "regex": "/.*/",
      "replicationFactor": 3,
      "split": 1
    }
  ]
}'

curl -v -XPOST 'http://influxdb1:8086/db/test/users?u=root&p=root' -d '{"name": "testuser", "password": "testpw"}'

curl -v -XPOST 'http://influxdb1:8086/db/test/series?u=testuser&p=testpw' -d '[{"name": "canary", "columns": ["value"], "points": [["foo"]]}]'

Pre-0.8.2, this was not effective when the shard space was named "default".

The "warm fuzzy" factor from the above commands comes from seeing the logs for all 3 instances show isLocal: true. servers: [1 2 3] after POSTing the canary data. :-) Hope that helps.

oliveagle commented 10 years ago

Thanks, @cboggs.

Hi @toni-moreno, I just realized that I forgot to mention this in my first reply:

A shard space is an aggregation of shards. As I understand it, each shardDuration-sized window of time gets its own shard(s) within the space, and each shard is replicated to replicationFactor servers.