I'm reviewing the data and we have found files only on node 1.
root@influxdb1:/opt/influxdb/shared/data# find db
db
db/shard_db_v2
db/shard_db_v2/00001
db/shard_db_v2/00001/IDENTITY
db/shard_db_v2/00001/LOG.old.1409215976228192
db/shard_db_v2/00001/000004.sst
db/shard_db_v2/00001/LOCK
db/shard_db_v2/00001/MANIFEST-000005
db/shard_db_v2/00001/000006.log
db/shard_db_v2/00001/type
db/shard_db_v2/00001/CURRENT
db/shard_db_v2/00001/LOG
root@influxdb2:/opt/influxdb/shared/data# find db
db
db/shard_db_v2
[ NO DATA ] !!!
root@influxdb3:/opt/influxdb/shared/data# find db
db
db/shard_db_v2
[ NO DATA ] !!!
It seems that no data replication is happening between the nodes.
We are currently using the following configuration on node1; the only difference on node2 and node3 is the addition of seed-servers = ["192.168.150.120:8090"] in the [cluster] section.
How can I fix this problem?
hostname = "192.168.150.120"
bind-address = "0.0.0.0"
reporting-disabled = false
[logging]
level = "info"
file = "/opt/influxdb/shared/log.txt"
[admin]
port = 8083
assets = "/opt/influxdb/current/admin"
[api]
port = 8086
[input_plugins]
[input_plugins.graphite]
enabled = true
port = 2003
database = "graphite"
[input_plugins.udp]
enabled = false
[[input_plugins.udp_servers]]
enabled = false
[raft]
port = 8090
dir = "/opt/influxdb/shared/data/raft"
[storage]
dir = "/opt/influxdb/shared/data/db"
write-buffer-size = 10000
default-engine = "rocksdb"
max-open-shards = 0
point-batch-size = 100
write-batch-size = 5000000
retention-sweep-period = "10m"
[storage.engines.leveldb]
max-open-files = 1000
lru-cache-size = "200m"
[storage.engines.rocksdb]
max-open-files = 1000
lru-cache-size = "200m"
[storage.engines.hyperleveldb]
max-open-files = 1000
lru-cache-size = "200m"
[storage.engines.lmdb]
map-size = "100g"
[cluster]
protobuf_port = 8099
protobuf_timeout = "2s"
protobuf_heartbeat = "200ms"
protobuf_min_backoff = "1s"
protobuf_max_backoff = "10s"
write-buffer-size = 1000
max-response-buffer-size = 100
concurrent-shard-query-limit = 10
[leveldb]
max-open-files = 40
lru-cache-size = "200m"
max-open-shards = 0
point-batch-size = 100
[sharding]
replication-factor = 3
[sharding.short-term]
duration = "7d"
split = 1
[sharding.long-term]
duration = "30d"
split = 1
[wal]
dir = "/opt/influxdb/shared/data/wal"
flush-after = 1000
bookmark-after = 1000
requests-per-logfile = 10000
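For reference, the [cluster] section on node2 and node3 would look like the one above plus the seed entry just described; a minimal sketch (node1's raft address taken from the hostname/port above):
[cluster]
# node2 and node3 point at node1's raft port as the cluster seed
seed-servers = ["192.168.150.120:8090"]
protobuf_port = 8099
protobuf_timeout = "2s"
protobuf_heartbeat = "200ms"
protobuf_min_backoff = "1s"
protobuf_max_backoff = "10s"
write-buffer-size = 1000
max-response-buffer-size = 100
concurrent-shard-query-limit = 10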
@toni-moreno What version are you using when testing this? What you're describing sounds correct. My only advice would be to remove any old data on the nodes before bringing them online.
We are running InfluxDB 0.8 rc5. We will remove the old data and test the replication factor again on a testing InfluxDB cluster.
But what you are saying is that InfluxDB doesn't support replication-factor changes on a previously installed and running cluster? What happens if I have one running in production and I want to change the replication-factor? Should we remove the production data?
Correct me if I'm wrong. Since 0.8, there is a shardSpace controlling the sharding configuration; the [sharding] section in the TOML file no longer works.
https://github.com/influxdb/influxdb/blob/v0.8.0-rc.5/cluster/cluster_configuration.go:862
If you didn't create a shardSpace yourself, a default one will be created, which has a default replicationFactor of 1.
https://github.com/influxdb/influxdb/blob/v0.8.0-rc.5/cluster/shard_space.go:31
You can use Postman to query or create a shard_space:
GET http://localhost:8086/cluster/shard_spaces?u=root&p=root
POST http://localhost:8086/cluster/shard_spaces/database_name?u=root&p=root
raw JSON body:
{
  "Name": "shard space name",
  "Database": "database_name",
  "retentionPolicy": "inf",
  "shardDuration": "1m",
  "regex": "/.*/",
  "replicationFactor": 3,
  "split": 1
}
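If you prefer the command line over Postman, the same calls can be made with curl; a sketch assuming the root/root credentials and placeholder database name from above:
# list existing shard spaces (a default space with replicationFactor 1 shows up if you never created one)
curl 'http://localhost:8086/cluster/shard_spaces?u=root&p=root'
# create a shard space with replicationFactor 3 for database_name
curl -XPOST 'http://localhost:8086/cluster/shard_spaces/database_name?u=root&p=root' --data-binary '{
  "Name": "shard space name",
  "Database": "database_name",
  "retentionPolicy": "inf",
  "shardDuration": "1m",
  "regex": "/.*/",
  "replicationFactor": 3,
  "split": 1
}'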
Hi,
@toddboom: we did the cleaning process with the same result (you can see the config and logs in this gist: https://gist.github.com/ricardmestre/5b1d42ddb29402795024).
@oliveagle: we will try ASAP, but if you're right, the following install process (http://crapworks.de/blog/2014/05/18/influxdb-clustering/) is wrong, isn't it?
InfluxDB is a quite new time-series database. I was having a look at it during my search for alternatives to Graphite's carbon/whisper backend. It looks pretty promising, but right now it needs some effort to get it up and running, especially if you want to build a cluster (one of the reasons I was searching for an alternative to carbon).
Using version 0.6.5, I'm going to describe what you have to do to set up a 3-node cluster with a replication level of two. Primarily for me as a reminder, but maybe someone will find this useful. I assume you use the Debian package provided on the InfluxDB website.
That sharding configuration is wrong if you are trying 0.8.
I ran into some headaches with replicated shard spaces in 0.8.0. Upgrading to 0.8.2 made life much better, as #886 was fixed in that build. Might be worth a shot, @oliveagle!
great. thx @cboggs
@oliveagle, just confirmed that on 0.8.2, the following creates a correctly-replicated shard space named "default":
curl -v -XPOST 'http://influxdb1:8086/cluster/database_configs/test?u=root&p=root' --data-binary '{
"spaces": [
{
"name": "default",
"retentionPolicy": "inf",
"shardDuration": "7d",
"regex": "/.*/",
"replicationFactor": 3,
"split": 1
}
]
}'
curl -v -XPOST 'http://influxdb1:8086/db/test/users?u=root&p=root' -d '{"name": "testuser", "password": "testpw"}'
curl -v -XPOST 'http://influxdb1:8086/db/test/series?u=testuser&p=testpw' -d '[{"name": "canary", "columns": ["value"], "points": [["foo"]]}]'
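For extra confirmation, the shard space listing endpoint mentioned earlier in the thread can also be queried on each node; a sketch using the hostnames from the find output above:
# each node should report the "default" space with "replicationFactor": 3
curl 'http://influxdb1:8086/cluster/shard_spaces?u=root&p=root'
curl 'http://influxdb2:8086/cluster/shard_spaces?u=root&p=root'
curl 'http://influxdb3:8086/cluster/shard_spaces?u=root&p=root'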
Pre-0.8.2, this was not effective when the shard space was named "default".
The "warm fuzzy" factor from the above commands comes from seeing the logs on all 3 instances show isLocal: true. servers: [1 2 3] after POSTing the canary data. :-) Hope that helps.
thx, @cboggs.
Hi @toni-moreno, I just realized I forgot to mention this in my first reply: a shardSpace is an aggregation of shards.
We have been testing an InfluxDB 3-node cluster with replication-factor=3 and node1 as the seed, following:
http://crapworks.de/blog/2014/05/18/influxdb-clustering/
It seems to work fine when all nodes are running, and we can query data contained on any node even when node2 or node3 is stopped/failing. I suppose this behavior is because of the replication factor.
But when node1 (the seed) is stopped or failing, we cannot query any data from the two other nodes, and get the following error:
How can we configure a full HA infrastructure with InfluxDB?