Closed: ezcocos closed this issue 9 years ago.
@ezcocos What version of InfluxDB are you running? The admin interface is now compiled directly into the binary, so this issue should never come up again.
Many thanks for your reply. I have the following on my system: /opt/influxdb/versions/0.8.5
@ezcocos It sounds like you're saying that this only happens after you're writing data for a while. Can you tell me a little bit more about the data that you're writing in (i.e. size, duration, frequency)?
It is a one-off load of 10 years of daily data for about 8 variables.
@ezcocos @toddboom I also have this issue.
I tried inserting some data every second. At the same time, I run the collectd daemon to push data via influxdb-collectd-proxy.
I'm running InfluxDB in a docker container and mount the data directory to host like this:
docker run -v /mnt/influxdb_data:/influxdb_data:rw -d -p 25826:25826/udp -p 8096:8096/udp -p 8086:8086 -p 8083:8083 -p 8081:80 -p 8125:8125/udp -p 8126:8126 grafana
If I run docker kill and docker rm to stop and remove the old container and then start a new one, I can no longer access the HTTP API on either 8083 or 8086, even though the influxdb binary is running.
I have a feeling the data is corrupted, because if I clear out everything in /mnt/influxdb_data and start the container again, it works.
I tried adding -repair-ldb=true, but it doesn't seem to help.
My log file shows this:
[2014/12/02 19:30:46 UTC] [INFO] (github.com/influxdb/influxdb/coordinator.(*RaftServer).dropShardsWithRetentionPolicies:537) Checking for shards to drop
[2014/12/02 19:32:32 UTC] [INFO] (main.waitForSignals:24) Received signal: terminated
[2014/12/02 19:32:32 UTC] [INFO] (github.com/influxdb/influxdb/server.(*Server).Stop:263) Stopping server
[2014/12/02 19:32:32 UTC] [INFO] (github.com/influxdb/influxdb/server.(*Server).Stop:272) Stopping admin server
[2014/12/02 19:32:32 UTC] [INFO] (github.com/influxdb/influxdb/server.(*Server).Stop:274) admin server stopped
[2014/12/02 19:32:32 UTC] [INFO] (github.com/influxdb/influxdb/server.(*Server).Stop:276) Stopping raft server
[2014/12/02 19:32:44 UTC] [INFO] (main.setupLogging:69) Redirectoring logging to /var/log/influxdb_log.txt
[2014/12/02 19:32:44 UTC] [INFO] (main.start:164) Starting Influx Server 0.8.6 bound to 0.0.0.0...
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/server.NewServer:43) Opening database at /influxdb_data/db
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/wal.NewWAL:40) Opening wal in /influxdb_data/wal
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/api/http.(*HttpServer).EnableSsl:74) Ssl will be disabled since the ssl port or certificate path weren't set
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/coordinator.(*RaftServer).Serve:566) Initializing Raft HTTP server
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/coordinator.(*RaftServer).Serve:576) Raft Server Listening at 0.0.0.0:8090
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/coordinator.(*RaftServer).startRaft:384) Initializing Raft Server: http://f4fe021fe272:8090
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/coordinator.(*InfluxJoinCommand).Apply:252) Adding new server to the cluster config a0bee89cd150455c
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).AddPotentialServer:291) Added server to cluster config: 1, http://d98cd1d00f41:8090, d98cd1d00f41:8099
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).AddPotentialServer:292) Checking whether this is the local server local: f4fe021fe272:8099, new: d98cd1d00f41:8099
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).AddPotentialServer:301) Added the local server
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/coordinator.(*RaftServer).startRaft:409) Recovered from log
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:96) Waiting for local server to be added
[2014/12/02 19:32:44 UTC] [INFO] (github.com/influxdb/influxdb/wal.(*WAL).SetServerId:109) Setting server id to 1 and recovering
[2014/12/02 19:32:44 UTC] [DEBG] (github.com/influxdb/influxdb/wal.(*WAL).recover:503) Finished wal recovery
[2014/12/02 19:32:46 UTC] [INFO] (github.com/influxdb/influxdb/coordinator.(*RaftServer).raftEventHandler:448) (raft:a0bee89cd150455c) Selected as leader. Starting leader loop.
[2014/12/02 19:32:46 UTC] [INFO] (github.com/influxdb/influxdb/datastore.(*ShardDatastore).GetOrCreateShard:158) DATASTORE: opening or creating shard /influxdb_data/db/shard_db_v2/00001
[2014/12/02 19:32:46 UTC] [INFO] (github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).AddShards:1090) Adding shard to default: 1 - start: Thu Nov 27 00:00:00 +0000 UTC 2014 (1417046400). end: Thu Dec 4 00:00:00 +0000 UTC 2014 (1417651200). isLocal: true. servers: [1]
[2014/12/02 19:32:47 UTC] [DEBG] (github.com/influxdb/influxdb/coordinator.(*RaftServer).raftLeaderLoop:467) (raft:a0bee89cd150455c) Executing leader loop.
[2014/12/02 19:32:48 UTC] [DEBG] (github.com/influxdb/influxdb/coordinator.(*RaftServer).raftLeaderLoop:467) (raft:a0bee89cd150455c) Executing leader loop.
[2014/12/02 19:32:49 UTC] [DEBG] (github.com/influxdb/influxdb/coordinator.(*RaftServer).raftLeaderLoop:467) (raft:a0bee89cd150455c) Executing leader loop.
[2014/12/02 19:32:49 UTC] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:112) Sending change connection string command (d98cd1d00f41:8099,f4fe021fe272:8099) (http://d98cd1d00f41:8090,http://f4fe021fe272:8090)
[2014/12/02 19:32:49 UTC] [INFO] (github.com/influxdb/influxdb/datastore.(*ShardDatastore).GetOrCreateShard:158) DATASTORE: opening or creating shard /influxdb_data/db/shard_db_v2/00002
[2014/12/02 19:32:49 UTC] [INFO] (github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).AddShards:1090) Adding shard to default: 2 - start: Thu Sep 6 00:00:00 +0000 UTC 2001 (999734400). end: Thu Sep 13 00:00:00 +0000 UTC 2001 (1000339200). isLocal: true. servers: [1]
[2014/12/02 19:32:50 UTC] [DEBG] (github.com/influxdb/influxdb/coordinator.(*RaftServer).raftLeaderLoop:467) (raft:a0bee89cd150455c) Executing leader loop.
[... the same "Executing leader loop." DEBG line repeats once per second from 19:32:51 through 19:33:43 ...]
[2014/12/02 19:33:44 UTC] [DEBG] (github.com/influxdb/influxdb/coordinator.(*RaftServer).raftLeaderLoop:467) (raft:a0bee89cd150455c) Executing leader loop.
[2014/12/02 19:33:44 UTC] [DEBG] (github.com/influxdb/influxdb/coordinator.(*RaftServer).CompactLog:350) Testing if we should compact the raft logs
[2014/12/02 19:33:45 UTC] [DEBG] (github.com/influxdb/influxdb/coordinator.(*RaftServer).raftLeaderLoop:467) (raft:a0bee89cd150455c) Executing leader loop.
[... the same "Executing leader loop." DEBG line repeats once per second from 19:33:46 through 19:33:53 ...]
[2014/12/02 19:33:54 UTC] [DEBG] (github.com/influxdb/influxdb/coordinator.(*RaftServer).raftLeaderLoop:467) (raft:a0bee89cd150455c) Executing leader loop.
Not sure whether this issue is related to mine: https://groups.google.com/forum/#!msg/influxdb/5vQwhnXrU-E/78X7LR2UrqUJ (I'm on 0.8.6).
@ezcocos @kureikain It sounds like this is related to some other issues where the daemon becomes unresponsive after data is written for a while. Is the API also unresponsive at this point? Could you try changing the default datastore from "rocksdb" to "leveldb" in the configuration file and try reloading your data? Would you be able to share access to the system that this InfluxDB instance is running on?
@toddboom Thanks for the quick response.
"Is the API also unresponsive at this point?" Yes, it's unresponsive. All HTTP access is unresponsive.
I'll try switching to LevelDB. Yes, I can share access. Can we use TeamViewer? Alternatively, I can open a port on my modem to give you SSH access, or, in the worst case, export the whole VirtualBox image.
Good news: it's more stable with the LevelDB backend now. I'm still testing and will let you know whether LevelDB storage fixes this issue.
I think this issue happens when the data gets corrupted somehow, because it only occurs when I run InfluxDB under heavy writes inside a Docker container inside a VirtualBox VM. When I tried on EC2, everything was fine: even if I send kill -9 to influxdb and then restart it with -repair-ldb, everything works.
@kureikain Are you finding that the system is still more stable with LevelDB?
@toddboom The system is more stable with LevelDB but I still experience the same issue.
I finally produced two data sets for you to reproduce this issue.
Here is what I did:
Note that my data totals just a couple of MB, not big at all.
Here is the script I run to generate the data points:
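require 'statsd-instrument'
require 'logger'

# Send metrics over UDP to the statsd listener on port 8125
# (the -p 8125:8125/udp mapping in the docker run command above).
StatsD.backend = StatsD::Instrument::Backends::UDPBackend.new("192.168.59.103:8125", :statsd)

threads = [
  # Every 30 s: report free memory in MB ($4 is the "free" column of free -m; $2 would be total) and a heartbeat timestamp.
  Thread.new { loop { StatsD.gauge('dev.vinh.test.freemem', `free -m | grep Mem | awk '{print $4}'`.to_i); StatsD.gauge('dev.vinh.test.heartbeat', Time.now.to_i); sleep 30 } },
  # Every 15 s: send a burst of 1000-3000 counter increments, each with value 19.
  Thread.new { loop { Random.new.rand(1000..3000).times { StatsD.increment('dev.vinh.test.run', 19) }; sleep 15 } }
]
# Threads start running as soon as they are created; join keeps the script alive.
threads.each(&:join)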
I'm running this on an EC2 c3.xlarge instance.
@kureikain Thanks for sending all of this over - I'm going to be testing with it this afternoon.
@toddboom Thanks. This bug is hitting us, and I have to run a cron job to back up the data every 15 minutes ;(
@toddboom Hey, Todd. If there's anything I can do to help debug this, let me know and I'll do it. I know a bit of Go, so I'm very willing to build and run any development code to solve this.
I'm facing the same issue today: one of the instances in my 3-node cluster has influxdb running but not responding over the API. Nothing special shows up in the log. Deleting the data folder seems to fix the issue, so I'm guessing something is corrupted.
InfluxDB 0.8.7 on SSDs
It looks like I have the same issue. I have a data directory that seems to be corrupted; if I delete it, it works again. Is anyone interested in taking a look at it? I only use InfluxDB to store Grafana dashboards, and it looks like it's not even up to that task...
Does anyone have a workaround? I can't believe this is happening for everyone; there has to be something specific to our installations that triggers it.
@ybizeul It sounds like maybe the daemon isn't fully starting up. Are you able to successfully write data or execute queries when it's in this state?
We are hitting the same issue with an older InfluxDB version (0.8.0), and it looks the same in 0.8.8.
If I delete the data/raft folder and restart the InfluxDB daemon, it starts up with ports 8083 and 8086 open.
It looks like raft doesn't tolerate hostname changes (which happen whenever you kill a container and start a new one). I noticed the following message in @kureikain's logs, and we see similar entries in ours:
[2014/12/02 19:32:49 UTC] [INFO] (github.com/influxdb/influxdb/server.(*Server).ListenAndServe:112) Sending change connection string command (d98cd1d00f41:8099,f4fe021fe272:8099) (http://d98cd1d00f41:8090,http://f4fe021fe272:8090)
f4fe021fe272 is the current container (the earlier "Checking whether this is the local server" line shows local: f4fe021fe272:8099), while d98cd1d00f41 is the old one, which no longer exists. Could this be due to Raft's election process and the fact that raft needs a majority of its servers (a 2-server cluster cannot lose either one)? In other words, is startup hanging indefinitely, waiting for the second server to come back?
Just FYI, hostname changes are a common problem when moving applications into containers; I think Elasticsearch has similar issues unless you give the container a static hostname.
P.S. I'm not sure whether my findings are related to @ezcocos's issue; if not, this should probably be a separate issue.
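To check whether a given restart went through this kind of address change, one option is to scan the InfluxDB log for the "Sending change connection string command" line quoted above and compare the two hostnames in it. The Ruby sketch below assumes the line format is exactly as in that excerpt; the constant and method names are hypothetical, and a hit only tells you the recorded address changed, not that this is what broke startup.

```ruby
# Sketch: detect raft "change connection string" commands in an
# InfluxDB 0.8 log. Assumes the line format shown above:
#   ... Sending change connection string command (hostA:8099,hostB:8099) (...)
CHANGE_CONN_RE = /Sending change connection string command \(([^:,\s]+):\d+,([^:,\s]+):\d+\)/

# Returns [host_a, host_b] pairs where the two hostnames differ,
# i.e. the server's recorded address changed (e.g. a new container id).
def hostname_changes(log_text)
  log_text.scan(CHANGE_CONN_RE).reject { |a, b| a == b }
end

# Reads the log as raw bytes so stray non-UTF-8 bytes can't break the scan.
def hostname_changes_in_file(path)
  hostname_changes(File.binread(path))
end
```

For example, p hostname_changes_in_file("/var/log/influxdb_log.txt") (the log path from the excerpt above) would print [["d98cd1d00f41", "f4fe021fe272"]] for the log in this thread.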
@m1keil that's an interesting theory. It seems to be random though, sometimes stopping a container and restarting it works, other times it does not. Once it stops working, it's done, the only way I can recover from it is to delete the data.
Our container uses supervisord to manage the influxdb process + other processes. supervisord is PID 1 and we stop the container by using "docker stop" which sends a SIGTERM to supervisord which in turn sends a SIGTERM to the processes it manages. Docker will wait by default for 10 sec. before sending a SIGKILL if the container didn't stop as a result of SIGTERM. We tried increasing the time between SIGTERM and SIGKILL thinking that maybe influx didn't get enough time to shut down gracefully, but that still does not help.
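The stop sequence described above (SIGTERM, a grace period, then SIGKILL) can be mimicked outside Docker to see whether a process actually exits within the grace period. This is a minimal Ruby sketch, assuming the process is a child of the script (Process.waitpid only works on child processes); stop_with_grace is a hypothetical name, and the 10-second default simply mirrors the Docker default mentioned above.

```ruby
# Mimics the docker stop sequence: SIGTERM, wait up to grace_seconds,
# then SIGKILL. Returns :terminated if the child exited on its own
# within the grace period, :killed if SIGKILL was needed.
def stop_with_grace(pid, grace_seconds = 10, poll_interval = 0.1)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    # WNOHANG: returns nil instead of blocking while the child is still alive.
    return :terminated if Process.waitpid(pid, Process::WNOHANG)
    sleep poll_interval
  end
  Process.kill("KILL", pid)
  Process.waitpid(pid) # reap the child so it doesn't linger as a zombie
  :killed
end
```

If a process consistently comes back as :killed, it is not finishing its shutdown within the grace period, which is the situation the longer SIGTERM-to-SIGKILL interval was meant to rule out.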
There's definitely something "random" going on. It's very easy to reproduce this with the following Dockerfile:
FROM phusion/baseimage:0.9.15
RUN curl -O http://s3.amazonaws.com/influxdb/influxdb_0.8.8_amd64.deb && \
dpkg -i influxdb_0.8.8_amd64.deb && \
rm -rf influxdb_0.8.8_amd64.deb
RUN mkdir -p /etc/service/influxdb && \
echo '#!/bin/sh' > /etc/service/influxdb/run && \
echo 'exec /usr/bin/influxdb -config=/opt/influxdb/shared/config.toml' >> /etc/service/influxdb/run && \
chmod +x /etc/service/influxdb/run
This builds a small container image with InfluxDB and runit as the init system. Runit monitors Influx's state and restarts it automatically as soon as it goes down.
Case 1: container without a fixed hostname.
To build it: docker build -t influxdb <path to Dockerfile>
To run it: docker run -t -d --name="influx" -v <local dir>:/opt/influxdb/shared/data influxdb /sbin/my_init
This should start the container and populate <local dir> with all of Influx's data.
Now kill the container and start a new one:
$ docker rm -f influx
$ docker run -t -d --name="influx" -v /home/vagrant/influxdata:/opt/influxdb/shared/data influxdb /sbin/my_init
Run a bash process inside the container and inspect its state:
$ docker exec -it influx bash
root@901f13fc06b9:/# ps -ef | grep influx
root 106 102 0 17:38 ? 00:00:00 runsv influxdb
root 107 106 0 17:38 ? 00:00:00 /usr/bin/influxdb -config=/opt/influxdb/shared/config.toml
root 136 120 0 17:39 ? 00:00:00 grep --color=auto influx
root@901f13fc06b9:/# ss -tln
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:22 *:*
LISTEN 0 128 :::8099 :::*
LISTEN 0 128 :::8083 :::*
LISTEN 0 128 :::8086 :::*
LISTEN 0 128 :::22 :::*
LISTEN 0 128
Now send SIGTERM to the influxdb process:
root@901f13fc06b9:/# killall -15 influxdb
.... wait 30 seconds ....
root@901f13fc06b9:/# ss -tln
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:22 *:*
LISTEN 0 128 :::22 :::*
LISTEN 0 128 :::8090 :::*
Sometimes it recovers after the first SIGTERM, but an additional SIGTERM will then trigger the failure. That's the random part. The only way to recover seems to be deleting raft's folder:
root@901f13fc06b9:/# rm -rf /opt/influxdb/shared/data/raft
root@901f13fc06b9:/# killall -15 influxdb
... wait a few seconds ...
root@901f13fc06b9:/# ss -tln
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:22 *:*
LISTEN 0 128 :::8099 :::*
LISTEN 0 128 :::8083 :::*
LISTEN 0 128 :::8086 :::*
LISTEN 0 128 :::22 :::*
LISTEN 0 128 :::8090 :::*
Case 2: container with a static hostname.
Remove the previously running container and delete Influx's data files from <local dir>.
Now repeat the same steps as in Case 1, only this time launch the container instances with a static hostname:
$ docker run -t -d --name="influx" --hostname="influxdb" -v /home/vagrant/influxdata:/opt/influxdb/shared/data influxdb /sbin/my_init
You'll see that no matter how many SIGTERMs you send its way, it recovers successfully after a few seconds.
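Instead of eyeballing ss -tln after each SIGTERM, the check can be scripted. The Ruby sketch below (standard library only) keeps trying to open TCP connections to the given ports, for example the admin (8083) and HTTP API (8086) ports, until they all accept or a deadline passes. The method names are hypothetical, the host must be an address where those ports are reachable (the repro commands above don't publish them with -p), and a successful connect only shows the port is listening, not that InfluxDB is otherwise healthy.

```ruby
require "socket"

# True if something accepts TCP connections on host:port.
def port_open?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError # connection refused, timeout, DNS failure, ...
  false
end

# Polls until every port accepts connections or the deadline passes.
# Returns the ports that are still closed (empty array means all are up).
def wait_for_ports(host, ports, deadline_seconds: 30, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline_seconds
  loop do
    closed = ports.reject { |p| port_open?(host, p) }
    return closed if closed.empty? || Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

In the failing case from Case 1, wait_for_ports(container_ip, [8083, 8086]) would be expected to return both ports after the deadline, while with a static hostname it should return an empty array.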
Using fig and/or docker-compose, I was running into this same issue. I initially thought it was a data volume issue, but setting the hostname as mentioned fixed the problem.
This will be fixed in 0.9.0. We won't be making any new releases in the 0.8 line. In the meantime, set the hostname in your config file and it should work across restarts.
In case it helps someone running Docker: we have been able to recover InfluxDB from this issue by running the container with the same original hostname.
In fact, this has been the only way to recover while keeping the old data.
You can extract the original Docker hostname (the container id) from the raft/log binary log in your data dir.
Opening the file in a binary editor, you will find something like:
...
{"name":"a4ba0aaf1b324944","connectionString":"http://03fba15f761d:8090","protobufConnectionString":"03fba15f761d:8099"}
...
So the id in "connectionString" is your old Docker container id.
Recover with:
docker run --hostname=03fba15f761d ...
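If you'd rather not open raft/log in a binary editor, a short Ruby script can scan it for the connectionString entries shown above. This is a minimal sketch, assuming those entries appear in the file as plain JSON text exactly like the excerpt; the method name old_raft_hostnames is hypothetical, and the result is a hint to double-check rather than a guaranteed answer.

```ruby
# Sketch: pull the hostnames recorded in an InfluxDB 0.8 raft/log file.
# Assumption: entries contain "connectionString":"http://<host>:8090"
# as plain (uncompressed) text, as in the excerpt above.
def old_raft_hostnames(path)
  # Read raw bytes so the binary framing around the JSON can't cause
  # encoding errors during the scan.
  data = File.binread(path)
  # Capture the host part of every connectionString URL. The leading quote
  # keeps "protobufConnectionString" from matching.
  data.scan(%r{"connectionString":"http://([^:"]+):\d+"}).flatten.uniq
end
```

For the excerpt above, old_raft_hostnames("<data dir>/raft/log") would return ["03fba15f761d"], which is the value to pass to docker run --hostname. If more than one hostname comes back, the list is in order of first appearance in the file.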
@gunnaraasen to note re: Docker
@bercab Thanks, your comment helped
This issue was closed because it was planned to be fixed in 0.9.0 according to @pauldix, but the issue is still there in 0.9.4.2 and I can't find anything related in the changelog. Will this issue be reopened?
@AurelienLourot InfluxDB 0.9 is no longer receiving code updates. There will be no fixes to the 0.9.4.2 code base. I would recommend upgrading to InfluxDB 0.9.6 to see if that helps.
Also, please note that all other reports in this issue are for InfluxDB 0.8.x, so it is likely that while your symptoms appear similar it's not actually the same underlying cause. I encourage you to email the mailing list at influxdb@googlegroups.com for assistance.
Dear All, I have already raised this issue in another thread, which was closed without my getting any answer. I've found it particularly painful to work with influxdb, and I am now on the brink of switching definitively to another time series db, although I would have liked to give influxdb a fair chance.
The problem is that after many insert trials, influxdb just gets blocked, with no possible access:
1) I cannot connect to the database via Python: File "/home/ezcocos/dev/python/pyenv2_7/local/lib/python2.7/site-packages/requests/adapters.py", line 407, in send raise ConnectionError(err, request=request) requests.exceptions.ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))
2) I cannot access the admin tools either: localhost:8083 -> page not available
My only solution so far has been to reinstall influxdb from scratch, which I will not be able to do in production. Is there at least a way to repair influxdb so that I can access it again without completely reinstalling it and losing all the data?
I run influxdb on ubuntu 14.10.
Many thanks in advance for your help. ezcocos