mesosphere-backup / etcd-mesos

self-healing etcd on mesos!
Apache License 2.0

determine healthy disk size #30

Closed spacejam closed 9 years ago

spacejam commented 9 years ago

during load testing, some etcd nodes crashed during log compaction with the following error:

2015/08/17 17:24:41 etcdserver: raft save state and entries error: write etcd_data/member/wal/000000000000001f-00000000007582d2.wal: no space left on device

As described in https://github.com/coreos/etcd/issues/3300, etcd can use double the required disk space to complete a compaction. We need to determine a disk size that lets most workloads run safely in the mesos sandbox.

jdef commented 9 years ago

i've seen my mesos cluster crash in fun ways too (and, not surprisingly, docker). the joys of storage.. it's a problem for everyone.

i'm not sure what the state of disk reservation is in core, but mesos-go just upgraded to the 0.23 protos (warning: has probably not seen very much testing).

spacejam commented 9 years ago

this is now fully configurable as of https://github.com/mesosphere/etcd-mesos/pull/33