Closed grandamp closed 7 years ago
Here are the logs. The WARNING and INFO logs are truncated to the last 10k lines of each file.
I wanted to follow up and close this issue, as we have been stable for a few months. We updated our ct-server instances using #1412 and merged #1417. The following is the updated startup script for our ct-server instances:
#!/bin/bash
CTLOGHOST="$(hostname -f)"
/bin/echo "Server hostname is ${CTLOGHOST}"
ETCD_SERVERS="etcd1.internal:4001,etcd2.internal:4001,etcd3.internal:4001"
/bin/echo "ETCD Cluster is ${ETCD_SERVERS}"
/bin/echo "Deleting prior log data and logs"
/bin/rm /opt/ct-log/logs/*
/bin/rm /opt/ct-log/data/log.ldb/*
cd /usr/ctlog/opts
ulimit -c unlimited
/usr/ctlog/server/ct-server \
--port=80 \
--server="${CTLOGHOST}" \
--key=ct-server-key.pem \
--trusted_cert_file=ca-roots.pem \
--log_dir=/opt/ct-log/logs \
--tree_signing_frequency_seconds=30 \
--guard_window_seconds=10 \
--leveldb_db=/opt/ct-log/data/log.ldb \
--etcd_servers="${ETCD_SERVERS}" \
--etcd_delete_concurrency=100 \
--num_http_server_threads=16 \
--etcd_connection_timeout_seconds=30 \
--node_state_ttl_seconds=900 \
--master_keepalive_interval_seconds=240 \
--monitoring=prometheus \
--v=0 &
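Since the original failure looked etcd-related, a pre-flight check before launching ct-server might be useful. The following is only a rough sketch, not something we run in production: the helper names `etcd_health_urls` and `check_etcd_cluster` are made up for illustration, and it assumes each etcd member answers plain HTTP on its client port with a `/health` endpoint, which may not hold for every etcd version or for TLS-only clusters.

```shell
#!/bin/bash
# Hypothetical helper: turn "host:port,host:port" into one /health URL per line.
# Assumes plain HTTP and a /health endpoint on the etcd client port.
etcd_health_urls() {
  printf '%s\n' "$1" | tr ',' '\n' | while read -r endpoint; do
    printf 'http://%s/health\n' "${endpoint}"
  done
}

# Hypothetical helper: probe every endpoint, return non-zero if any probe fails.
check_etcd_cluster() {
  rc=0
  for url in $(etcd_health_urls "$1"); do
    # --fail makes curl exit non-zero on HTTP errors; --max-time bounds the wait.
    if ! curl --silent --fail --max-time 5 "${url}" > /dev/null; then
      echo "etcd endpoint not healthy: ${url}" >&2
      rc=1
    fi
  done
  return "${rc}"
}
```

In the startup script this could be called as `check_etcd_cluster "${ETCD_SERVERS}" || exit 1` just before the ct-server line, so the server is not started against an unreachable cluster.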
Hello,
As in issue #811, we are seeing ct-server fail on a random instance about once a week.
The following entry was in the ct-server.FATAL log:
However, the etcd cluster appears to be completely healthy:
Below is some more information about the versions in use:
etcd & ct-server instance OS & version:
Ubuntu 16.04 LTS (AWS)
etcd version info:
ct-server version info:
As with issue #811, is the best approach to extend the etcd timeout(s)?