open-io / oio-sds

High Performance Software-Defined Object Storage for Big Data and AI, that supports Amazon S3 and Openstack Swift
https://www.openio.io

python: Don't depend on /dev/log #834

Open racciari opened 7 years ago

racciari commented 7 years ago

When the /dev/log socket is not available, some services (at least conscienceagent, ecd and rawx) won't start, and none of them clearly reports the problem. In some cases you just end up with "Connection refused", for example:

root@vcfr1qx01u:~# openio cluster list --oio-ns VCFR1
[Errno 111] Connection refused
root@vcfr1qx01u:~# oio-cluster VCFR1

NAMESPACE INFORMATION

Name : VCFR1
Chunk size : 10485760 bytes
Option : automatic_open = true
Option : events-max-pending = 1000
Option : lb.rawx = WRAND
Option : lb.rdir = WRAND?shorten_ratio=1.0&standard_deviation=no
Option : meta1.events-max-pending = 1000
Option : meta2.events-max-pending = 1000
Option : meta2_check.put.DISTANCE = false
Option : meta2_check.put.GAPS = false
Option : meta2_check.put.SRVINFO = false
Option : meta2_check.put.STGCLASS = false
Option : meta2_max_versions = 1
Option : ns_status = STANDALONE
Option : service_update_policy = meta2=KEEP|3|1|;rdir=KEEP|1|1|user_is_a_service=rawx;
Option : storage_policy = THREECOPIES
Option : WORM = false
Storage Policy : ECD6_3 = NONE:ECD6_3
Storage Policy : ERASURECODE = NONE:ERASURECODE
Storage Policy : SINGLE = NONE:NONE
Storage Policy : THREECOPIES = NONE:DUPONETHREE
Storage Policy : TWOCOPIES = NONE:DUPONETWO
Data Security : DUPONETHREE = plain/distance=1,nb_copy=3
Data Security : DUPONETWO = plain/distance=1,nb_copy=2
Data Security : ECD6_3 = ec/k=6,m=3,algo=isa_l_rs_vand,distance=1
Data Security : ERASURECODE = ec/k=6,m=3,algo=liberasurecode_rs_vand,distance=1
LB(srv) : meta2=KEEP|3|1;rdir=KEEP|1|1|user_is_a_service=rawx
: rdir -> KEEP|1|1
: account -> KEEP|1|1
: rawx -> KEEP|1|1
: redis -> KEEP|1|1
: meta2 -> KEEP|3|1
: meta1 -> KEEP|1|1
: meta0 -> KEEP|1|1
: sqlx -> KEEP|1|1
: oiofs -> KEEP|1|1

You have to strace the processes to find the cause.
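
To illustrate the behavior requested in the title, here is a minimal sketch of a logger setup that does not depend on /dev/log. It is not taken from the oio-sds code base: the names `syslog_available` and `get_logger` are hypothetical, and it assumes the services configure logging through Python's standard `logging` module. It probes the socket first, then falls back to stderr and says so explicitly, so that the failure is visible without strace.

```python
import logging
import logging.handlers
import socket
import sys


def syslog_available(address="/dev/log"):
    """Return True if something accepts connections on the syslog socket."""
    # Syslog daemons may expose a datagram or a stream socket, so try both.
    for socktype in (socket.SOCK_DGRAM, socket.SOCK_STREAM):
        sock = socket.socket(socket.AF_UNIX, socktype)
        try:
            sock.connect(address)
            return True
        except OSError:
            # Missing socket, refused connection, or wrong socket type.
            continue
        finally:
            sock.close()
    return False


def get_logger(name, address="/dev/log"):
    """Build a logger that prefers syslog but falls back to stderr.

    Hypothetical helper, for illustration only.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if logger.handlers:
        # Already configured: do not add a second handler.
        return logger
    if syslog_available(address):
        handler = logging.handlers.SysLogHandler(address=address)
    else:
        handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    if not isinstance(handler, logging.handlers.SysLogHandler):
        # Report the fallback instead of failing silently.
        logger.warning(
            "syslog socket %s is unavailable, logging to stderr", address)
    return logger
```

The explicit probe is a design choice: depending on the Python version, `SysLogHandler` either raises when the socket is missing or silently ignores the error, so relying on the constructor alone would not give consistent behavior.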

jfsmig commented 7 years ago

In which circumstances did you produce this behavior (distro, rsyslog/syslog-ng/journald, etc.)? I found no way to reproduce it: a kill -STOP on journald is detected and undone by systemd, and stopping it manually doesn't help either (it is restarted immediately).