haiwen / seafile

High performance file syncing and sharing, with also Markdown WYSIWYG editing, Wiki, file label and other knowledge management features.
http://seafile.com/
Other
12.25k stars 1.54k forks source link

seafile-controller wants to start seaf-server in a loop and flooding `seafile.log` log file #1507

Closed rabb1t closed 8 years ago

rabb1t commented 8 years ago

Problem:

seafile-controller tries to start another copy of seaf-server in a loop flooding seafile.log log file

Env:

OS: CentOS Linux release 7.1.1503 (Core) Kernel: 3.10.0-327.3.1.el7.x86_64 Seafile: 4.4.7 pro edition

Description:

Hi, I have faced with the problem when seafile-server component unable to start within 10 seconds (hard coded CHECK_PROCESS_INTERVAL variable) which is caused by a slow network connectivity to the remote database server. This leads to improper execution of the second copy of seafile-server process which removes pid file located in pids/seaf-server.pid and tries to start in an infinite loop flooding logs/seafile.log log file.

Average database latency for single WRITE query is about 0.3-0.6 seconds.

I see that when seafile-server starts it tries to create several database tables:

# grep CREATE /tmp/strace/controller.44199
13:03:56 write(16, "\207\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 139)                                                           = 139
13:03:57 write(16, "L\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 80)                                                               = 80
13:03:57 write(16, "y\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 125)                                                              = 125
13:03:58 write(16, "\311\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 205)                                                           = 205
13:03:58 write(16, "h\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 108)                                                              = 108
13:03:58 write(16, "\263\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 183)                                                           = 183
13:03:59 write(16, "\334\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 224)                                                           = 224
13:03:59 write(16, "\216\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 146)                                                           = 146
13:03:59 write(16, "\232\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 158)                                                           = 158
13:04:00 write(16, "\250\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 172)                                                           = 172
13:04:00 write(16, "i\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 109)                                                              = 109
13:04:01 write(16, "v\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 122)                                                              = 122
13:04:01 write(16, "f\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 106)                                                              = 106
13:04:01 write(16, "h\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 108)                                                              = 108
13:04:02 write(16, "g\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 107)                                                              = 107
13:04:03 write(16, "\235\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 161)                                                           = 161
13:04:03 write(16, "U\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 89)                                                               = 89
13:04:03 write(16, "]\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 97)                                                               = 97
13:04:04 write(16, "\217\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 147)                                                           = 147
13:04:04 write(16, "\250\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 172)                                                           = 172
13:04:05 write(16, "\250\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 172)                                                           = 172
13:04:05 write(16, "m\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 113)                                                              = 113
13:04:05 write(16, "\342\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 230)                                                           = 230
13:04:06 write(16, "p\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 116)                                                              = 116
13:04:06 write(16, "\216\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 146)                                                           = 146
13:04:07 write(16, "`\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 100)                                                              = 100
13:04:07 write(16, "e\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 105)                                                              = 105
13:04:07 write(16, "\\\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 96)                                                              = 96
13:04:08 write(16, "\202\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 134)                                                           = 134
13:04:09 write(16, "\263\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 183)                                                           = 183
13:04:10 write(16, "i\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 109)                                                              = 109
13:04:10 write(16, "\361\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 245)                                                           = 245
13:04:10 write(16, "\207\0\0\0\3CREATE TABLE IF NOT EXISTS "..., 139) = 139

# grep -c CREATE controller.44199
33

So as you can see the start up time could take up to 15 seconds (from 13:03:56 to 13:04:10). But after 10 seconds (CHECK_PROCESS_INTERVAL) seafile-controller checks pids/seaf-server.pid pidfile which still doesn't exist and starts another seafile-server process which fails to bind to socket because it's already in use and removes valid pidfile of the first process which have already appeared by this time.

Here are log files cuts:

seafile.log

http-server.c(2364): Could not bind socket: Address already in use

controller.log

          [01/13/16 11:57:13] seafile-controller.c(161): starting ccnet-server ...
          [01/13/16 11:57:13] seafile-controller.c(80): spawn_process: ccnet-server -c /srv/seafile/ccnet -f /srv/seafile/logs/ccnet.log -d -M 10000000 -P /srv/
          seafile/pids/ccnet.pid         
          [01/13/16 11:57:13] seafile-controller.c(95): spawned ccnet-server, pid 125886
          [01/13/16 11:57:17] seafile-controller.c(749): ccnet daemon connected.
          [01/13/16 11:57:17] seafile-controller.c(197): starting seaf-server ...
          [01/13/16 11:57:17] seafile-controller.c(80): spawn_process: seaf-server -c /srv/seafile/ccnet -d /srv/seafile/seafile-data -l /srv/seafile/logs/seafile.log -P /srv/seafile/pids/seaf-server.pid
          [01/13/16 11:57:17] seafile-controller.c(95): spawned seaf-server, pid 125906
          [01/13/16 11:57:17] seafile-controller.c(570): pid file /srv/seafile/pids/seafevents.pid does not exist
          [01/13/16 11:57:17] seafile-controller.c(80): spawn_process: /usr/bin/python2.7 -m seafevents.main --config-file /srv/seafile/pro-data/seafevents.conf --loglevel debug --logfile /srv/seafile/logs/seafevents.log -P /srv/seafile/pids/seafevents.pid
          [01/13/16 11:57:17] seafile-controller.c(95): spawned /usr/bin/python2.7, pid 125907
          [01/13/16 11:57:17] seafile-controller.c(717): seafdav is not enabled
          [01/13/16 11:57:17] seafile-controller.c(478): search is not enabled
      |-->[01/13/16 11:57:27] seafile-controller.c(570): pid file /srv/seafile/pids/seaf-server.pid does not exist
      |   [01/13/16 11:57:27] seafile-controller.c(590): seaf-server need restart...
      |   [01/13/16 11:57:27] seafile-controller.c(197): starting seaf-server ...
      |   [01/13/16 11:57:27] seafile-controller.c(80): spawn_process: seaf-server -c /srv/seafile/ccnet -d /srv/seafile/seafile-data -l /srv/seafile/logs/seafile.log -P /srv/seafile/pids/seaf-server.pid
      |   [01/13/16 11:57:27] seafile-controller.c(95): spawned seaf-server, pid 125983
      |   [01/13/16 11:57:47] seafile-controller.c(570): pid file /srv/seafile/pids/seaf-server.pid does not exist
      |   [01/13/16 11:57:47] seafile-controller.c(590): seaf-server need restart...
      |   [01/13/16 11:57:47] seafile-controller.c(197): starting seaf-server ...
loop -|   [01/13/16 11:57:47] seafile-controller.c(80): spawn_process: seaf-server -c /srv/seafile/ccnet -d /srv/seafile/seafile-data -l /srv/seafile/logs/seafile.log -P /srv/seafile/pids/seaf-server.pid
      |   [01/13/16 11:57:47] seafile-controller.c(95): spawned seaf-server, pid 126150
      |   [01/13/16 11:57:57] seafile-controller.c(570): pid file /srv/seafile/pids/seaf-server.pid does not exist
      |   [01/13/16 11:57:57] seafile-controller.c(590): seaf-server need restart...
      |   [01/13/16 11:57:57] seafile-controller.c(197): starting seaf-server ...
      |   [01/13/16 11:57:57] seafile-controller.c(80): spawn_process: seaf-server -c /srv/seafile/ccnet -d /srv/seafile/seafile-data -l /srv/seafile/logs/seafile.log -P /srv/seafile/pids/seaf-server.pid
      |   [01/13/16 11:57:57] seafile-controller.c(95): spawned seaf-server, pid 126455
      |   [01/13/16 11:58:07] seafile-controller.c(570): pid file /srv/seafile/pids/seaf-server.pid does not exist
      |   [01/13/16 11:58:07] seafile-controller.c(590): seaf-server need restart...
      |   [01/13/16 11:58:07] seafile-controller.c(197): starting seaf-server ...
      |   [01/13/16 11:58:07] seafile-controller.c(80): spawn_process: seaf-server -c /srv/seafile/ccnet -d /srv/seafile/seafile-data -l /srv/seafile/logs/seafile.log -P /srv/seafile/pids/seaf-server.pid
      |-->[01/13/16 11:58:07] seafile-controller.c(95): spawned seaf-server, pid 126652

seafile.log

         [01/13/2016 11:57:28 AM] ../common/mq-mgr.c(60): [mq client] mq cilent is started
         [01/13/2016 11:57:28 AM] ../common/mq-mgr.c(106): [mq mgr] publish to hearbeat mq: seaf_server.heartbeat
      |->[01/13/2016 11:57:39 AM] ../common/mq-mgr.c(60): [mq client] mq cilent is started
      |  [01/13/2016 11:57:39 AM] ../common/mq-mgr.c(106): [mq mgr] publish to hearbeat mq: seaf_server.heartbeat
loop -|  [01/13/2016 11:57:39 AM] http-server.c(2364): Could not bind socket: Address already in use
      |  [01/13/2016 11:57:59 AM] ../common/mq-mgr.c(60): [mq client] mq cilent is started
      |  [01/13/2016 11:57:59 AM] ../common/mq-mgr.c(106): [mq mgr] publish to hearbeat mq: seaf_server.heartbeat
      |->[01/13/2016 11:58:00 AM] http-server.c(2364): Could not bind socket: Address already in use

audit.log (kernel auditing)

[Initial start of seafile(seafile-controller) process]

type=CONFIG_CHANGE msg=audit(1452604765.604:3333): auid=10003 ses=6580 op="updated_rules" path="/srv/seafile/pids/seaf-server.pid" key=(null) list=4 res=1
type=SYSCALL msg=audit(1452604765.604:3334): arch=c000003e syscall=2 success=yes exit=23 a0=7ffddbf551bb a1=241 a2=1b6 a3=7ffddbf54230 items=2 ppid=1 pid=40620 auid=10003 uid=10006 gid=1007 euid=10006 suid=10006 fsuid=10006 egid=1007 sgid=1007 fsgid=1007 tty=(none) ses=6580 comm="seaf-server" exe="/srv/seafile/seafile-pro-server-4.4.7/seafile/bin/seaf-server" key=(null)
type=CWD msg=audit(1452604765.604:3334):  cwd="/srv/seafile/seafile-pro-server-4.4.7"
type=PATH msg=audit(1452604765.604:3334): item=0 name="/srv/seafile/pids/" inode=1355429 dev=fd:01 mode=040755 ouid=10006 ogid=1007 rdev=00:00 objtype=PARENT
type=PATH msg=audit(1452604765.604:3334): item=1 name="/srv/seafile/pids/seaf-server.pid" inode=1119190 dev=fd:01 mode=0100664 ouid=10006 ogid=1007 rdev=00:00 objtype=CREATE

[10 seconds later seafile-controller can't find pidfile of seaf-server process and starts its second instance]

type=SYSCALL msg=audit(1452604775.455:3335): arch=c000003e syscall=2 success=yes exit=23 a0=7ffe8b17f1bb a1=241 a2=1b6 a3=7ffe8b17ea60 items=2 ppid=1 pid=40683 auid=10003 uid=10006 gid=1007 euid=10006 suid=10006 fsuid=10006 egid=1007 sgid=1007 fsgid=1007 tty=(none) ses=6580 comm="seaf-server" exe="/srv/seafile/seafile-pro-server-4.4.7/seafile/bin/seaf-server" key=(null)
type=CWD msg=audit(1452604775.455:3335):  cwd="/srv/seafile/seafile-pro-server-4.4.7"
type=PATH msg=audit(1452604775.455:3335): item=0 name="/srv/seafile/pids/" inode=1355429 dev=fd:01 mode=040755 ouid=10006 ogid=1007 rdev=00:00 objtype=PARENT
type=PATH msg=audit(1452604775.455:3335): item=1 name="/srv/seafile/pids/seaf-server.pid" inode=1119190 dev=fd:01 mode=0100664 ouid=10006 ogid=1007 rdev=00:00 objtype=NORMAL
type=CONFIG_CHANGE msg=audit(1452604775.455:3336): auid=10003 ses=6580 op="updated_rules" path="/srv/seafile/pids/seaf-server.pid" key=(null) list=4 res=1
type=SYSCALL msg=audit(1452604775.455:3337): arch=c000003e syscall=87 success=yes exit=0 a0=7ffe8b17f1bb a1=ffffffff a2=380 a3=7fc5ccd29ae0 items=2 ppid=1 pid=40831 auid=10003 uid=10006 gid=1007 euid=10006 suid=10006 fsuid=10006 egid=1007 sgid=1007 fsgid=1007 tty=(none) ses=6580 comm="seaf-server" exe="/srv/seafile/seafile-pro-server-4.4.7/seafile/bin/seaf-server" key=(null)
type=CWD msg=audit(1452604775.455:3337):  cwd="/srv/seafile/seafile-pro-server-4.4.7"
type=PATH msg=audit(1452604775.455:3337): item=0 name="/srv/seafile/pids/" inode=1355429 dev=fd:01 mode=040755 ouid=10006 ogid=1007 rdev=00:00 objtype=PARENT
type=PATH msg=audit(1452604775.455:3337): item=1 name="/srv/seafile/pids/seaf-server.pid" inode=1119190 dev=fd:01 mode=0100664 ouid=10006 ogid=1007 rdev=00:00 objtype=DELETE

[endless loop later]

type=CONFIG_CHANGE msg=audit(1452604794.828:3338): auid=10003 ses=6580 op="updated_rules" path="/srv/seafile/pids/seaf-server.pid" key=(null) list=4 res=1
type=SYSCALL msg=audit(1452604794.828:3339): arch=c000003e syscall=2 success=yes exit=23 a0=7ffccb8b81bb a1=241 a2=1b6 a3=7ffccb8b6130 items=2 ppid=1 pid=40849 auid=10003 uid=10006 gid=1007 euid=10006 suid=10006 fsuid=10006 egid=1007 sgid=1007 fsgid=1007 tty=(none) ses=6580 comm="seaf-server" exe="/srv/seafile/seafile-pro-server-4.4.7/seafile/bin/seaf-server" key=(null)
type=CWD msg=audit(1452604794.828:3339):  cwd="/srv/seafile/seafile-pro-server-4.4.7"
type=PATH msg=audit(1452604794.828:3339): item=0 name="/srv/seafile/pids/" inode=1355429 dev=fd:01 mode=040755 ouid=10006 ogid=1007 rdev=00:00 objtype=PARENT
type=PATH msg=audit(1452604794.828:3339): item=1 name="/srv/seafile/pids/seaf-server.pid" inode=1119190 dev=fd:01 mode=0100664 ouid=10006 ogid=1007 rdev=00:00 objtype=CREATE
type=CONFIG_CHANGE msg=audit(1452604794.828:3340): auid=10003 ses=6580 op="updated_rules" path="/srv/seafile/pids/seaf-server.pid" key=(null) list=4 res=1
type=SYSCALL msg=audit(1452604794.828:3341): arch=c000003e syscall=87 success=yes exit=0 a0=7ffccb8b81bb a1=ffffffff a2=380 a3=7fe001647ae0 items=2 ppid=1 pid=40967 auid=10003 uid=10006 gid=1007 euid=10006 suid=10006 fsuid=10006 egid=1007 sgid=1007 fsgid=1007 tty=(none) ses=6580 comm="seaf-server" exe="/srv/seafile/seafile-pro-server-4.4.7/seafile/bin/seaf-server" key=(null)
type=CWD msg=audit(1452604794.828:3341):  cwd="/srv/seafile/seafile-pro-server-4.4.7"
type=PATH msg=audit(1452604794.828:3341): item=0 name="/srv/seafile/pids/" inode=1355429 dev=fd:01 mode=040755 ouid=10006 ogid=1007 rdev=00:00 objtype=PARENT
type=PATH msg=audit(1452604794.828:3341): item=1 name="/srv/seafile/pids/seaf-server.pid" inode=1119190 dev=fd:01 mode=0100664 ouid=10006 ogid=1007 rdev=00:00 objtype=DELETE
freeplant commented 8 years ago

It is fixed in https://github.com/haiwen/seafile/commit/552ec3f5363c9f266316e029c242dd7504e66494. The fix will be included in the next release.