mar10 / wsgidav

A generic and extendable WebDAV server based on WSGI
https://wsgidav.readthedocs.io
MIT License

wsgidav process not stopping #327

Open laur89 opened 1 month ago

laur89 commented 1 month ago

Describe the bug

Note this is quite possibly not an issue with wsgidav itself, but with seafdav, the Seafile project's WebDAV implementation that relies on wsgidav.

There are cases where, upon shutting down the service, wsgidav child processes still hang around, causing a subsequent restart of seafile to fail. It seems to happen only if the webdav server has actually been used prior to stopping. If the service is merely started and immediately stopped, all processes appear to shut down OK.

Looking at seafile codebase, it appears the wsgidav process is started like this:

        char *argv[] = {
            (char *)get_python_executable(),
            "-m", "wsgidav.server.server_cli",
            "--server", "gunicorn",
            "--root", "/",
            "--log-file", seafdav_log_file, 
            "--pid", ctl->pidfile[PID_SEAFDAV],
            "--port", port,
            "--host", conf.host,
            NULL
        };
        pid = spawn_process (argv, true);

...and stopped like this:

    kill_by_force(PID_SEAFDAV);

/-/

static void
kill_by_force (int which)
{
    if (which < 0 || which >= N_PID)
        return;

    char *pidfile = ctl->pidfile[which];
    int pid = read_pid_from_pidfile(pidfile);
    if (pid > 0) {
        // if SIGKILL send success, then remove related pid file
        if (kill ((pid_t)pid, SIGKILL) == 0) {
            g_unlink (pidfile);
        }
    }
}

Note they're sending SIGKILL, so I'm not quite sure why any process would remain hanging at all. I'm also unsure why SIGKILL is sent as the default signal in the first place.
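One thing that may be part of the picture: SIGKILL cannot be caught, so a process that receives it gets no chance to stop its own children, and those children are simply re-parented. Below is a minimal Python sketch of that behaviour. The `MASTER_SRC` script and the helper names are made up for illustration; this is not Seafile's or gunicorn's code.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a master process: it starts one long-running
# child, prints that child's PID, then sleeps.
MASTER_SRC = r"""
import subprocess, sys, time
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(child.pid, flush=True)
time.sleep(60)
"""


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    return True


def kill_master_and_check_child():
    """SIGKILL the master and report whether its child outlived it."""
    master = subprocess.Popen([sys.executable, "-c", MASTER_SRC],
                              stdout=subprocess.PIPE, text=True)
    child_pid = int(master.stdout.readline())
    master.send_signal(signal.SIGKILL)
    master.wait()
    master.stdout.close()
    survived = pid_alive(child_pid)
    if survived:
        os.kill(child_pid, signal.SIGKILL)  # clean up the orphan
    return survived
```

Calling `kill_master_and_check_child()` on Linux should report that the child survived, since nothing ever signalled it.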

To Reproduce

  1. Start seafile (that also spawns the wsgidav process)
  2. Use the webdav server (e.g. sync some files via a client)
  3. Stop seafile services (via the packaged shell script: $ seafile.sh stop)
  4. Note some wsgidav processes remain

Expected behavior

All processes spawned by seafile, including wsgidav ones, should be shut down.

Environment:

WsgiDAV/4.3.0 Python/3.10.12 Linux-6.1.106-Unraid-x86_64-with-glibc2.35

Additional context/longer repro example

After starting seafile, this can be seen in seafile-controller (that's spawning wsgidav process) log:

2024-10-07 01:07:01 seafile-controller.c(427): pid file /seafile/pids/seafdav.pid does not exist
2024-10-07 01:07:01 seafile-controller.c(506): seafdav need restart...
2024-10-07 01:07:01 seafile-controller.c(82): spawn_process: /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
2024-10-07 01:07:01 seafile-controller.c(116): spawned /usr/bin/python3, pid 159

These are the spawned wsgidav processes as seen from the running container (note pid 159 is tracked by seafdav as service pid):

$ ps -ef | grep wsgidav.server.server_cli
root       159    64  0 01:07 ?        00:00:01 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root       161   159  0 01:07 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root       162   159  0 01:07 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root       163   159  0 01:07 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root       164   159  0 01:07 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root       165   159  0 01:07 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0

Next, the webdav server was used by an Android client and some I/O was performed.

Stopping the seafile server is done via a shell-script. From what is relevant, it performs two steps:

  1. first sends SIGTERM to seafile-controller process...:
    pkill -SIGTERM -f "seafile-controller -c ${default_ccnet_conf_dir}"

This signal is caught by the controller's signal handler, which in turn sends SIGKILL to the wsgidav process (in this case, that'd be PID 159).

  2. ...then itself sends SIGTERM to the wsgidav processes:
    pkill -f  "wsgidav.server.server_cli"

Excerpt from the relevant part of said shell script (sorry, I cannot find the Seafile repo that contains this script):

function stop_seafile_server () {
    echo "Stopping seafile server ..."
    pkill -SIGTERM -f "seafile-controller -c ${default_ccnet_conf_dir}"    # !!! 1st step
    kill_all

    return 0
}

function kill_all () {
    pkill -f "seaf-server -c ${default_ccnet_conf_dir}"
    pkill -f "fileserver -c ${default_ccnet_conf_dir}"
    pkill -f "seafevents.main"
    pkill -f "wsgidav.server.server_cli"    # !!! 2nd step
    pkill -f "notification-server -c ${central_config_dir}"
    pkill -f "seafile-monitor.sh"
}

After this, the following 4 processes still remain hanging about:

$ ps -ef | grep wsgidav.server.server_cli
root       161     1  2 01:07 ?        00:01:01 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root       162     1  0 01:07 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root       164     1  0 01:07 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root       165     1  0 01:07 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0

I suppose my question is whether this is expected, and whether the wsgidav service shutdown is performed correctly by seafile. Trying to kill the processes via another SIGTERM (i.e. the default signal sent by pkill) does nothing, yet sending SIGKILL or SIGHUP appears to get rid of 'em:

$ pkill --signal SIGHUP -f  'wsgidav.server.server_cli'

No idea what's up with that or whether it's safe to do so. Grepped wsgidav codebase and cannot find any signal handlers whatsoever, so no idea why SIGHUP works.
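A possible explanation, offered as an assumption rather than something confirmed from the source: the handlers need not live in wsgidav at all, since gunicorn installs its own. If something in the stack catches SIGTERM while SIGHUP is left at its default action (which is to terminate), the observed pattern would follow. A toy Python sketch of that situation, with a made-up `WORKER_SRC` and no relation to the real gunicorn worker code:

```python
import signal
import subprocess
import sys

# Hypothetical worker: swallows SIGTERM, leaves SIGHUP at its default action.
WORKER_SRC = r"""
import signal, time
signal.signal(signal.SIGTERM, lambda signum, frame: None)  # assumed handler
signal.signal(signal.SIGHUP, signal.SIG_DFL)               # default: terminate
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def term_then_hup():
    """Return (survived_sigterm, exit_code_after_sighup) for the toy worker."""
    proc = subprocess.Popen([sys.executable, "-c", WORKER_SRC],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the handlers are installed
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=1)
        survived_term = False
    except subprocess.TimeoutExpired:
        survived_term = True
    proc.send_signal(signal.SIGHUP)
    code = proc.wait(timeout=5)
    proc.stdout.close()
    return survived_term, code
```

In this sketch the worker outlives SIGTERM and is then terminated by SIGHUP, which `subprocess` reports as a negative return code equal to the signal number.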

My initial guess was that the SIGKILL sent by the controller is targeted at the parent process, so it doesn't have a chance to gracefully shut down the child processes. But that was just speculation, and it doesn't seem to be the whole story: sending SIGTERM to just the parent process only causes one of the child (!) processes to be nuked:

# prior to kill:
$ ps -ef | grep wsgidav.server.server_cli
root      5476  5400  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5478  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5479  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5480  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5481  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5482  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0

$ kill 5476

$ ps -ef | grep wsgidav.server.server_cli
root      5476  5400  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5478  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5479  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5480  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      5482  5476  0 11:58 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0

Note PID 5476 (the parent process launched by controller) is still running, only 5481 got killed.
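For comparison, one way a supervisor can reach the parent and all of its children at once is to start the service in its own session and signal the whole process group. This is only a sketch of that general technique, not what Seafile does; `MASTER_SRC` and `stop_whole_group` are hypothetical names, and the workers here are plain sleeping processes with default signal handling.

```python
import os
import select
import signal
import subprocess
import sys

# Hypothetical master that starts two sleeping workers and then waits.
MASTER_SRC = r"""
import subprocess, sys, time
workers = [subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
           for _ in range(2)]
print("started", flush=True)
time.sleep(60)
"""


def stop_whole_group(timeout=5.0):
    """SIGTERM the master's process group; True if every member exited in time."""
    master = subprocess.Popen([sys.executable, "-c", MASTER_SRC],
                              stdout=subprocess.PIPE,
                              start_new_session=True)  # own session and group
    master.stdout.readline()  # wait until the workers have been started
    os.killpg(os.getpgid(master.pid), signal.SIGTERM)
    # The master and its workers all inherited the write end of this pipe,
    # so end-of-file here means every one of them has exited.
    ready, _, _ = select.select([master.stdout], [], [], timeout)
    all_gone = bool(ready) and master.stdout.read() == b""
    master.wait()
    master.stdout.close()
    return all_gone
```

Whether this would help in the Seafile case depends on how the workers react to the signal that is sent, so it addresses only the "signal reaches just the parent" half of the problem.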

mar10 commented 1 month ago

I don't know much about Seafile, but I tried this:

Install WsgiDAV

cd test_wsgidav
pipenv install wsgidav gunicorn
pipenv shell

Create a wsgidav.yaml file with the following content:

server: gunicorn
server_args:
  workers: 5

host: 0.0.0.0
port: 8080
provider_mapping:
  "/": "."

Run WsgiDAV

(test_wsgidav) ➜  test_wsgidav wsgidav --auth anonymous
Using default configuration file: /Users/martin/prj/git/test_wsgidav/wsgidav.yaml
...
21:30:35.543 - INFO    : Running WsgiDAV/4.3.3 gunicorn/23.0.0 Python/3.12.0 ...
[2024-10-07 21:30:35 +0200] [70339] [INFO] Starting gunicorn 23.0.0
[2024-10-07 21:30:35 +0200] [70339] [INFO] Listening at: http://0.0.0.0:8080 (70339)
[2024-10-07 21:30:35 +0200] [70339] [INFO] Using worker: gthread
[2024-10-07 21:30:35 +0200] [70342] [INFO] Booting worker with pid: 70342
[2024-10-07 21:30:35 +0200] [70343] [INFO] Booting worker with pid: 70343
[2024-10-07 21:30:35 +0200] [70344] [INFO] Booting worker with pid: 70344
[2024-10-07 21:30:35 +0200] [70345] [INFO] Booting worker with pid: 70345
[2024-10-07 21:30:35 +0200] [70346] [INFO] Booting worker with pid: 70346

We can see that gunicorn starts five other processes, as configured.

Then open a second terminal and find the processes (the root process is 70339; the others are the workers it spawned):

➜  test_wsgidav ps -ef | grep wsgidav
  501 70339 70173   0  9:30pm ttys003    0:00.21 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
  501 70342 70339   0  9:30pm ttys003    0:00.12 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
  501 70343 70339   0  9:30pm ttys003    0:00.11 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
  501 70344 70339   0  9:30pm ttys003    0:00.12 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
  501 70345 70339   0  9:30pm ttys003    0:00.12 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous
  501 70346 70339   0  9:30pm ttys003    0:00.13 /Library/Frameworks/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python /Users/martin/prj/git/test_wsgidav/.venv/bin/wsgidav --auth anonymous

Now stop the root process with SIGINT:

kill -s INT 70339

In the main terminal we see that the spawned processes are also stopped:

...
[2024-10-07 21:36:28 +0200] [70339] [INFO] Handling signal: int
[2024-10-07 21:36:28 +0200] [70342] [INFO] Worker exiting (pid: 70342)
[2024-10-07 21:36:28 +0200] [70343] [INFO] Worker exiting (pid: 70343)
[2024-10-07 21:36:28 +0200] [70344] [INFO] Worker exiting (pid: 70344)
[2024-10-07 21:36:28 +0200] [70345] [INFO] Worker exiting (pid: 70345)
[2024-10-07 21:36:28 +0200] [70346] [INFO] Worker exiting (pid: 70346)
[2024-10-07 21:36:28 +0200] [70339] [INFO] Shutting down: Master
➜  test_wsgidav 

So it looks like it is working as expected?

laur89 commented 1 month ago

Thanks for getting back so quick.

Looks like SIGINT works even with those hanging gunicorn processes. Note that in the original post I described how SIGTERM does nothing, but replacing it with SIGINT does the trick:

root@1d53611e14f4:/seafile# ps -ef | grep -v grep | grep wsgidav
root      1404     1  0 16:53 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      1405     1  0 16:53 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      1406     1  0 16:53 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      1407     1  0 16:53 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root      1408     1  0 16:53 ?        00:00:00 /usr/bin/python3 -m wsgidav.server.server_cli --server gunicorn --root / --log-file /seafile/logs/seafdav.log --pid /seafile/pids/seafdav.pid --port 8080 --host 0.0.0.0
root@1d53611e14f4:/seafile# pkill --signal SIGINT -f  'wsgidav.server.server_cli'
root@1d53611e14f4:/seafile# echo $?
0
root@1d53611e14f4:/seafile# ps -ef | grep -v grep | grep wsgidav

Is it possibly due to gunicorn itself handling INT, but not TERM signals?


At any rate, I think I'll propose that the Seafile team:

  1. stop SIGKILLing processes as the first step;
  2. consider SIGINT-ing webdav as opposed to TERM-ing

Although INT is a bit of a weird signal to send in this case, as afaik it's supposed to be a keyboard/user interrupt, i.e. it implies interactivity, not one system interrupting another.
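The first proposal boils down to "ask politely, wait, and only then force". A minimal Python sketch of that pattern follows; the function name, the `first_signal` parameter and the grace period are assumptions for illustration, and Seafile's controller is written in C, so this shows only the idea.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc, first_signal=signal.SIGTERM, grace_seconds=5.0):
    """Send first_signal, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the process's return code (negative signal number if it was
    terminated by a signal).
    """
    proc.send_signal(first_signal)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGKILL)  # last resort, cannot be caught
        return proc.wait()


def demo():
    """Stop a sleeping child that has default signal handling."""
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    return stop_gracefully(proc, grace_seconds=2.0)
```

Passing `first_signal=signal.SIGINT` would correspond to the second proposal; which signal is appropriate depends on how the target handles it.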

laur89 commented 1 month ago

Worth noting that, following your example using version 4.3.3, I'm unable to reproduce the conditions where some child processes hang around. Killing via either TERM or KILL signals always results in all processes being reaped. Unsure what's going on under Seafile.