mighty-gerbils / gerbil

Gerbil Scheme
https://cons.io
GNU Lesser General Public License v2.1

gxensemble: fails to start when registry socket file exists #1106

Open chiefnoah opened 8 months ago

chiefnoah commented 8 months ago

When running the ensemble registry with the default Unix socket, stopping the process and then restarting it fails with the following:

^^>>> gxensemble registry                                                                                13:57:33 

2024-01-21T19:57:36Z INFO ensemble starting registry ...
^C------------- REPL is now in #<thread #46 ticker> -------------
*** INTERRUPTED IN std/actor-v18/proto#ticker__%
^^>>> gxensemble registry                                                                           (70) 13:57:37 
*** ERROR IN std/actor-v18/server#start-actor-server!__% -- 
*** ERROR IN "os/socket.ss"@196.6-196.17 [OSError]: Unknown error -98
--- irritants: socket-bind #<input-port #46 (socket 3)> #<sockaddr* #47 0x5830a6ba3b20> 
--- continuation backtrace:
[0] raise
[1] std/io/socket/socket#listen  (std/os/socket#socket-bind _sock333968_ _sockaddr333966_)
[2] std/io/socket/api#unix-listen__%  (std/io/socket/socket#listen _path349834_ _backlog349836_ _sockopts349838_)
[3] std/actor-v18/server#actor-server-listen!  (with-catch values __tmp43054)
[4] std/actor-v18/server#start-actor-server!__%  (std/actor-v18/server#actor-server-listen! _addrs654541_ _tls-context654533_)
[5] std/actor-v18/api#call-with-ensemble-server__%  (std/actor-v18/server#start-actor-server!__% '#f _server-id656029_ _tls-conte...
^^>>>

This is likely because socket-bind expects the file to not exist and fails at the OS level when it does.
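For reference, the failure can be reproduced outside Gerbil. The sketch below is Python rather than Gerbil, and the helper names `bind_unix` and `rebind_error` are made up for illustration. It binds a Unix socket, closes it without removing the file, and binds the same path again. On Linux the second bind fails with EADDRINUSE, which is errno 98, so the "Unknown error -98" in the log above is presumably that errno, negated.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path):
    """Bind and listen on a Unix stream socket at `path` (hypothetical helper)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def rebind_error(path):
    """Bind, close without unlinking, then bind again.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = bind_unix(path)
    first.close()  # closing the socket does not remove the socket file
    try:
        bind_unix(path).close()
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    err = rebind_error(os.path.join(tmp, "registry.sock"))
    print(errno.errorcode.get(err, err))
```

The point of the sketch is only that the leftover file, not a live listener, is enough to make the second bind fail.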

There are a couple of different ways we can fix this:

If we go with the second option, we will need a second way to ensure we don't have multiple registry processes listening on the same socket.
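On the concern about multiple registry processes sharing one socket path: one common way to tell a stale socket file from a live listener is to try connecting to it before removing it. The following is a minimal sketch in Python, not Gerbil, and `socket_is_live` and `listen_replacing_stale` are hypothetical names. It assumes that a refused connection means no process is listening on the path.

```python
import os
import socket
import tempfile


def socket_is_live(path):
    """Return True if some process accepts connections at `path`."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
        return True
    except (ConnectionRefusedError, FileNotFoundError):
        # Refused: the file exists but nobody is listening (stale).
        # Not found: there is no socket file at all.
        return False
    finally:
        probe.close()


def listen_replacing_stale(path):
    """Listen on `path`, removing a stale socket file but refusing a live one."""
    if socket_is_live(path):
        raise RuntimeError(f"another process is already listening on {path}")
    try:
        os.unlink(path)  # remove the stale file, if any
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(8)
    return sock


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "registry.sock")
    first = listen_replacing_stale(path)
    first.close()                          # leaves a stale socket file behind
    second = listen_replacing_stale(path)  # stale file is detected and replaced
    second.close()
```

There is still a small window between the probe and the unlink in which two processes starting at the same moment could both decide the file is stale, so a lock file may be needed if that matters.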

chiefnoah commented 8 months ago

I think it would be best to clean up the socket on exit. @vyzo is there a good way to go about handling that?

chiefnoah commented 8 months ago

On further investigation, it seems the issue is in signal handling. What's the general philosophy for handling signals in Gerbil?

vyzo commented 8 months ago

we can install a signal handler, but i think the right way is to introduce some sort of exit handler -- some form of at-exit.
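To illustrate the at-exit idea, here is how the same pattern looks in Python, whose standard library has an `atexit` module. This is only an analogy and says nothing about how a Gerbil at-exit facility would be designed. The names `remove_socket_file` and `install_cleanup` are hypothetical.

```python
import atexit
import os
import signal
import sys


def remove_socket_file(path):
    """Delete the socket file; do nothing if it is already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def install_cleanup(path):
    """Arrange for `path` to be removed when the process exits.

    Must be called from the main thread, since it installs a signal handler.
    """
    # Runs on normal interpreter exit, including exit via an uncaught
    # KeyboardInterrupt, which is what Ctrl-C raises by default.
    atexit.register(remove_socket_file, path)

    def on_sigterm(signum, frame):
        # SIGTERM's default action kills the process without running
        # exit handlers, so turn it into a normal exit instead.
        sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, on_sigterm)
```

A server would call `install_cleanup(path)` once, right after binding the socket. The design point is that signal handlers only need to trigger a normal exit, and the cleanup itself lives in a single exit handler. Nothing here helps after SIGKILL or a crash, so stale files can still appear.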