svenvc / zinc

Zinc HTTP Components is an open-source Smalltalk framework to deal with the HTTP networking protocol.
MIT License
97 stars 57 forks source link

Zinc server becomes unresponsive when the image is saved while a ‘system’ process is running #101

Open Rinzwind opened 2 years ago

Rinzwind commented 2 years ago

In the following example, the second #get: message signals a ConnectionTimedOut:

ZnServer startDefaultOn: 1701.
ZnClient new get: 'http://localhost:1701'.
LibC system: 'sleep 2 &'.
Smalltalk snapshot: true andQuit: false.
(Delay forSeconds: 3) wait.
ZnClient new get: 'http://localhost:1701'.

This seems to be because saving the image causes the Zinc server to try to set up a new socket through #initializeServerSocket, while the previous socket remains open in the child process created by the ‘system’ call.

Version information:

Rinzwind commented 2 years ago

I think the issue can be avoided by setting the FD_CLOEXEC flag on the socket file descriptor, but I’m not sure how to do that through Socket.

See the C program given below, which can be compiled without and with SET_CLOEXEC defined:

$ cc -o test1 test.c; cc -o test2 -DSET_CLOEXEC test.c

The exit status of test1 is 4:

$ ./test1; echo $?
4

While the exit status of test2 is 0:

$ ./test2; echo $?
0

test.c:

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>

int csocket() {
    int sfd = socket(AF_INET, SOCK_STREAM, 0);
#ifdef SET_CLOEXEC
    if (fcntl(sfd, F_SETFD, FD_CLOEXEC) == -1) return -1;
#endif
    if (sfd == -1) return -1;
    struct sockaddr_in sai;
    memset(&sai, 0, sizeof(sai));
    sai.sin_family = AF_INET;
    sai.sin_port = htons(1701);
    if (bind(sfd, (struct sockaddr *)&sai, sizeof(sai)) == -1) return -1;
    if (listen(sfd, 1) == -1) return -1;
    return sfd;
}

int main() {
    int sfd1 = csocket();
    if (sfd1 == -1) return 1;
    if (system("sleep 1 &") != 0) return 2;
    if (close(sfd1) == -1) return 3;
    int sfd2 = csocket();
    if (sfd2 == -1) return 4;
    return 0;
}
Rinzwind commented 2 years ago

Related issue with Socket: https://github.com/pharo-project/pharo/issues/11723

Rinzwind commented 2 years ago

The following obviously needs a cleaner implementation, but it shows that the issue can be avoided by setting FD_CLOEXEC on the Zinc server’s socket, the second #get: message no longer signals an error:

"Platform-dependent constants, values given are for macOS 11.7 (Intel):"
offsetPrivateSocketPtrInSQSocketStruct := 8.
offsetSInPrivateSocketStruct := 0.
sizeOfInt := 4.
valueFD_SET := 2.
valueFD_CLOEXEC := 1.

ZnServer startDefaultOn: 1701.
LibC uniqueInstance
    fcntl: ((ZnServer default serverSocket socketHandle
        pointerAtOffset: offsetPrivateSocketPtrInSQSocketStruct)
            integerAt: offsetSInPrivateSocketStruct + 1 size: sizeOfInt signed: true)
    cmd: valueFD_SET
    arg: valueFD_CLOEXEC.
ZnClient new get: 'http://localhost:1701'.
LibC system: 'sleep 2 &'.
Smalltalk snapshot: true andQuit: false.
(Delay forSeconds: 3) wait.
ZnClient new get: 'http://localhost:1701'.

With LibC>>#fcntl:cmd:arg: implemented as:

fcntl: fildes cmd: cmd arg: arg

    ^ self ffiCall: #(int fcntl(int fildes, int cmd, int arg))
Rinzwind commented 2 years ago

It’s maybe worth knowing that I mostly encounter(ed) this issue while using PTerm: in the example, the statement LibC system: 'sleep 2 &' can be replaced by TerminalEmulator openBash to get the same issue (with the same solution).