mikaku / Fiwix

A UNIX-like kernel for the i386 architecture
https://www.fiwix.org

Bad interprocess communication when using select() on UNIX sockets #90

Closed mikaku closed 5 months ago

mikaku commented 5 months ago

The implementation of select() on UNIX sockets seems a bit buggy. After #83 things were improved but there is still some bad communication between two processes using the select() system call.

The following two programs, downloaded from here, help to test and reproduce the problem.

Define the SOCKETNAME to just "mysocket" in both programs, and compile them.
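The two test programs themselves are not reproduced in this thread. As a rough sketch only (this is not the downloaded code), a relay server of this kind typically binds a UNIX stream socket to the SOCKETNAME path and then uses select() to forward whatever one client sends to all the others. The helper names make_listener and relay_once are hypothetical, as are the buffer size and backlog.

```c
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

#define SOCKETNAME "mysocket"   /* relative path, as suggested above */

/* Hypothetical helper: create a listening UNIX stream socket bound to
 * `path`. Returns the listening fd, or -1 on error. */
int make_listener(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);   /* remove a stale socket file from a previous run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: block in select() until at least one client is
 * readable, then copy what each readable client sent to every other
 * client. Returns the total number of bytes written, or -1 on error. */
int relay_once(const int *clients, int nclients)
{
    fd_set rfds;
    char buf[256];
    int i, j, maxfd = -1, total = 0;

    FD_ZERO(&rfds);
    for (i = 0; i < nclients; i++) {
        FD_SET(clients[i], &rfds);
        if (clients[i] > maxfd)
            maxfd = clients[i];
    }
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    for (i = 0; i < nclients; i++) {
        ssize_t n;

        if (!FD_ISSET(clients[i], &rfds))
            continue;
        n = read(clients[i], buf, sizeof(buf));
        if (n <= 0)
            continue;   /* EOF or error: a real server would drop the client */
        for (j = 0; j < nclients; j++)
            if (j != i && write(clients[j], buf, (size_t)n) == n)
                total += (int)n;
    }
    return total;
}
```

A real server would also put the listening fd in the read set and accept() new clients when it becomes readable. That part is omitted here to keep the select() relay path, which is where this issue shows up, easy to follow.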

How to test

Boot FiwixOS and, after logging in on the console, run the server program. Then run the client program on each of two serial ttys or two console ttys. Once all three programs are running, go to one of the clients, type hello and press ENTER; you should see hello in the other client. Repeat this several times, and also from the other client. You'll see that what you type does not always appear on the other side.

If you want to have two serial lines under QEMU (ttyS0 and ttyS1) add the following lines:

       -chardev pty,id=pciserial \
       -device pci-serial,chardev=pciserial \
       -serial pty

ghaerr commented 5 months ago

I'm on macOS and I don't have a way of connecting to multiple ptys outside of QEMU, so I started QEMU using -serial stdio. After boot, I edited /etc/inittab to run a login on /dev/ttyS0. Then, from the main console, I ran:

./pollsrv &
./pollcli

and on the serial login:

./pollcli

From there, whatever I typed from one client was displayed on the other, and vice versa. I could not get it to fail.

Thus, the ongoing issue may be related to reading from stdin on two serial ports, rather than /dev/tty1 and /dev/ttyS0. I don't know.

"but there is still some bad communication between two processes using the select() system call."

Can you explain what you are seeing when you say "bad communication"? What exactly is not working?

Could you try testing the above scenario with qemu -serial stdio? I am not sure how to debug a problem that I cannot reproduce.

Similarly, it has been mentioned that Fiwix occasionally hangs until something is typed on another console. Can you describe that in more detail? I am wondering whether this might be related to the current problem, that is, that select() works on some TTY devices but not on others. For instance, Nano-X now works after #83, with the exception of one particular Nano-X client; all other clients operate normally, with no bugs seen.

mikaku commented 5 months ago

You're right, it works as expected.

I tested it using your setup (with the server in the background) and it worked. Then I tested it using two consoles (tty2 and tty3) and it also worked. Finally, I tested it using two serial devices (ttyS0 and ttyS1) and it worked again.

I don't know why it didn't work in my first test; perhaps I forgot to compile the kernel with your last patch. OK, false alarm.