pharo-project / pharo

Pharo is a dynamic reflective pure object-oriented language supporting live programming inspired by Smalltalk.
http://pharo.org
Other
1.21k stars 356 forks source link

Something weird in VirtualMachine>>#maxExternalSemaphores: #10156

Open svenvc opened 3 years ago

svenvc commented 3 years ago

The ExternalSemaphoreTable is an important resource since it holds 3 Semaphores per Socket (network connection). The size of ExternalSemaphoreTable determines how many such resources can be used concurrently.

VM parameter 49 holds this limit, it is accessed by VirtualMachine>>#maxExternalSemaphores. The default for Pharo 9 seems to be 256 currently.

There is a method VirtualMachine>>#maxExternalSemaphoresSilently: that can be used to set this limit higher. This seems to work.

Now, there are some limitations here: the number must be a power of 2, higher that the one set before and lower than 64K. BTW the number of file descriptors for files and sockets is also limited by the OS for its processes.

Also, while growing the table, the Semaphores become unavailable to the VM for a very short period of time, so they could miss signals, leading to IO issues.

For this reason, it seems that the design goal was to not allow this to grow during normal use, only during startup or upfront configuration.

Now comes the weird thing: the implementation/code of VirtualMachine>>#maxExternalSemaphores:

This is the method called when the current table is too small and when it wants to grow larger than the current VirtualMachine>>#maxExternalSemaphores.

The implementation calls #maxExternalSemaphoresSilently: in a forked thread, after signalling a SystemNotification with a full explanation. But why should this be forked in the first place ?

Then in the main thread, it signals an error ! Presumably because the design goals mentioned before.

These two actions are in conflict.

All this is probably the result of well meant evolution and refactoring, but it is quite confusing.

My first idea would be: let the caller raise the error and don't try raising the limit, remove the current implementation and replace it by what is now #maxExternalSemaphoresSilently:

svenvc commented 3 years ago

Comment by Henrik Sperre Johanson:

Hi Sven - you’re correct the main intent was to make it clear one should not really be setting this from the image during runtime - and make it as obvious as possible when it had.

The way I remember it, the Error was only raised if the image is in development mode - so a developer running into this, would have the limit raised, but also get a debugger (which is hard to ignore) explaining the issue, while in runtime mode, the SystemNotification would get handled by logging so it would at least be possible to spot that it had happened.

If there’s a less confusing way to achieve that it would be a good thing, but I’m not sure removing all warnings would be the best way, as the related i/o problems would be nearly impossible to determine the root cause of.

Why forked? No idea/can’t remember…

Cheers, Henry

svenvc commented 3 years ago

Another comment by Henrik Sperre Johanson:

believe Igor Stasenko made changes so signals could no longer get lost, but I’m not sure it ever got integrated in Cog…

https://forum.world.st/Issue-5965-in-pharo-Can-t-allocate-semaphores-VirtualMachine-maxExternalSemaphores-aSize-Not-enough--tp4631607p4632388.html

Maybe worth revisiting/reviewing? If it works, the limitation would be gone, and the image warnings could be removed with no ill side-effects.

Cheers, Henry

svenvc commented 3 years ago

Henrik Sperre Johanson:

Seems it used to be in the «old» PharoVM before the reconcilliation: https://forum.world.st/external-semaphores-again-tp4712934p4713396.html

Maybe it’s still possible to find it in the git history of the current pharovm repository?

Cheers, Henry

Ducasse commented 3 years ago

@tesonep did you see this issue?

svenvc commented 3 years ago

It would be good if someone from the VM team took a look and it would also be a good idea to recover Igor's code. It would be a waste if it got lost.

Other than that, this is not a high priority issue.

tesonep commented 3 years ago

I have seen the mails, and the original Thread, but the source code is not available anymore. By the description of the problem and checking the VM code there is a little possibility of losing signals, but only when executing the signal from another thread. Sockets does not suffer this issue, only it is an issue for external FFI calls in threads different than the VM thread. Even though, I will change the implementation to fix this issue.

On the other side, if the number of semaphores is too small, we can try to increase the number of them or change the Socket implementation to use a single semaphore per socket, what do you think @svenvc ?.

svenvc commented 3 years ago

Like I said: there is no immediate problem. The current defaults are reasonable, it is possible to raise the limit. It is all just a bit confusing and arcane. It could be improved.

Regarding the Socket implementation, I always wondered why 3 semaphores are needed. I can think of reasons why, but it all seems heavy. I would prefer to work on a totally new Socket implementation, much simpler/clearer, both image and VM side. I known this has been tried before (by Noury, IIRC).

daniels220 commented 1 year ago

@svenvc Yes, a totally new Socket implementation would be wonderful. Personally I would like to see it keep as much as possible in the image, using direct FFI calls with per-platform implementations living in separate classes or the like (either a family of Socket subclasses, or a family of FFI-library-with-smarts classes that provide a common naming convention for the major socket operations). There is an implementation in Dolphin Smalltalk that could form the basis of the Windows version, but it would depend on ThreadedFFI—is that stable enough to depend on at this point? Not that I have time to dive into rewriting sockets...hopefully TFFI will be stable enough to use at some point anyway.