iluuu1994 opened this issue 2 years ago
Hey, @iluuu1994 :wave: Thanks for the contribution and a very nice idea :+1:
Yeah, we had some internal discussion about that. I'm not a PHP dev, so we're waiting for our PHP team, which is currently busy with SF v3.
> That's just a rough idea; I haven't done any testing to verify this will work.
We had a similar idea, and I am pretty sure that approach will work from RR's POV.
Hi @rustatian! Thanks for your very speedy reply! Great to know this is on the roadmap :slightly_smiling_face:
Luckily, I don't think there's anything in php-src that needs to change. Running PHP with `opcache.enable_cli=1` will automatically create the given shared memory segment. The mapping of the segment is handled by the operating system when the process is forked. RR could then notify the master worker that a new worker needs to be spawned; the master worker would respond by `fork`ing itself and sending back the child PID. This could happen somewhere in https://github.com/spiral/roadrunner-worker. The master worker's (non-shared, per-request) memory would stay low, as it does not handle any HTTP requests, so it can always be used to create new workers to replace ones that have gone over the memory limit.
If there's more information you need about PHP internals, I'm happy to help if I can! My e-mail is ilija.tovilo[at]me.com. Or on Twitter.
(And it just occurred to me that by "I'm not a PHP dev" you probably meant that you don't work on the PHP part of RR, but I decided not to delete my comment in case it provides any additional information).
Thank you :pray:
Hey guys :wave:
The RR part will be ready in v2.12.0. I'll put the specs with the protocol for allocating new `fork`ed workers in the docs and share the link here for further discussion :slightly_smiling_face:
@rustatian That's fantastic to hear! Thank you for your continued dedication to RoadRunner :hearts: Were you able to observe improvements in memory consumption?
My pleasure :heart:
> Were you able to observe improvements in memory consumption?
Not at the moment. For v2.12.0, I'm planning to finish the POC. Since I have zero knowledge of PHP, I need to create an async worker in Rust (guess why :slightly_smiling_face:), create a simple protocol, and test the RR part. Then our PHP team will create a PHP master worker, and we will be able to see the results of our experiment :slightly_smiling_face:
Hey @iluuu1994 :wave:
As far as I understand, PHP doesn't have a bundled `fork` syscall, only `pcntl_fork`, am I right?
@rustatian I'm afraid so, yes :slightly_smiling_face:
Good :slightly_smiling_face: It's not a problem, since our target platforms for this feature are UNIX platforms (Linux, macOS, WSL2, etc.). I checked Ubuntu, Fedora, and Arch, and they all have this extension enabled and included by default.
First tests:

- First process: the master process (22M).
- Second and third processes: forks connected to RR via sockets (8.6M).
Hey guys :wave:, here are some updates from my side:
As we saw earlier, `fork`ing one worker is a promising technique, for memory consumption specifically.
However, we have some limitations from the PHP side:

1. We can't make a `wait4` syscall (`pcntl_wait`) without blocking our master process.
2. We could skip calling `pcntl_wait`, but then the parent process would have zombies in its process table.
3. We can fork twice so that the children are inherited by `PID 1` (`init`). So now, if we kill a child, it will not become a zombie. But in that case, we would have to kill our controller process on every child reallocation. The master process would then be a new PHP CLI process, and previous forks would not be the same as the current fork.

But the good news is that this POC showed me that we could significantly reduce memory usage (thank you very much @iluuu1994 :pray:). And we're already working on a secret project to support a similar scenario :slightly_smiling_face:
@rustatian Could `SIGCHLD` help here (in combination with `pcntl_wait(WNOHANG)`)?
I tried that (p. 3 in my message). But it would require more complex logic; honestly, I don't want to overcomplicate the worker :cry: with that solution (but you're right, it's possible to use the `WNOHANG` flag to return immediately and then notify the parent with a signal when a child process dies). We're preparing a more elegant solution, which would be cross-platform.
Of course, if you have a more elegant solution, that's even better! Thank you :slightly_smiling_face:
Thank you for your involvement. I appreciate it :pray: If you check @wolfy-j's Twitter, you may guess what that solution is :slightly_smiling_face:
Good old thread :slightly_smiling_face: One of the problems I faced when implementing this feature is that it's impossible to wait for a non-RR-child process (a child of the master PHP process) from RR. But Linux kernel 5.3 introduced a new syscall (`pidfd_open`), and here is a sample Go program that waits for such a process to exit. I'll leave it here for when I return to this feature:
```go
package main

import (
	"errors"
	"log"
	"syscall"

	"golang.org/x/sys/unix"
)

// pidfd_open syscall number on x86-64 (available since Linux 5.3).
const syscallPidfdOpen = 434

type pidFD int // file descriptor that refers to a process

func pidfdOpen(pid int, flags uint) (pidFD, error) {
	fd, _, errno := syscall.Syscall(syscallPidfdOpen, uintptr(pid), uintptr(flags), 0)
	if errno != 0 {
		return 0, errno
	}
	return pidFD(fd), nil
}

func (fd pidFD) waitForExit() error {
	fds := []unix.PollFd{{Fd: int32(fd), Events: unix.POLLIN}}
	_, err := unix.Poll(fds, -1)
	if err != nil {
		return err
	}
	// Poll reports results in Revents, not Events.
	if fds[0].Revents&unix.POLLIN != unix.POLLIN {
		return errors.New("unexpected poll event")
	}
	// Process exited
	return nil
}

func main() {
	pid := 5768 // Example pid
	pidfd, err := pidfdOpen(pid, 0)
	if err != nil {
		log.Fatalf("opening pid fd: %v\n", err)
	}
	defer syscall.Close(int(pidfd))

	err = pidfd.waitForExit() // blocks until the process exits
	if err != nil {
		log.Fatalf("polling pid %d: %v\n", pid, err)
	}
	// Process exited
}
```
Just FYI folks, work on this ticket has been resumed.
We need a new transport to connect the child of our parent process to RR. With the code snippet above, it's possible to wait for a process that is not our child in a blocking manner. I also created an, let's say, experimental design for clearing zombie processes out of the kernel process table. This is because we can't block the master worker with `pcntl_wait`/`waitid` system calls. Instead, when a process has finished, RR will send a special request to the master worker (we don't need to do this when the workers are reallocated, because any child of it will be inherited by PID 1) to make a `pcntl_waitid(dead_pid_here)` syscall to clear the kernel process table. And voilà, no blocking, because the process is already dead :slightly_smiling_face:
The last problem we need to solve with @wolfy-j is to completely redirect the process's `stdin/out/err` pipes via the new transport, because it's still not possible to easily read the pipes of a process that is not our child. We will probably use `unix` sockets for this.
Keeping you informed, your humble servant @rustatian :slightly_smiling_face:
@iluuu1994 what about making an option for Opcache to mmap to actual files, as opposed to always using `MAP_ANONYMOUS`?
@MaxSem There are other mechanisms to share memory between processes on Linux (e.g. `memfd_create` or `shm_open`). The issue is that the file needs to be mapped to the same address in every process, since the data structures in shared memory reference each other through user-space pointers. However, we cannot guarantee that the same address will be available in other processes, because any of the addresses that would belong to the shared memory segment could already have been allocated in that process for some other purpose. I'm not a Linux expert; maybe there are some tricks that could be used.
Hey hey guys :wave: Just a few notes on this:
There is no problem opening and using a shared memory segment (with POSIX `shm_open` or the older System V `shmget`), `ftruncate`ing it to the needed size, and then `mmap`ing it.
The main problem is that we don't have access to PHP's shared memory. I mean the `php-src` shared memory address, which is allocated during the startup routine. Also, we can't set it via configuration or any mechanism from within PHP code. The funny thing is that on Windows, due to a platform limitation (no forks), the shared memory segment is the same for all PHP scripts (and can be configured via `php.ini`).
So if we could somehow point every script to the same shared memory key (using OpCache), then we wouldn't have to reinvent the wheel with forking PHP processes :slightly_smiling_face: But unfortunately we can't (unless someone knows some secret `php.ini` configuration key or hidden PHP method...).
And that's why this ticket is about forking the PHP-CLI process with an initialized shared memory key that would be shared between the children (we don't need to `mmap` anything).
And here we come to the second problem: transport. Each process can easily communicate with and wait (`waitpid`) for its own children. But we need to communicate with our child process's child, and our child's child is not our child, so we can't apply the same rules to it (oh my). With Linux kernel 5.3 and above, we can use a new system call, `pidfd_open`, so the first part of the problem is solved: we can wait for the child's child and then send a special request to the master worker to use `pcntl_waitpid` to remove the dead worker (zombie) from the kernel process table. Also, imagine our master worker dies :slightly_smiling_face:; then all its children will be inherited by PID 1 (poor orphans), and we don't need to send `waitpid` requests for these PIDs, since PID 1 will handle that. So, we need to track the child PIDs of the master worker.
The second part is: how do we communicate with that child (remember, it's not our child)? I decided to write a Unix socket transport for this. It's completely independent from the current transport we have, but it solves the problem pretty well. We would have almost the same speed as with pipes.
Where we are now: as I mentioned in the previous messages, I'm a complete noob in PHP :smiling_face_with_tear: So I'm waiting for our PHP team to help me write the PHP part. The Golang part is pretty much done.
@rustatian Hello :wave::blush: I think what @MaxSem suggested is to solve the issue completely on php-src's side, without the need to `fork` processes to attach to the shm segment. That could be difficult due to the reasons mentioned above. AFAIK, for the same reason, Opcache support on Windows is considered somewhat broken.
We're looping through a predefined list of addresses until we find one that is free. With `opcache.mmap_base` (I assume that's the php.ini config you're referencing above), the configured segment needs to be free, or we fatal-error (if I read the code correctly). ASLR can increase the chances of this happening. In the past, I've heard the unofficial suggestion to use `opcache.file_cache` and `opcache.file_cache_only=1` on Windows instead. But again, I'm not an expert on this topic, so I might be missing some key information.
Hey @iluuu1994 :wave::slightly_smiling_face:, yeah, got you. That would be the preferred way to solve the problem on the PHP side. The comment is legendary:

> zend refused but did not remember the exact reason, pls add info here if one of you know why :)

:slightly_smiling_face:
Yes, I meant the `opcache.mmap_base` option on Windows. Unfortunately, we don't have an equivalent option to redefine these predefined addresses on Linux :cry:
> We don't have threads in PHP :(
@rustatian Have you tried the `krakjoe/parallel` extension?
Hey @michael-rubel :wave: No, I haven't :slightly_smiling_face: I haven't even tried PHP, since I'm not a PHP dev :slightly_smiling_face:
@rustatian Ask a PHP dev from the team to dig into this extension. It seems to work better than the old `pthreads` extension. Maybe it's a new field for optimizations (spawn threads instead of processes under the hood?)
P.S. The extension's philosophy was taken from Golang: https://www.php.net/manual/en/philosophy.parallel.php
@michael-rubel The idea is not about threads/processes. We have workers to fulfill that pattern. The idea was to have forks, which are not the same thing.
Hi guys. Maybe we can revive the idea of process forks? This would probably help to greatly reduce the cost of process creation and reduce RAM usage. It looks relatively simple to implement; I found mentions of it in the PHP package.
This is our old experiment with @wolfy-j.
This problem will be solved by Rapira, not by RR.
Just to keep everyone in this thread posted: I'm working on this. Not as fast as I initially thought, but this idea is not dead for sure. New Rapira workers use approximately 5-7 MB RSS on a cold start and ~10 MB RSS with OpCache after a few minutes of work.
I have an idea!
Hi rr team! :wave::blush:
As far as I can tell, right now RoadRunner always spawns a new php-cli process per worker. When using opcache, this has the significant limitation that each process has its own shared memory segment. Opcache caches scripts, classes, and functions, and also de-duplicates constant strings and arrays, in this shared memory segment, to be used by all processes. This segment is (usually) created with `mmap()` and `MAP_SHARED|MAP_ANONYMOUS`. For this mechanism to be used by other workers, they would need to be `fork`ed, or threads (ZTS) would need to be used.

This approach could reduce memory usage by multiple factors, depending on the number of workers. Another benefit is that workers could warm each other's cache as they put compiled scripts into shm. A potential downside is that some locking goes on when shared memory is modified, and stability could be compromised if bugs in opcache corrupted shm, bringing down all workers (although php-fpm would be affected here too, which makes it much less likely).

A (seemingly) simple solution could be to have a master worker that accepts fork messages but doesn't handle any requests itself (to avoid accumulating memory leaks that would pollute newly forked workers). The new child process would become an actual worker, send its PID back to rr, and start listening for messages. If the shared memory segment were to fill up, the master could be replaced, followed by all workers. This is not something that should usually happen, though, as shm should be configured to be big enough not to cause any restarts. That's just a rough idea; I haven't done any testing to verify this will work.
I'm currently only allowed to work on php-src itself. Let me know if this is something you're interested in working on; if not, I might try something in my free time at some point.
Thanks again for rr!