Closed CheAlex closed 1 year ago
Hey @CheAlex 👋🏻 The workflow worker is internal and exists in a single copy. It's not safe to restart the WF worker often, because PHP would have to restart the currently running workflows due to the lack of information about them (a restart clears the state). We have fixed an internal PHP memory leak, so this shouldn't be the case here. Please update to the latest RR version (2023.1.5) and the corresponding PHP-SDK (2.5.1).
broken load balancing - basically only workflow worker works
It's not broken. The workflow worker handles all workflow code and all activity coordination, so it's expected that the load on the single WF worker, in terms of execs, is greater than on the activity workers. But this is a very light load, since the WF worker, as I mentioned before, is only a coordinator. The real workload rests on the activity workers' shoulders.
Since this is not a bug, I'm closing this issue, but not locking it. So you can continue to discuss/ask questions.
@rustatian Hi! Thanks for the answers. After updating RR to 2023.1.5, the test project no longer has memory leaks.
@rustatian Hi!
I modified the test project: https://github.com/CheAlex/temporalio-samples-php/tree/test-memory-leaking-and-cpu-sticking - it uses the latest version of php-sdk and RR
After running I got the following results:
`php app.php simple-activity`:
+---------+-----------+---------+---------+---------+--------------------+
| PID | STATUS | EXECS | MEMORY | CPU% | CREATED |
+---------+-----------+---------+---------+---------+--------------------+
| 11224 | ready | 251 | 48 MB | 0.75 | 31 seconds ago |
| 11225 | ready | 251 | 48 MB | 0.75 | 31 seconds ago |
| 11226 | ready | 249 | 48 MB | 0.78 | 31 seconds ago |
| 11227 | ready | 249 | 47 MB | 0.75 | 31 seconds ago |
| 11240 | ready | 3,001 | 51 MB | 9.64 | 31 seconds ago |
+---------+-----------+---------+---------+---------+--------------------+
After the end of all workflows, the workflow worker CPU consumption reaches ~10%, and then slowly drops to zero - there is a CPU sticking effect.
`php app.php simple-activity-exception`:
+---------+-----------+---------+---------+---------+--------------------+
| PID | STATUS | EXECS | MEMORY | CPU% | CREATED |
+---------+-----------+---------+---------+---------+--------------------+
| 10676 | ready | 500 | 59 MB | 1.34 | 38 seconds ago |
| 10677 | ready | 499 | 59 MB | 1.31 | 38 seconds ago |
| 10678 | ready | 501 | 59 MB | 1.34 | 38 seconds ago |
| 10679 | ready | 500 | 59 MB | 1.34 | 38 seconds ago |
| 10694 | ready | 3,414 | 119 MB | 50.68 | 38 seconds ago |
+---------+-----------+---------+---------+---------+--------------------+
After the end of all workflows, the workflow worker CPU consumption reaches ~50%, and then slowly drops to zero - there is a CPU sticking effect. The memory remains at 119 MB and does not fall.
`php app.php simple-activity-timeout`:
+---------+-----------+---------+---------+---------+--------------------+
| PID | STATUS | EXECS | MEMORY | CPU% | CREATED |
+---------+-----------+---------+---------+---------+--------------------+
| 13339 | ready | 4 | 48 MB | 0.09 | 55 seconds ago |
| 13340 | ready | 4 | 47 MB | 0.09 | 55 seconds ago |
| 13341 | ready | 4 | 48 MB | 0.12 | 55 seconds ago |
| 13342 | ready | 4 | 48 MB | 0.12 | 55 seconds ago |
| 13355 | ready | 3,238 | 126 MB | 37.40 | 55 seconds ago |
+---------+-----------+---------+---------+---------+--------------------+
After the end of all workflows, the workflow worker CPU consumption reaches ~40%, and then slowly drops to zero - there is a CPU sticking effect. The memory remains at 126 MB and does not fall.
Is this normal, expected behavior?
Hey @CheAlex 👋🏻
and then slowly drops to zero - there is a CPU sticking effect
This is how it should be 🙂, no worries, but thanks for pointing this out.
You can easily get information about CPU core load. But to calculate the same figure for a single process, you have to take into account the other processes competing for the CPU as well, and then work out how much CPU time your process consumed. That's the effect you're seeing: CPU usage for the process rises, then slowly decreases (because the process uses less and less CPU time). This is not the same as CPU load - it is the percentage of CPU time that particular process is using.
If you open utilities like `top`, `btop`, or `htop` and check the total CPU core load (or the load across cores), you'll see that the CPU load drops immediately after you stop loading your workers.
@rustatian and what about memory in the second and third cases?)
Regarding the memory: RR reads `/proc/<pid>/statm` to count RSS memory consumption per process. This method is inaccurate, but we don't have a more accurate one, so the reported memory consumption might be slightly over the limit - not off by tens of megabytes, but still inaccurate. More info here:
(24) rss %ld
Resident Set Size: number of pages the process has
in real memory. This is just the pages which count
toward text, data, or stack space. This does not
include pages which have not been demand-loaded in,
or which are swapped out. This value is
inaccurate; see /proc/pid/statm below.
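As a sketch of the approach described above - reading the resident-pages field from `/proc/<pid>/statm` and converting it to bytes - assuming the field layout from proc(5) (`rssBytes` is a hypothetical helper, not RR's actual code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rssBytes parses a /proc/<pid>/statm line, whose second field is the
// number of resident pages (per proc(5)), and multiplies it by the
// system page size to get the RSS estimate in bytes.
func rssBytes(statm string, pageSize int64) (int64, error) {
	fields := strings.Fields(statm)
	if len(fields) < 2 {
		return 0, fmt.Errorf("malformed statm line: %q", statm)
	}
	pages, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return 0, err
	}
	return pages * pageSize, nil
}

func main() {
	// Example line: 3000 resident pages with 4 KiB pages => 12,288,000 bytes.
	rss, err := rssBytes("12000 3000 1200 400 0 5000 0", 4096)
	if err != nil {
		panic(err)
	}
	fmt.Println(rss)
}
```

As the man page excerpt notes, the kernel itself reports these counters only approximately, so any tool built on them inherits that inaccuracy.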
@rustatian checked, everything works as you said, thanks)
@CheAlex 👋🏻 You are welcome 🙂
@rustatian
not by tens of megabytes
but in my examples just tens of megabytes: 119 MB, 126 MB
Those are two different runs of two different workflows, so the memory consumption differs - it's not a measurement problem. By "tens of megabytes" I meant the difference between OS-specific tools and RR's measurement.
No duplicates 🥲.
What happened?
In the test application https://github.com/temporalio/samples-php, two related problems appear:
- a memory leak in the `workflow` worker
- broken load balancing - basically only the `workflow` worker works

In the test application, running in Docker, I added limits (`max_jobs`, `ttl`, `max_worker_memory`): these limits work correctly for `activity` workers and do not apply to the `workflow` worker. As a result, the `workflow` worker lives forever, has memory leaks, and does not restart automatically. I was looking for the ability to set limits for the `workflow` worker, but did not find it. Did I search badly? Also, according to the statistics, there is a clear bias in the load towards the `workflow` worker.

Version
2.12.3
Relevant log output
No response
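For reference, the limits mentioned in the issue are normally declared on a worker pool in `.rr.yaml`. Below is a hypothetical sketch of an activity-pool configuration, assuming RR's generic pool/supervisor options (key names and placement may differ between RR versions); note that, as discussed in the thread, there is no equivalent section for the internal workflow worker:

```yaml
temporal:
  address: "127.0.0.1:7233"
  activities:
    num_workers: 4
    max_jobs: 64              # recycle an activity worker after this many execs
    supervisor:
      watch_tick: 1s          # how often the supervisor checks the workers
      ttl: 300s               # hard lifetime limit per worker
      max_worker_memory: 128  # MB; a worker above this limit is restarted
```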