Open joanhey opened 1 year ago
Looking at the log it seems
php-unit: /usr/local/bin/docker-entrypoint.sh: Applying configuration /docker-entrypoint.d/nginx-unit.json
php-unit: 2022/11/23 16:50:03 [info] 21#21 "benchmark" prototype started
Unit was started with some initial config.
php-unit: 2022/11/23 16:50:03 [info] 105#105 "benchmark" application started
php-unit: /usr/local/bin/docker-entrypoint.sh: OK: HTTP response status code is '200'
php-unit: {
php-unit: "success": "Reconfiguration done."
php-unit: }
php-unit: /usr/local/bin/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/...
php-unit: /usr/local/bin/docker-entrypoint.sh: Stopping Unit daemon after initial configuration...
php-unit: /usr/local/bin/docker-entrypoint.sh: Waiting for control socket to be removed...
php-unit: 2022/11/23 16:50:03 [alert] 21#21 sendmsg(20, -1, -1, 1) failed (104: Connection reset by peer)
After it was started it was reconfigured, then shutdown. All that happened within the space of a second.
From your point of view, what was the actual sequence of events?
The strange thing is that in local work. Show all that verbose output, but work.
Finish here and start to verify the tests:
When finish the tests and close the docker:
Before show the summary:
But not in github actions or the bench server.
Github actions log failing: https://github.com/TechEmpower/FrameworkBenchmarks/actions/runs/3567839471/jobs/5996024544#step:9:4375
It's failing from the last 2 versions of Nginx Unit. And sometimes work !!
Here are the logs from an old run, using unit_1.21.0
and php7.4
working without problems.
https://tfb-status.techempower.com/unzip/results.2021-01-13-13-38-00-107.zip/results/20201229183947/php-unit
And an actual, using FROM nginx/unit:1.28.0-php8.1
https://tfb-status.techempower.com/unzip/results.2022-11-25-12-16-16-174.zip/results/20221119184855/php-unit
A recent run that half worked, using FROM nginx/unit:1.27.0-php8.1
https://tfb-status.techempower.com/unzip/results.2022-11-13-10-29-24-097.zip/php-unit
A picture from the fortunes results, from this run.
But json, db and query tests failed the verification and the log show a lot of errors:
https://tfb-status.techempower.com/unzip/results.2022-11-13-10-29-24-097.zip/results/20221107183025/php-unit/run/php-unit.log
I updated for that to v1.28, but now never work.
When I send the PR for Nginx Unit 1.28 was passing the github actions tests. But never in the bench server, and later again failed in github actions.
https://github.com/TechEmpower/FrameworkBenchmarks/pull/7691
Github actions log working: https://github.com/TechEmpower/FrameworkBenchmarks/actions/runs/3447671893/jobs/5753927577#step:9:2154
As discussed on the community slack https://nginxcommunity.slack.com/archives/C02SS9UB85C/p1670008983875989?thread_ts=1669810946.221959&cid=C02SS9UB85C
This issue is related to #728 and friends. A manual patch of the docker-entrypoint.sh
file seems to fix the issue.
Linking the external issue as well.
https://github.com/TechEmpower/FrameworkBenchmarks/pull/7757#pullrequestreview-1203015505
As said, we will ship a fix for the entrypoint script with the next Unit release 1.29.
If work in the next run. I'll close the issue.
I will close it once the fix was merged if that's okay for you
Failed in the last run:
When finish the run, I'll send the logs.
Yes, please share the logs. The new images should have the new docker entrypoint script. In case this is another kind of issue we should Create a new one.
It's using the patch in v1.28, and looks like the same issue. https://tfb-status.techempower.com/unzip/results.2022-12-28-17-59-18-855.zip/results/20221222172047/php-unit/build/php-unit.log
I'll update to v1.29
Failed in the last run, using v1.29. https://github.com/TechEmpower/FrameworkBenchmarks/pull/7839
Sunit (scala) from @lolgab, also fail with v1.28 and the last commit with v1.29. https://github.com/TechEmpower/FrameworkBenchmarks/pull/7815 Sunit log with v1.29: https://tfb-status.techempower.com/unzip/results.2023-01-04-02-03-49-510.zip/snunit
The only framework using v1.28 and working is Fastapi (python): https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Python/fastapi Log: https://tfb-status.techempower.com/unzip/results.2023-01-04-02-03-49-510.zip/results/20221229020235/fastapi-nginx-unit
It's the same issue. It is not necessary to open a new one.
Update, Fastapi (python) fail with v1.29 in the tests.
It's using the patch in v1.28, and looks like the same issue. https://tfb-status.techempower.com/unzip/results.2022-12-28-17-59-18-855.zip/results/20221222172047/php-unit/build/php-unit.log
I'll update to v1.29
This issue is different from the other once. It is not related to the control socket. Will have a look on it.
Update, Fastapi (python) fail with v1.29 in the tests.
This log has these lines in it:
fastapi-nginx-unit-orjson: 2022/12/21 02:57:25 [alert] 0#25 [unit] Python failed to call 'asyncio.get_event_loop' Notice: fastapi-nginx-unit-orjson: 2022/12/21 02:57:25 [notice] 24#24 app process 25 exited with code 1
Which seems like it's the same issue as outlined in #815 perhaps?
Yes indeed! This is exactly this issue. We debugged the Python code last night. 1.29 is working with Python 3.10.8. anything newer than this will currently fail. We are working on a fix and I will let you know once available. Thanks for the links.
Yes indeed! This is exactly this issue. We debugged the Python code last night. 1.29 is working with Python 3.10.8. anything newer than this will currently fail. We are working on a fix and I will let you know once available. Thanks for the links.
Any progress on this issue?
@micaelmalta (and anyone else) feel free to try the patch here https://github.com/nginx/unit/issues/815#issuecomment-1396132323
I have pushed a branch that contains the above fix, if that facilitates easier testing.
V1.29.1 still fail in the benchmark tests, almost with PHP. When finish the run, I'll send the logs.
@joanhey Thanks. We will have a look into this. More details on Monday.
Work but with a lot of errors.
Run logs:
With unit v1.31.0 also fail in the GH Actions tests in the bench.
FROM unit:1.31.0-php8.2
As now also file in the test, it's easier to test it isolated to fix the problem.
What is failing exactly?
I mean the only errors I'm seeing for php-unit are a bunch of socket connection errors
Verifying test db for php-unit caused an exception: HTTPConnectionPool(host='tfb-server', port=8080): Max retries exceeded with url: /dbraw.php (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2bb167670>: Failed to establish a new connection: [Errno 111] Connection refused'))
FAIL for http://tfb-server:8080
Server did not respond to request
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
--------------------------------------------------------------------------------
VERIFYING JSON
--------------------------------------------------------------------------------
Accessing URL http://tfb-server:8080/json.php:
Verifying test json for php-unit caused an exception: HTTPConnectionPool(host='tfb-server', port=8080): Max retries exceeded with url: /json.php (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2bb166bc0>: Failed to establish a new connection: [Errno 111] Connection refused'))
FAIL for http://tfb-server:8080
Server did not respond to request
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
--------------------------------------------------------------------------------
VERIFYING QUERY
--------------------------------------------------------------------------------
Accessing URL http://tfb-server:8080/dbquery.php?queries=2:
Verifying test query for php-unit caused an exception: HTTPConnectionPool(host='tfb-server', port=8080): Max retries exceeded with url: /dbquery.php?queries=2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2bb54be80>: Failed to establish a new connection: [Errno 111] Connection refused'))
FAIL for http://tfb-server:8080
Server did not respond to request
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
--------------------------------------------------------------------------------
VERIFYING UPDATE
--------------------------------------------------------------------------------
Accessing URL http://tfb-server:8080/updateraw.php?queries=2:
Verifying test update for php-unit caused an exception: HTTPConnectionPool(host='tfb-server', port=8080): Max retries exceeded with url: /updateraw.php?queries=2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2bb533b80>: Failed to establish a new connection: [Errno 111] Connection refused'))
FAIL for http://tfb-server:8080
Server did not respond to request
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
--------------------------------------------------------------------------------
VERIFYING FORTUNE
--------------------------------------------------------------------------------
Accessing URL http://tfb-server:8080/fortune.php:
Verifying test fortune for php-unit caused an exception: HTTPConnectionPool(host='tfb-server', port=8080): Max retries exceeded with url: /fortune.php (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2bb56e3e0>: Failed to establish a new connection: [Errno 111] Connection refused'))
FAIL for http://tfb-server:8080
Server did not respond to request
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
--------------------------------------------------------------------------------
VERIFYING PLAINTEXT
--------------------------------------------------------------------------------
Accessing URL http://tfb-server:8080/plaintext.php:
Verifying test plaintext for php-unit caused an exception: HTTPConnectionPool(host='tfb-server', port=8080): Max retries exceeded with url: /plaintext.php (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2bb56ee00>: Failed to establish a new connection: [Errno 111] Connection refused'))
FAIL for http://tfb-server:8080
Server did not respond to request
See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
What is tfb-server
? Is that what's running Unit? Could it be a file descriptor exhaustion issue?
The last one is only a GH actions test: https://github.com/TechEmpower/FrameworkBenchmarks/actions/runs/6079079817/job/16491119686#step:9:5995
I think the problem is that Unit needs too much time to initialize all the forks(processes|workers).
Before Unit was working without problems in the benchmark. Later it start to fail in the benchmark, but now it also fail in the GitHub action test.
I think the problem is that Unit needs too much time to initialize all the forks(processes|workers).
Are you saying that Unit itself is failing somehow due to trying to start X amount of processes?
What is X in your case?
Do you have a simple reproducer config?
It's a benchmark that use 3 xeon servers, each with 24 cores.
We test the same PHP code with different runtimes and servers. The only failing is Unit, in the last versions of Unit.
You can download and run the tests locally. Before pass the GH action tests and fail only in the bench. But now fail in both.
I'm just trying to understand exactly what it is that's failing...
Let me try it another way.
1) How many PHP application processes are you trying to start?
2) Do they all start correctly?
I send it the link, but I pass it here:
{
"listeners": {
"*:8080": {
"pass": "applications/benchmark"
}
},
"applications": {
"benchmark": {
"type": "php",
"processes": 84,
"user": "www-data",
"group": "www-data",
"root": "/php/",
"options": {
"file": "/php/deploy/conf/php.ini"
},
"limits": {
"requests": 10000000
}
}
}
}
And when try to start:
And it was working without problems, till Unit v1.29.
Pre Unit v1.29
You can clone the repo. And later try it locally, because now also fail with the tests.
Only run in the root repo: ./tfb --mode verify --test php-unit
PD: only need docker
OK, so the issue started with 1.29.0...
If it was after that then I would maybe point to b0e2d9d0 ("Isolation: Switch to fork(2) & unshare(2) on Linux."), but it started failing before that landed.
Looking at the changes between 1.28.0 and 1.29.0, there's nothing that really stands out here.
There were a couple of PHP module updates, 48b6a7b3 ("PHP: Fixed php_module_startup() call for PHP 8.2.") and a0327445 ("PHP: allowed to specify URLs without a trailing '/'.")
The only obvious process related changes was the introduction of per application cgroup support, but you're not even using that so it should basically have no impact.
The thing I find confusing about the logs is it always looks like unit is shutdown before the benchmark even runs.
Does it work if you reduce the number of processes?
As a quick test, I was able to start 84 PHP application processes in < 1 second on a 2 vCPU VM.
2023/09/14 17:49:17 [info] 882#882 "php" prototype started
2023/09/14 17:49:17 [info] 883#883 "php" application started
2023/09/14 17:49:17 [info] 884#884 "php" application started
2023/09/14 17:49:17 [info] 885#885 "php" application started
2023/09/14 17:49:17 [info] 886#886 "php" application started
...
2023/09/14 17:49:17 [info] 964#964 "php" application started
2023/09/14 17:49:17 [info] 965#965 "php" application started
2023/09/14 17:49:17 [info] 966#966 "php" application started
I don't remember the exact Unit version than start to fail. But was just when I started this issue, so the version released before Nov 29, 2022.
I'll try if fail with less processes.
PD: perhaps only fail with docker.
Today I have some free time, and I try to pinpoint this issue.
The problem is that it needs too much time to start "Unit", and it isn't ready when start the verification test or any other CI command.
ENTRYPOINT
( start unit | /docker-entrypoint.d/{.sh, .js, .pem, .json} | kill unit )CMD
(start unit)The ENTRYPOINT
and CMD
are run after the build and need a lot of time, with ~70 processes it isn't fast enough to run the next CI command. The actual servers in the bench have a CPU with 56 threads.
ENTRYPOINT
and config with RUN
If we bypass the ENTRYPOINT
and config unit
with RUN
and tested with 168 processes is really fast, and the next CI command run without problems.
RUN unitd && \
curl -X PUT --data-binary @/php/deploy/nginx-unit.json --unix-socket \
/var/run/control.unit.sock http://localhost/config
ENTRYPOINT [ ]
CMD ["unitd", "--no-daemon", "--control", "unix:/var/run/control.unit.sock"]
Perhaps I need to kill unit
in the RUN
.
Welcome any feedback with another solution.
If we bypass the ENTRYPOINT and config unit with RUN and tested with 168 processes is really fast, and the next CI command run without problems.
Yes, that matches my previous comment where you can see starting Unit with a load load of PHP processes took < 1 second... so I suspect the problem is not Unit itself but around the machinery to start it...
I don't know what ENTRYPOINT
is doing but perhaps you can instrument things to see where the time going?
The ENTRYPOINT
https://github.com/nginx/unit/blob/master/pkg/docker/docker-entrypoint.sh
The log forENTRYPOINT
execution ( start unit | /docker-entrypoint.d/{.sh, .js, .pem, .json} | kill unit ) + CMD
(start unit again) execution, both executed after the container build:
https://tfb-status.techempower.com/unzip/results.2024-05-07-03-20-53-233.zip/results/20240501071818/php-unit/run/php-unit.log
We can also run the tests in local, and fail because use too much time with ~+70 processes.
Why need to start unit - config - kill - start again
, when unit configuration is dynamic ?
Or better, why not start with the config.json
in /docker-entrypoint.d/
directly ?
See previous discussion: https://github.com/nginx/unit/discussions/1114
The entrypoint script works well for simple configurations/applications but does hurt container startup time.
The reason for the start-config-kill-restart
process is because the Control API is the means of configuring Unit and so we need a running instance to actually do the configuration.
For my own Unit-based Docker images I take a similar approach with RUN
. The entrypoint script will notice that a configuration is already set and will not overwrite it. So you don't need to set it empty.
COPY unitconf.json init.json
RUN nohup /bin/sh -c "unitd --no-daemon --pid init.pid --log /dev/stdout --control unix:init.unit.sock &" && \
curl -fsX PUT -d@init.json --unix-socket init.unit.sock _/config && \
rm init.*
I have a proposal for unitd
to start in a non-daemonizing, "configuration mode" and read configuration from stdin. Which would also remove the dependency on having curl(1) in the container. I'll write an issue about it (edit: https://github.com/nginx/unit/issues/1274).
Hi, I added Nginx and Nginx Unit to the TechEmpower benchmark. But in the last versions it's failing with PHP: https://tfb-status.techempower.com/unzip/results.2022-11-25-12-16-16-174.zip/results/20221119184855/php-unit/run/php-unit.log
Old config without problems, and faster than Nginx: https://github.com/TechEmpower/FrameworkBenchmarks/blob/R20/frameworks/PHP/php/php-unit.dockerfile https://github.com/TechEmpower/FrameworkBenchmarks/blob/R20/frameworks/PHP/php/deploy/nginx-unit.json
Actual config with problems: https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/PHP/php/php-unit.dockerfile https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/PHP/php/deploy/nginx-unit.json
The last versions fail a lot.
Please, could anybody review and help with this problem. Thank you.
PD: Welcome help with the Nginx config too, only with the plaintext test. https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/C/nginx/nginx.conf
Be careful, as the bench changed a lot after the spectre/meldown patches. https://github.com/TechEmpower/FrameworkBenchmarks/issues/7321
If you need I'll send you the benchs results. And ask me for any doubt.