roadrunner-server / roadrunner

🤯 High-performance PHP application server, process manager written in Go and powered with plugins
https://docs.roadrunner.dev
MIT License
7.9k stars 412 forks source link

[💡 FEATURE REQUEST]: Improve clarity of worker errors when starting a server #2020

Open Kaspiman opened 2 weeks ago

Kaspiman commented 2 weeks ago

Plugin

Server

I have an idea!

When my newbie was getting used to the RR, some problems arose that were difficult to diagnose.

Let's say i have typical RR config for production usage for Logs plugin:

logs:
    level: warn
    mode: production

and also a broken worker with a syntax error (for example).

Then the output will only include the EOF error without any details.

handle_serve_command: Function call error:
2024-10-06T18:27:36.449763889Z  serve error from the plugin *http.Plugin stopping execution, error: static_pool_allocate_workers: WorkerAllocate: EOF

A newbie has questions: why the server won't start and what is EOF?

The problem is that errors from the worker only come from the debug level. In production mode, it is difficult to immediately understand what happened. EOF is too unclear a reason. Also, an error in the EOF text was sent to the browser without any details.

What if RR write a more meaningful error message right away? It seems that the worker startup error should be more understandable and the error level should be.

Another small question. Do RR really use STDERR to exchange data frames? https://docs.roadrunner.dev/docs/php-worker/worker#communication-methods

rustatian commented 2 weeks ago

Hey @Kaspiman 👋 Unfortunately, it is not possible to determine the exact error, I understand that EOF is too general error, but usually, it points to some error with the PHP file, e.g.: no such file for example or some garbage data in the process pipes which prevents RR to allocate a PHP process. The case is, RR does not know about PHP at all. From the RR perspective, it is just a child process and if the runtime can't provide additional information, RR unfortunately can't obtain such an info by itself somewhere.

Also, an error in the EOF text was sent to the browser without any details.

This is kind of strange, because EOF usually indicates a startup problem. When RR is running, this is highly unlikely to see this error. If you don't move the worker and send a ./rr reset command.

Another small question. Do RR really use STDERR to exchange data frames?

RR uses stdin/stdout to communicate with the PHP process. STDERR used to obtain logs/diagnostic information from the PHP process.

rustatian commented 2 weeks ago

It would be nice if you attach to the ticket some examples of PHP workers + actual error messages + desired error messages + .rr.yaml configurations.

Kaspiman commented 2 weeks ago

Use Case â„–1:

Any broken php worker, for example with syntax error.

http:
    pool:
        num_workers: 1

logs:
    level: warn

Actual error:

[INFO] RoadRunner server started; version: 2024.2.1, buildtime: 2024-09-12T16:25:02+0000
[INFO] sdnotify: not notified
handle_serve_command: Function call error:
serve error from the plugin *http.Plugin stopping execution, error: static_pool_allocate_workers: WorkerAllocate: EOF

Desired error message with error level: Failed to allocate worker, check application logs with stacktrace from PHP.

Now this trace is visible only from the info level.

Logs with info level and broken php worker:

[INFO] RoadRunner server started; version: 2024.2.1, buildtime: 2024-09-12T16:25:02+0000
[INFO] sdnotify: not notified
INFO    server          Warning: Undefined variable $worker in /var/www/html/worker.php on line 20
 Fatal error: Uncaught TypeError: Spiral\RoadRunner\Http\PSR7Worker::__construct(): Argument #1 ($worker) must be of type Spiral\RoadRunner\WorkerInterface, null given, called in /var/www/html/worker.php on line 20 and defined in /var/www/html/vendor/spiral/roadrunner-http/src/PSR7Worker.php:42
 Stack trace:
 #0 /var/www/html/worker.php(20): Spiral\RoadRunner\Http\PSR7Worker->__construct(NULL, Object(Nyholm\Psr7\Factory\Psr17Factory), Object(Nyholm\Psr7\Factory\Psr17Factory), Object(Nyholm\Psr7\Factory\Psr17Factory))
 #1 {main}
   thrown in /var/www/html/vendor/spiral/roadrunner-http/src/PSR7Worker.php on line 42

 handle_serve_command: Function call error:
     serve error from the plugin *http.Plugin stopping execution, error: static_pool_allocate_workers: WorkerAllocate: EOF

Basically, we need to log the worker output at the error level unless otherwise stated.


Use Case â„–2:

debug = true and any broken php worker, for example with syntax error.

logs:
    level: warn

http:
    pool:
        debug: true

Logs:

[INFO] RoadRunner server started; version: 2024.2.1, buildtime: 2024-09-12T16:25:02+0000
[INFO] sdnotify: not notified
ERROR   http            execute {"start": "2024-10-07T07:32:12+0000", "elapsed": 9, "error": "EOF"}
ERROR   http            execute {"start": "2024-10-07T07:32:12+0000", "elapsed": 9, "error": "EOF"}

The server has started, and when processing one request, two EOF errors are written to the logs.

Error 500 and EOF message in browser.

Desired error message in logs with error level:Failed to allocate worker, check application logs with stacktrace from PHP.

Desired error message in browser: Failed to allocate worker, check application logs with/without (idk) stacktrace from PHP.

Kaspiman commented 2 weeks ago

Is it possible to rework this kind of error?

Typical error:

 handle_serve_command: Function call error:
     serve error from the plugin *http.Plugin stopping execution, error: static_pool_allocate_workers: WorkerAllocate: EOF

It contains a lot of information that is more useful to the developer than to the user.

Possible rework:

Error from the <PluginName> plugin, message: Failed to allocate worker, check application logs, code: static_pool_allocate_workers, stacktrace: ...
rustatian commented 2 weeks ago

Ok, so, if I understand correctly, the problem is in log level, because under the info log level you may clearly see what is the actual error. The error about Function call comes from the Endure, but definitely it is possible to add some link to docs or guideline of what to do next to see the error. Will add this to 2024.3 milestone.

rustatian commented 2 weeks ago

The actual problem here, is that we don't have a log level from PHP. Thus, all worker-related messages (RR can't distinguish between a just message and an error message, from the RR POV all that is text) are logged under the info log level. And when you use warn, obviously, all info messages are not displayed (and this is the problem).

A real solution I want to see here is a some configuration exchange from the PHP worker side to RR. Like in the Temporal plugin. RR on the init stage pings all workers, so it would be nice to exchange a configuration info between RR <-> PHP. Logging configuration is the first and obvious step.

Kaspiman commented 2 weeks ago

Is it possible to read all logs from an application with an error level if a specific level is not specified? This will also help in other cases, for example, when the output is erroneously sent to stdout.

rustatian commented 2 weeks ago

Is it possible to read all logs from an application with an error level if a specific level is not specified?

This is not a proper solution to a problem. Worker's log level should be passed from the PHP Worker, otherwise - that particular worker should be treated as invalid and re-created. If you would like to collaborate and create a PHP part, I would be happy to provide a protocol specs and RR implementation.

Kaspiman commented 2 weeks ago

This problem may not only be in the worker. What if someone has their own worker, and not the one offered by Spiral?

What if it is a script launched in the Service plugin or in the server.on_init section? We won't be able to set up a worker there, it simply doesn't exist.

I think it's possible to interpret any output from the worker as an error. This will force people to catch errors and use the Logs plugin.

rustatian commented 2 weeks ago

We do not support third-party workers. But all needs should be covered by the well-defined protocol. And from this point it doesn't actually matters, who created a worker.

I think it's possible to interpret any output from the worker as an error

This is possible, but logs are a very sensitive topic. And this topic should be well-discussed.

Kaspiman commented 2 weeks ago

So, let's discuss with community. Discord or https://github.com/orgs/roadrunner-server/discussions?

rustatian commented 2 weeks ago

You may create both, discussion on the GitHub and create a topic in the Discord forum.

image
Kaspiman commented 1 week ago

The actual problem here, is that we don't have a log level from PHP. Thus, all worker-related messages (RR can't distinguish between a just message and an error message, from the RR POV all that is text) are logged under the info log level. And when you use warn, obviously, all info messages are not displayed (and this is the problem).

A real solution I want to see here is a some configuration exchange from the PHP worker side to RR. Like in the Temporal plugin. RR on the init stage pings all workers, so it would be nice to exchange a configuration info between RR <-> PHP. Logging configuration is the first and obvious step.

Could you please provide more accurate description of that exchange? What should do PHP worker in startup phrase?

rustatian commented 1 week ago

Just a structure with some fields. Like version, desired log level and that's it at the moment I guess. You may take a look at the temporal PHP worker and worker info structure it sends to Go.