open-telemetry / opentelemetry-php

The OpenTelemetry PHP Library
https://opentelemetry.io/docs/instrumentation/php/
Apache License 2.0
725 stars 182 forks source link

[opentelemetry-php-instrumentation] - SIGSEGV with OTEL extension #1174

Closed piotrantosik closed 1 year ago

piotrantosik commented 1 year ago

Hi, During production rollout I have an issue with sigserv in php-fpm. I'm not good in low-level PHP but with debugging I found a problem with OBJ_RELEASE function. Moving it to if condition solves my problem, commit:

https://github.com/piotrantosik/opentelemetry-php-instrumentation/commit/0f449da5f5fec976be67b9ae7dfbabfe00d15063

I test with my changes (installing extension via docker-php-extension-installer) and don't observe sigserv anymore.

WARNING: [pool www] child 12 exited on signal 11 (SIGSEGV - core dumped) after 0.871070 seconds from start

 php -v
PHP 8.0.22 (cli) (built: Aug  9 2022 22:16:41) ( NTS )
Copyright (c) The PHP Group
Zend Engine v4.0.22, Copyright (c) Zend Technologies
    with Zend OPcache v8.0.22, Copyright (c), by Zend Technologies
    with blackfire v1.86.4~linux-musl-x64-non_zts80, https://blackfire.io, by Blackfire

GDB trace (recompiled PHP with --enable-debug flag, base docker image: 8.0.22-fpm-alpine:3.16, but without Blackfire - i have some problems with debug flag & Blackfire):

#0  0x00007f0e99dbdbf7 in zend_gc_delref (p=0x118f41647530) at /usr/local/include/php/Zend/zend_types.h:1172
open-telemetry/opentelemetry-php-instrumentation#1  0x00007f0e99dbdf28 in zend_object_release (obj=0x118f41647530) at /usr/local/include/php/Zend/zend_objects_API.h:74
open-telemetry/opentelemetry-php-instrumentation#2  0x00007f0e99dbf17a in observer_end (execute_data=0x7f0e9aa157c0, retval=0x7f0e9aa15790, hooks=0x7f0e96f422b8) at /tmp/pear/temp/otel_instrumentation/otel_observer.c:271
open-telemetry/opentelemetry-php-instrumentation#3  0x00007f0e99dbf3ad in observer_end_handler (execute_data=0x7f0e9aa157c0, retval=0x7f0e9aa15790) at /tmp/pear/temp/otel_instrumentation/otel_observer.c:303
open-telemetry/opentelemetry-php-instrumentation#4  0x000055987aa06044 in zend_observer_fcall_end ()
open-telemetry/opentelemetry-php-instrumentation#5  0x000055987a9c224d in execute_ex ()
open-telemetry/opentelemetry-php-instrumentation#6  0x000055987a8f8cbd in zend_call_function ()
open-telemetry/opentelemetry-php-instrumentation#7  0x000055987a9dc081 in ?? ()
open-telemetry/opentelemetry-php-instrumentation#8  0x000055987a94deeb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#9  0x000055987a9c1a11 in execute_ex ()
open-telemetry/opentelemetry-php-instrumentation#10 0x000055987a8f8cbd in zend_call_function ()
open-telemetry/opentelemetry-php-instrumentation#11 0x000055987a8f7dd3 in _call_user_function_impl ()
open-telemetry/opentelemetry-php-instrumentation#12 0x000055987a720572 in ?? ()
open-telemetry/opentelemetry-php-instrumentation#13 0x000055987a92a4a7 in zend_hash_apply ()
open-telemetry/opentelemetry-php-instrumentation#14 0x000055987a720867 in php_call_shutdown_functions ()
open-telemetry/opentelemetry-php-instrumentation#15 0x000055987a8680e7 in php_request_shutdown ()
open-telemetry/opentelemetry-php-instrumentation#16 0x000055987aa19354 in ?? ()
open-telemetry/opentelemetry-php-instrumentation#17 0x00007f0e9b54ca22 in ?? () from /lib/ld-musl-x86_64.so.1
open-telemetry/opentelemetry-php-instrumentation#18 0x00007f0e9b54c9fb in ?? () from /lib/ld-musl-x86_64.so.1
open-telemetry/opentelemetry-php-instrumentation#19 0x00007ffdc76948f0 in ?? ()

Trace without php debug flag:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ff6b01b6516 in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/otel_instrumentation.so
open-telemetry/opentelemetry-php-instrumentation#1  0x000055646579fe4c in zend_observer_fcall_end ()
open-telemetry/opentelemetry-php-instrumentation#2  0x00005564657738a2 in execute_ex ()
open-telemetry/opentelemetry-php-instrumentation#3  0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#4  0x00005564654322bb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#5  0x000055646543327f in ?? ()
open-telemetry/opentelemetry-php-instrumentation#6  0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#7  0x00005564654322bb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#8  0x000055646543327f in ?? ()
open-telemetry/opentelemetry-php-instrumentation#9  0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#10 0x00005564654322bb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#11 0x000055646543327f in ?? ()
open-telemetry/opentelemetry-php-instrumentation#12 0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#13 0x00005564654322bb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#14 0x000055646543327f in ?? ()
open-telemetry/opentelemetry-php-instrumentation#15 0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#16 0x00005564654322bb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#17 0x000055646543327f in ?? ()
open-telemetry/opentelemetry-php-instrumentation#18 0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#19 0x00005564654322bb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#20 0x000055646543327f in ?? ()
open-telemetry/opentelemetry-php-instrumentation#21 0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#22 0x00005564654322bb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#23 0x000055646543327f in ?? ()
open-telemetry/opentelemetry-php-instrumentation#24 0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#25 0x00005564654322bb in ?? ()
open-telemetry/opentelemetry-php-instrumentation#26 0x000055646543327f in ?? ()
open-telemetry/opentelemetry-php-instrumentation#27 0x00007ff6b0b50b9a in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
open-telemetry/opentelemetry-php-instrumentation#28 0x0000556465705ed7 in zend_call_function ()
open-telemetry/opentelemetry-php-instrumentation#29 0x000055646578b3e0 in ?? ()
open-telemetry/opentelemetry-php-instrumentation#30 0x00007ff6b0b4d1c7 in ?? () from /usr/local/lib/php/extensions/no-debug-non-zts-20200930/blackfire.so
piotrantosik commented 1 year ago

I found the cause of the problem. My span exporter looks like this:

use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\HttpClient\Psr18Client;

$httpClient = new Psr18Client(
        HttpClient::create(
            defaultOptions: [
                'headers' => [
                ],
                'max_duration' => 1,
            ]
        )
    );

    $resource = ResourceInfoFactory::defaultResource();

    $spanExporter = (new SpanExporterFactory(new PsrTransportFactory(
        client: $httpClient, requestFactory: $httpClient, streamFactory: $httpClient
    )))->create();

    $tracerProvider = TracerProvider::builder()
        ->addSpanProcessor(
            (new BatchSpanProcessorBuilder($spanExporter))->build()
        )
        ->setResource($resource)
        ->setSampler(new ParentBased(new AlwaysOnSampler()))
        ->build();

    Sdk::builder()
        ->setTracerProvider($tracerProvider)
        ->setPropagator(TraceContextPropagator::getInstance())
        ->setAutoShutdown(true)
        ->buildAndRegisterGlobal();

I also have a hook for this HttpClient (simplified):

        hook(
            HttpClient::class,
            'request',
            pre: static function (
                HttpClientInterface $client,
                array $params,
                string $class,
                string $function,
                ?string $filename,
                ?int $lineno,
            ) use ($instrumentation): array {
                //code
            },
            post: static function (
                HttpClientInterface $client,
                array $params,
                ?ResponseInterface $response,
                ?\Throwable $exception,
            ): void {
                //code
            },
        );

And sigterm occurs if my otel collector returns other than 200 status code - then HttpClient throw an exception that broke php.

brettmc commented 1 year ago

Hi @piotrantosik thanks for the bug report. Could you submit your patch as a PR so that we can try it out? A test would be great too if you can reduce the problem down to something small (eg "hook a function that throws an exception", if that's indeed what is the issue). @pdelewski knows more about the extension and might be able to comment more.

pdelewski commented 1 year ago

@piotrantosik @brettmc For some reason I missed this issue. Would be great if we could prepare test for it. Will look at it later

piotrantosik commented 1 year ago

@pdelewski @brettmc I cannot reproduce this with a simple test, but I can reproduce it with Symfony app, here is repo: https://github.com/piotrantosik/otel-sigserv

steps to sigserv:

In docker logs you will see: otel-sigserv-php-1 | [14-Mar-2023 20:46:00] WARNING: [pool www] child 122 exited on signal 11 (SIGSEGV) after 47.455006 seconds from start

Hope this help you.

Making a request here https://github.com/piotrantosik/otel-sigserv/blob/main/public/index.php#L59 throws sigserv.

pdelewski commented 1 year ago

@piotrantosik Thank you, really good job. Will try to prepare test based on information you provided.

brettmc commented 1 year ago

47 seconds is a really long time to be processing a request. My guess is that the hook is being called on telemetry export's http call, and is somehow causing an infinite loop. I've seen something similar in https://github.com/open-telemetry/opentelemetry-php/issues/878 Out of interest, if you switched to say gRPC exporter, is there still a problem @piotrantosik

piotrantosik commented 1 year ago

@brettmc 47 seconds lived PHP-FPM process, a request is very fast, under 1s. GRPC is an option, unfortunately I don't have possibility to implement it now - i must keep HTTP.

bobstrecansky commented 1 year ago

@pdelewski is planning on writing a unit test for this. Assigned to him.

pdelewski commented 1 year ago

@piotrantosik interesting docker-compose build --pull --no-cache failing with the following error => ERROR [app_php 20/20] RUN set -eux; mkdir -p var/cache var/log; if [ -f composer.json ]; then composer dump-autoload --classmap-authoritative --no-dev; composer dump-env prod; composer run-script --no-dev post-install-cmd; chmod 1.1s Have not investigated that further. However I also tried to reproduce this locally without docker on following version of PHP

PHP 8.2.2 (cli) (built: Feb  5 2023 12:48:10) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.2.2, Copyright (c) Zend Technologies
    with Zend OPcache v8.2.2, Copyright (c), by Zend Technologies

and most recent otelcol otelcorecol version 0.72.0-dev and cannot. I tried to trigger exception from HttpClient but so far I got

Stack trace:
#0 /Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/api/Behavior/LogsMessagesTrait.php(70): OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor::doLog('error', 'Unhandled expor...', Array)
open-telemetry/opentelemetry-php-instrumentation#1 /Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/sdk/Trace/SpanProcessor/BatchSpanProcessor.php(252): OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor::logError('Unhandled expor...', Array)
open-telemetry/opentelemetry-php-instrumentation#2 /Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/sdk/Trace/SpanProcessor/BatchSpanProcessor.php(198): OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor->flush('shutdown', NULL)
open-telemetry/opentelemetry-php-instrumentation#3 /Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/sdk/Trace/TracerSharedState.php(98): OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor->shutdown(NULL)
open-telemetry/opentelemetry-php-instrumentation#4 /Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/sdk/Trace/TracerProvider.php(92): OpenTelemetry\SDK\Trace\TracerSharedState->shutdown(NULL)
open-telemetry/opentelemetry-php-instrumentation#5 /Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/sdk/Common/Util/functions.php(36): OpenTelemetry\SDK\Trace\TracerProvider->shutdown()
open-telemetry/opentelemetry-php-instrumentation#6 [internal function]: OpenTelemetry\SDK\Trace\TracerProvider->OpenTelemetry\SDK\Common\Util\{closure}()
open-telemetry/opentelemetry-php-instrumentation#7 /Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/sdk/Common/Util/functions.php(50): Closure->call(Object(OpenTelemetry\SDK\Trace\TracerProvider))
open-telemetry/opentelemetry-php-instrumentation#8 /Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/sdk/Common/Util/ShutdownHandler.php(76): OpenTelemetry\SDK\Common\Util\{closure}()
open-telemetry/opentelemetry-php-instrumentation#9 [internal function]: OpenTelemetry\SDK\Common\Util\ShutdownHandler::OpenTelemetry\SDK\Common\Util\{closure}(Array)
open-telemetry/opentelemetry-php-instrumentation#10 {main}
  thrown in <b>/Users/pdelewski/Projects/!/otel-sigserv/vendor/open-telemetry/api/Behavior/LogsMessagesTrait.php</b> on line <b>44</b><br />

without segfault

piotrantosik commented 1 year ago

@pdelewski strange, on my macOS docker works without any problem. also if I understand correctly you have started otel collector, try without it - here is the point, collector is unavailable or returns a different response code than 200

pdelewski commented 1 year ago

@pdelewski strange, on my macOS docker works without any problem. also if I understand correctly you have started otel collector, try without it - here is the point, collector is unavailable or returns a different response code than 200

@piotrantosik So, I tried 3 different things:

In all cases I have not reproduced segfault (I'm getting above exception), but I have not give up yet

pdelewski commented 1 year ago

@piotrantosik BTW, that might be related to php version or other differences between environments

piotrantosik commented 1 year ago

@pdelewski That's why it's a shame that docker doesn't work. I used this base project, maybe see here: https://github.com/dunglas/symfony-docker/blob/main/docs/troubleshooting.md#editing-permissions-on-linux As I said, for me it works out of the box, without any problem...

bobstrecansky commented 1 year ago

I attempted to recreate this on my MacOS machine, and I got an error about bin/console not being available:

https://gist.github.com/bobstrecansky/a45fe2486bb396d9796e351d1775bead

pdelewski commented 1 year ago

@bobstrecansky Thx. That's the same error I'm getting. @piotrantosik Could you comment on that?

bobstrecansky commented 1 year ago

A cursory stack overflow search talked about symfony 2 vs 3?: https://stackoverflow.com/questions/34016449/could-not-open-input-file-bin-console-error-comes-when-try-to-run-the-symfony

piotrantosik commented 1 year ago

@pdelewski @bobstrecansky sorry, my fault - I have bin directory in global gitignore file, and is not committed 😞 I now pushed it to this repo, so please pull/fetch and try again. I apologize for this, don't see this missing file before..

bobstrecansky commented 1 year ago

After the update I was successfully able to run and try out your repo. @pdelewski - same for you?

pdelewski commented 1 year ago

@bobstrecansky Yes. Seems that I reproduced segfault. Thx @piotrantosik @bobstrecansky

NOTICE: PHP message: [error] Uncaught PHP Exception Symfony\Component\HttpKernel\Exception\NotFoundHttpException: "No route found for "GET https://localhost/favicon.ico" (from "https://localhost/")" at /srv/app/vendor/symfony/http-kernel/EventListener/RouterListener.php line 127
- -  29/Mar/2023:18:33:40 +0000 "GET /index.php" 404
[29-Mar-2023 18:33:40] WARNING: [pool www] child 25 exited on signal 11 (SIGSEGV) after 165.275614 seconds from start
bobstrecansky commented 1 year ago

Yup; I'm reading some online documentation around this sort of thing now. Tried building with PHP 7.4 and 8.0 without success.

pdelewski commented 1 year ago

I finally reproduced with debug build and complete stack information

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5a07e44 in zend_gc_delref (p=0x37940d614a0) at /home/vagrant/php-bin/DEBUG/include/php/Zend/zend_types.h:1201
1201        ZEND_ASSERT(p->refcount > 0);
(gdb) bt
#0  0x00007ffff5a07e44 in zend_gc_delref (p=0x37940d614a0) at /home/vagrant/php-bin/DEBUG/include/php/Zend/zend_types.h:1201
open-telemetry/opentelemetry-php-instrumentation#1  0x00007ffff5a0815a in zend_object_release (obj=0x37940d614a0) at /home/vagrant/php-bin/DEBUG/include/php/Zend/zend_objects_API.h:74
open-telemetry/opentelemetry-php-instrumentation#2  0x00007ffff5a093b0 in observer_end (execute_data=0x7ffff58167c0, retval=0x7ffff5816790, hooks=0x7ffff59d5038) at /home/vagrant/opentelemetry-php-instrumentation/otel_observer.c:271
open-telemetry/opentelemetry-php-instrumentation#3  0x00007ffff5a095ed in observer_end_handler (execute_data=0x7ffff58167c0, retval=0x7ffff5816790) at /home/vagrant/opentelemetry-php-instrumentation/otel_observer.c:303
open-telemetry/opentelemetry-php-instrumentation#4  0x0000555555ca23c2 in zend_observer_fcall_end (execute_data=0x7ffff58167c0, return_value=0x7ffff5816790) at /home/vagrant/php-src/Zend/zend_observer.c:220
open-telemetry/opentelemetry-php-instrumentation#5  0x0000555555c5979e in execute_ex (ex=0x7ffff58161a0) at /home/vagrant/php-src/Zend/zend_vm_execute.h:56116
open-telemetry/opentelemetry-php-instrumentation#6  0x0000555555b8e343 in zend_call_function (fci=0x7fffffffc0b0, fci_cache=0x7fffffffc090) at /home/vagrant/php-src/Zend/zend_execute_API.c:908
open-telemetry/opentelemetry-php-instrumentation#7  0x0000555555c74fad in zim_Closure_call (execute_data=0x7ffff5816140, return_value=0x7ffff5816130) at /home/vagrant/php-src/Zend/zend_closures.c:182
open-telemetry/opentelemetry-php-instrumentation#8  0x0000555555be5749 in ZEND_DO_FCALL_SPEC_OBSERVER_HANDLER () at /home/vagrant/php-src/Zend/zend_vm_execute.h:1979
open-telemetry/opentelemetry-php-instrumentation#9  0x0000555555c58eae in execute_ex (ex=0x7ffff5816020) at /home/vagrant/php-src/Zend/zend_vm_execute.h:55792
open-telemetry/opentelemetry-php-instrumentation#10 0x0000555555b8e343 in zend_call_function (fci=0x7ffff58a2d00, fci_cache=0x7ffff58a2d40) at /home/vagrant/php-src/Zend/zend_execute_API.c:908
open-telemetry/opentelemetry-php-instrumentation#11 0x0000555555a2491f in user_shutdown_function_call (zv=0x7ffff58645c8) at /home/vagrant/php-src/ext/standard/basic_functions.c:1710
open-telemetry/opentelemetry-php-instrumentation#12 0x0000555555bc1642 in zend_hash_apply (ht=0x7ffff59c99c0, apply_func=0x555555a248d2 <user_shutdown_function_call>) at /home/vagrant/php-src/Zend/zend_hash.c:1862
open-telemetry/opentelemetry-php-instrumentation#13 0x0000555555a24bbb in php_call_shutdown_functions () at /home/vagrant/php-src/ext/standard/basic_functions.c:1771
open-telemetry/opentelemetry-php-instrumentation#14 0x0000555555b027fa in php_request_shutdown (dummy=0x0) at /home/vagrant/php-src/main/main.c:1804
open-telemetry/opentelemetry-php-instrumentation#15 0x0000555555d25b57 in php_cli_server_request_shutdown (server=0x555556977600 <server>, client=0x5555569ce9d0) at /home/vagrant/php-src/sapi/cli/php_cli_server.c:2138
open-telemetry/opentelemetry-php-instrumentation#16 0x0000555555d25f13 in php_cli_server_dispatch (server=0x555556977600 <server>, client=0x5555569ce9d0) at /home/vagrant/php-src/sapi/cli/php_cli_server.c:2206
open-telemetry/opentelemetry-php-instrumentation#17 0x0000555555d26a24 in php_cli_server_recv_event_read_request (server=0x555556977600 <server>, client=0x5555569ce9d0) at /home/vagrant/php-src/sapi/cli/php_cli_server.c:2526
open-telemetry/opentelemetry-php-instrumentation#18 0x0000555555d26e11 in php_cli_server_do_event_for_each_fd_callback (_params=0x7fffffffd060, fd=5, event=1) at /home/vagrant/php-src/sapi/cli/php_cli_server.c:2612
open-telemetry/opentelemetry-php-instrumentation#19 0x0000555555d2252e in php_cli_server_poller_iter_on_active (poller=0x555556977608 <server+8>, opaque=0x7fffffffd060, callback=0x555555d26bb7 <php_cli_server_do_event_for_each_fd_callback>) at /home/vagrant/php-src/sapi/cli/php_cli_server.c:869
open-telemetry/opentelemetry-php-instrumentation#20 0x0000555555d26eab in php_cli_server_do_event_for_each_fd (server=0x555556977600 <server>, rhandler=0x555555d268b8 <php_cli_server_recv_event_read_request>, whandler=0x555555d26a5f <php_cli_server_send_event>)
    at /home/vagrant/php-src/sapi/cli/php_cli_server.c:2630
open-telemetry/opentelemetry-php-instrumentation#21 0x0000555555d26f37 in php_cli_server_do_event_loop (server=0x555556977600 <server>) at /home/vagrant/php-src/sapi/cli/php_cli_server.c:2640
open-telemetry/opentelemetry-php-instrumentation#22 0x0000555555d2735f in do_cli_server (argc=6, argv=0x55555698ab10) at /home/vagrant/php-src/sapi/cli/php_cli_server.c:2770
open-telemetry/opentelemetry-php-instrumentation#23 0x0000555555d1ce26 in main (argc=6, argv=0x55555698ab10) at /home/vagrant/php-src/sapi/cli/php_cli.c:1370
brettmc commented 1 year ago

I'll release a new version of the extension soon (there is one more PR to be merged first), and you can test it out