swoole / swoole-src

🚀 Coroutine-based concurrency library for PHP
https://www.swoole.com
Apache License 2.0
18.27k stars 3.16k forks source link

[phpredis] segment fault #3170

Closed Hetystars closed 4 years ago

Hetystars commented 4 years ago

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a simple script for reproducing the error. 进程执行到一半异常退出,报segment fault Valgrind log: ==8479== Memcheck, a memory error detector ==8479== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==8479== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==8479== Command: php bin/swoft elasticSearch:build aimchic ==8479== Parent PID: 8416 ==8479== ==8479== Conditional jump or move depends on uninitialised value(s) ==8479== at 0x1458B124: get_session_key_prefix (memcache.c:336) ==8479== by 0x1458B124: zm_activate_memcache (memcache.c:461) ==8479== by 0x8736CF: zend_activate_modules (zend_API.c:2539) ==8479== by 0x7FB208: php_request_startup (main.c:1698) ==8479== by 0x91533E: do_cli (php_cli.c:964) ==8479== by 0x4411D0: main (php_cli.c:1381) ==8479== ==8479== Warning: client switching stacks? SP change: 0x1ffeffca80 --> 0x1c1122a8 ==8479== to suppress, use: --max-stackframe=136951277528 or greater ==8479== Warning: client switching stacks? SP change: 0x1ffeffc5f0 --> 0x1c110450 ==8479== to suppress, use: --max-stackframe=136951284128 or greater ==8479== Warning: client switching stacks? SP change: 0x1c1104d0 --> 0x1ffeffc5f0 ==8479== to suppress, use: --max-stackframe=136951284000 or greater ==8479== further instances of this message will not be shown. ==8479== Invalid read of size 8 ==8479== at 0x817671: _php_stream_free (streams.c:372) ==8479== by 0x15247BD9: redis_sock_disconnect (library.c:2008) ==8479== by 0x1526BA9A: cluster_disconnect (cluster_library.c:1280) ==8479== by 0x1526BAFF: cluster_free (cluster_library.c:872) ==8479== by 0x152568D1: free_cluster_context (redis_cluster.c:339) ==8479== by 0x8ACC70: zend_objects_store_del (zend_objects_API.c:180) ==8479== by 0x8A709B: i_zval_ptr_dtor (zend_variables.h:48) ==8479== by 0x8A709B: zend_object_std_dtor (zend_objects.c:68) ==8479== by 0x8AC720: zend_objects_store_free_object_storage (zend_objects_API.c:99) ==8479== by 0x85A68B: shutdown_executor (zend_execute_API.c:363) ==8479== by 0x86CAD2: zend_deactivate (zend.c:1005) ==8479== by 0x7FBC66: php_request_shutdown (main.c:1902) ==8479== by 0x914CE1: do_cli (php_cli.c:1160) ==8479== Address 0x15bfd090 is 160 bytes inside a block of size 256 free'd ==8479== at 0x4C29CDD: free (vg_replace_malloc.c:530) ==8479== by 0x8178E7: _php_stream_free (streams.c:540) ==8479== by 0x817A7B: stream_resource_regular_dtor (streams.c:1619) ==8479== by 0x881A10: zend_resource_dtor (zend_list.c:76) ==8479== by 0x881A52: zend_close_rsrc (zend_list.c:230) ==8479== by 0x87F2D1: zend_hash_reverse_apply (zend_hash.c:1598) ==8479== by 0x85A7C0: shutdown_executor (zend_execute_API.c:353) ==8479== by 0x86CAD2: zend_deactivate (zend.c:1005) ==8479== by 0x7FBC66: php_request_shutdown (main.c:1902) ==8479== by 0x914CE1: do_cli (php_cli.c:1160) ==8479== by 0x4411D0: main (php_cli.c:1381) ==8479== Block was alloc'd at ==8479== at 0x4C28BE3: malloc (vg_replace_malloc.c:299) ==8479== by 0x83D988: zend_malloc (zend_alloc.c:2838) ==8479== by 0x816A34: _php_stream_alloc (streams.c:273) ==8479== by 0x14EDC980: socket_create(char const, unsigned long, char const, unsigned long, char const, int, int, timeval, _php_stream_context) (swoole_runtime.cc:998) ==8479== by 0x821984: _php_stream_xport_create (transports.c:133) ==8479== by 0x1524A171: redis_sock_connect (library.c:1923) ==8479== by 0x1526A06F: cluster_sock_open (cluster_library.c:268) ==8479== by 0x1526A06F: cluster_sock_write (cluster_library.c:1396) ==8479== by 0x1526BFA1: cluster_send_command (cluster_library.c:1610) ==8479== by 0x15256CAC: zim_RedisCluster_get (redis_cluster.c:532) ==8479== by 0x910D58: ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER (zend_vm_execute.h:1097) ==8479== by 0x8B7302: execute_ex (zend_vm_execute.h:429) ==8479== by 0x14E9787B: swoole::PHPCoroutine::main_func(void) (swoole_coroutine.cc:752) ==8479== ==8479== Invalid read of size 4 ==8479== at 0x817681: _php_stream_free (streams.c:375) ==8479== by 0x15247BD9: redis_sock_disconnect (library.c:2008) ==8479== by 0x1526BA9A: cluster_disconnect (cluster_library.c:1280) ==8479== by 0x1526BAFF: cluster_free (cluster_library.c:872) ==8479== by 0x152568D1: free_cluster_context (redis_cluster.c:339) ==8479== by 0x8ACC70: zend_objects_store_del (zend_objects_API.c:180) ==8479== by 0x8A709B: i_zval_ptr_dtor (zend_variables.h:48) ==8479== by 0x8A709B: zend_object_std_dtor (zend_objects.c:68) ==8479== by 0x8AC720: zend_objects_store_free_object_storage (zend_objects_API.c:99) ==8479== by 0x85A68B: shutdown_executor (zend_execute_API.c:363) ==8479== by 0x86CAD2: zend_deactivate (zend.c:1005) ==8479== by 0x7FBC66: php_request_shutdown (main.c:1902) ==8479== by 0x914CE1: do_cli (php_cli.c:1160) ==8479== Address 0x15bfd098 is 168 bytes inside a block of size 256 free'd ==8479== at 0x4C29CDD: free (vg_replace_malloc.c:530) ==8479== by 0x8178E7: _php_stream_free (streams.c:540) ==8479== by 0x817A7B: stream_resource_regular_dtor (streams.c:1619) ==8479== by 0x881A10: zend_resource_dtor (zend_list.c:76) ==8479== by 0x881A52: zend_close_rsrc (zend_list.c:230) ==8479== by 0x87F2D1: zend_hash_reverse_apply (zend_hash.c:1598) ==8479== by 0x85A7C0: shutdown_executor (zend_execute_API.c:353) ==8479== by 0x86CAD2: zend_deactivate (zend.c:1005) ==8479== by 0x7FBC66: php_request_shutdown (main.c:1902) ==8479== by 0x914CE1: do_cli (php_cli.c:1160) ==8479== by 0x4411D0: main (php_cli.c:1381) ==8479== Block was alloc'd at ==8479== at 0x4C28BE3: malloc (vg_replace_malloc.c:299) ==8479== by 0x83D988: zend_malloc (zend_alloc.c:2838) ==8479== by 0x816A34: _php_stream_alloc (streams.c:273) ==8479== by 0x14EDC980: socket_create(char const, unsigned long, char const, unsigned long, char const, int, int, timeval, _php_stream_context) (swoole_runtime.cc:998) ==8479== by 0x821984: _php_stream_xport_create (transports.c:133) ==8479== by 0x1524A171: redis_sock_connect (library.c:1923) ==8479== by 0x1526A06F: cluster_sock_open (cluster_library.c:268) ==8479== by 0x1526A06F: cluster_sock_write (cluster_library.c:1396) ==8479== by 0x1526BFA1: cluster_send_command (cluster_library.c:1610) ==8479== by 0x15256CAC: zim_RedisCluster_get (redis_cluster.c:532) ==8479== by 0x910D58: ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER (zend_vm_execute.h:1097) ==8479== by 0x8B7302: execute_ex (zend_vm_execute.h:429) ==8479== by 0x14E9787B: swoole::PHPCoroutine::main_func(void) (swoole_coroutine.cc:752) ==8479== ==8479== Invalid read of size 4 ==8479== at 0x81769D: _php_stream_free (streams.c:388) ==8479== by 0x15247BD9: redis_sock_disconnect (library.c:2008) ==8479== by 0x1526BA9A: cluster_disconnect (cluster_library.c:1280) ==8479== by 0x1526BAFF: cluster_free (cluster_library.c:872) ==8479== by 0x152568D1: free_cluster_context (redis_cluster.c:339) ==8479== by 0x8ACC70: zend_objects_store_del (zend_objects_API.c:180) ==8479== by 0x8A709B: i_zval_ptr_dtor (zend_variables.h:48) ==8479== by 0x8A709B: zend_object_std_dtor (zend_objects.c:68) ==8479== by 0x8AC720: zend_objects_store_free_object_storage (zend_objects_API.c:99) ==8479== by 0x85A68B: shutdown_executor (zend_execute_API.c:363) ==8479== by 0x86CAD2: zend_deactivate (zend.c:1005) ==8479== by 0x7FBC66: php_request_shutdown (main.c:1902) ==8479== by 0x914CE1: do_cli (php_cli.c:1160) ==8479== Address 0x15bfd070 is 128 bytes inside a block of size 256 free'd ==8479== at 0x4C29CDD: free (vg_replace_malloc.c:530) ==8479== by 0x8178E7: _php_stream_free (streams.c:540) ==8479== by 0x817A7B: stream_resource_regular_dtor (streams.c:1619) ==8479== by 0x881A10: zend_resource_dtor (zend_list.c:76) ==8479== by 0x881A52: zend_close_rsrc (zend_list.c:230) ==8479== by 0x87F2D1: zend_hash_reverse_apply (zend_hash.c:1598) ==8479== by 0x85A7C0: shutdown_executor (zend_execute_API.c:353) ==8479== by 0x86CAD2: zend_deactivate (zend.c:1005) ==8479== by 0x7FBC66: php_request_shutdown (main.c:1902) ==8479== by 0x914CE1: do_cli (php_cli.c:1160) ==8479== by 0x4411D0: main (php_cli.c:1381) ==8479== Block was alloc'd at ==8479== at 0x4C28BE3: malloc (vg_replace_malloc.c:299) ==8479== by 0x83D988: __zend_malloc (zend_alloc.c:2838) ==8479== by 0x816A34: _php_stream_alloc (streams.c:273) ==8479== by 0x14EDC980: socket_create(char const, unsigned long, char const, unsigned long, char const, int, int, timeval, _php_stream_context) (swoole_runtime.cc:998) ==8479== by 0x821984: _php_stream_xport_create (transports.c:133) ==8479== by 0x1524A171: redis_sock_connect (library.c:1923) ==8479== by 0x1526A06F: cluster_sock_open (cluster_library.c:268) ==8479== by 0x1526A06F: cluster_sock_write (cluster_library.c:1396) ==8479== by 0x1526BFA1: cluster_send_command (cluster_library.c:1610) ==8479== by 0x15256CAC: zim_RedisCluster_get (redis_cluster.c:532) ==8479== by 0x910D58: ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER (zend_vm_execute.h:1097) ==8479== by 0x8B7302: execute_ex (zend_vm_execute.h:429) ==8479== by 0x14E9787B: swoole::PHPCoroutine::main_func(void) (swoole_coroutine.cc:752) ==8479== ==8479== ==8479== HEAP SUMMARY: ==8479== in use at exit: 417,946 bytes in 171 blocks ==8479== total heap usage: 10,419,058 allocs, 10,418,887 frees, 8,359,431,570 bytes allocated ==8479== ==8479== LEAK SUMMARY: ==8479== definitely lost: 4,384 bytes in 73 blocks ==8479== indirectly lost: 328,468 bytes in 24 blocks ==8479== possibly lost: 0 bytes in 0 blocks ==8479== still reachable: 85,094 bytes in 74 blocks ==8479== suppressed: 0 bytes in 0 blocks ==8479== Rerun with --leak-check=full to see details of leaked memory ==8479== ==8479== For counts of detected and suppressed errors, rerun with: -v ==8479== Use --track-origins=yes to see where uninitialised values come from ==8479== ERROR SUMMARY: 7 errors from 4 contexts (suppressed: 0 from 0)

  2. What did you expect to see?

  3. What did you see instead?

  4. What version of Swoole are you using (show your php --ri swoole)? woole

Swoole => enabled Author => Swoole Team team@swoole.com Version => 4.4.16 Built => Mar 6 2020 01:52:47 coroutine => enabled epoll => enabled eventfd => enabled signalfd => enabled cpu_affinity => enabled spinlock => enabled rwlock => enabled openssl => OpenSSL 1.0.2k-fips 26 Jan 2017 zlib => 1.2.11 mutex_timedlock => enabled pthread_barrier => enabled futex => enabled async_redis => enabled

Directive => Local Value => Master Value swoole.enable_coroutine => On => On swoole.enable_library => On => On swoole.enable_preemptive_scheduler => Off => Off swoole.display_errors => On => On swoole.use_shortname => On => On swoole.unixsock_buffer_size => 8388608 => 8388608

  1. What is your machine environment used (including version of kernel & php & gcc) ? php 7.1.3

Linux version 4.9.81-35.56.amzn1.x86_64 (mockbuild@gobi-build-64010) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC) ) #1 SMP Fri Feb 16 00:18:48 UTC 2018

Hetystars commented 4 years ago

上个周五发现问题后,通过更新redis版本后,测试了4次,均没有复现。今天偶然发现,该问题出现,不是必现,偶尔出现

twose commented 4 years ago

这是进程退出的时候出现的 一般不影响业务 这里比较复杂 正在跟进修复

Hetystars commented 4 years ago

业务代码大致为两层foreach(mysql读库,curl请求,redis),目前发现的是执行到中间,停掉了,整个循环没有走完;有较好的解决方案吗

twose commented 4 years ago

@Hetystars 使用的是最新的phpredis吧, 是用的持久连接吗?

twose commented 4 years ago

看日志是在程序退出的时候发生的 并不是执行到一半

Hetystars commented 4 years ago

@twose 上面 Valgrind log是业务中的一个foreach单元,真实业务中有较多单元;redis 版本为5.2.0;连接用的是连接池;上次更新完redis版本后,segment fault变成偶现的了,之前是必现

Hetystars commented 4 years ago

@twose 有没有拆中的一个方案,线上每天有1000万+的请求依赖于此。每天手动检查脚本是否执行完全,工作量实在太大

twose commented 4 years ago

@Hetystars 使用的是pconnect吗? 是在Server中使用的吗?

Hetystars commented 4 years ago

不是pconnect,redisCluster,persistent设置为false;在server中调用

twose commented 4 years ago

根据日志推理了一下 没有推理出来 是否有可重现的简单代码

对于运行到一半退出了不知道该怎么理解? 是表现为coredump还是单纯的运行到一半就退出了呢

Hetystars commented 4 years ago

业务代码较大,暂不能定位到具体的位置;脚本的表现为运行到一半就退出了;是否有合适的方法去捕获底层的异常退出

twose commented 4 years ago

一般来说除了正常退出, 就只有被信号杀死的情况, 比如coredump就是SIG11, 异常退出无论如何都是有提示的, 你可以注意一下进程退出的退出码是否为0, 而且Swoole的Server中, 如果有工作进程异常退出, manager进程会有日志提示

matyhtf commented 4 years ago

程序末尾是否添加了 Swoole\Event::wait() 看样子是 shutdown 阶段出现的,非正常情况

Hetystars commented 4 years ago

当前swoole版本为4.4.16, 没有添加.代码如下 @matyhtf

public static function run(callable $callable, ...$args): bool
    {
        // Is coroutine to return
        if (self::id() > 0) {
            CLog::warning('Already is in coroutine, not need to use `run`!');
            return PhpHelper::call($callable, ... $args);
        }

        // >= 4.4
        if (ServerHelper::isGteSwoole44()) {
            $scheduler = new Scheduler;
            $scheduler->add($callable, ...$args);

            return $scheduler->start();
        }

        // < 4.4
        Coroutine::create($callable, ...$args);
        Event::wait();

        return true;
    }
twose commented 4 years ago

都应该使用scheduler的方式

matyhtf commented 4 years ago

@Hetystars 请再整理一下 代码,最好能提供一个最小可重现的 demo ,另外使用 valgrind 跟踪是否有内存错误,使用 Swoole\Coroutine\Scheduler 作为启动器