php / php-src

The PHP Interpreter
https://www.php.net
Other
38.14k stars 7.74k forks source link

php 8.1 and earlier crash immediately when compiled with Xcode 16 clang on macOS 15 #16168

Closed ryandesign closed 1 week ago

ryandesign commented 3 weeks ago

Description

Hi, I'm the maintainer of php in MacPorts. macOS 15 and Xcode 16 were recently released, and following a user report, I discovered that php 8.1 and earlier crash when compiled there. php 8.2 and later don't have this problem, so I'm hoping to discover what change in php 8.2 fixed this so that I can backport it to earlier versions.

The crash is observed during installation:

Generating phar.php
/bin/sh: line 1: 55243 Segmentation fault

For php 8.1, this crash is ignored and the installation process continues, but the installed executable crashes when trying to do anything (php, php -v, etc.). For php 8.0 and earlier, the crash stops the installation process.

From the MacPorts ticket, I've seen two different crash logs—one a segmentation fault with no backtrace showing that execution has passed to an illegal address 0x1e8:

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x00000000000001e8
Exception Codes:       0x0000000000000001, 0x00000000000001e8

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [75775]

VM Region Info: 0x1e8 is not in any region.  Bytes before following region: 4509498904
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      10cc98000-10d683000    [  9.9M] r-x/r-x SM=COW  /opt/local/var/macports/*/php

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   ???                                          0x1e8 ???

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x00007fd29780c900  rbx: 0x000000010e205000  rcx: 0x0000000000000003  rdx: 0x000000010e205020
  rdi: 0x000000010e205000  rsi: 0x000000010e200000  rbp: 0x000000000000001a  rsp: 0x00007ff7b3266be0
   r8: 0x000000010e205000   r9: 0x000000000000006c  r10: 0x00000000001ff800  r11: 0x0000000000000030
  r12: 0x0000000000000000  r13: 0x000000010d6989c0  r14: 0x00007ff7b3266c00  r15: 0x000000010ce78ecd
  rip: 0x00000000000001e8  rfl: 0x0000000000010202  cr2: 0x00000000000001e8

Logical CPU:     0
Error Code:      0x00000014 (no mapping for user instruction read)
Trap Number:     14

and the other an abort trap implicating an unnamed (inline?) function called from zend_hash_find:

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000

Termination Reason:    Namespace SIGNAL, Code 6 Abort trap: 6
Terminating Process:   php [40488]

Application Specific Information:
abort() called

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib              0x7ff813d7db52 __pthread_kill + 10
1   libsystem_pthread.dylib             0x7ff813db7f85 pthread_kill + 262
2   libsystem_c.dylib                   0x7ff813cd8b19 abort + 126
3   libsystem_malloc.dylib              0x7ff813bd7ab1 malloc_vreport + 857
4   libsystem_malloc.dylib              0x7ff813bdb58b malloc_report + 151
5   php                                    0x10cf5e27d .LL31 + 203
6   php                                    0x10cef6498 zend_hash_find + 128
7   php                                    0x10cf752b1 lookup_class_ex + 312
8   php                                    0x10cf70823 zend_perform_covariant_type_check + 885
9   php                                    0x10cf75ae8 zend_do_perform_implementation_check + 554
10  php                                    0x10cf755f8 do_inheritance_check_on_method + 344
11  php                                    0x10cf722df do_interface_implementation + 520
12  php                                    0x10ceec276 zend_class_implements + 216
13  php                                    0x10cf5d6fd zend_register_weakref_ce + 316
14  php                                    0x10cf703be zend_register_default_classes + 34
15  php                                    0x10cef7ba5 zm_startup_core + 101
16  php                                    0x10cee9ff9 zend_startup_module_ex + 259
17  php                                    0x10ceea38f zend_startup_module_zval + 12
18  php                                    0x10cef55cd zend_hash_apply + 87
19  php                                    0x10ce87a9c php_module_startup + 2180
20  php                                    0x10cfbb90d php_cli_startup + 13
21  php                                    0x10cfb9666 main + 1356
22  dyld                                0x7ff813a2a2cd start + 1805

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x0000000000000006  rcx: 0x00007ff7b3203dc8  rdx: 0x0000000000000000
  rdi: 0x0000000000000103  rsi: 0x0000000000000006  rbp: 0x00007ff7b3203df0  rsp: 0x00007ff7b3203dc8
   r8: 0x000000000000002e   r9: 0x0000000000000000  r10: 0x0000000000000000  r11: 0x0000000000000246
  r12: 0x0000000000000000  r13: 0x0000000000000050  r14: 0x0000000000000103  r15: 0x0000000000000016
  rip: 0x00007ff813d7db52  rfl: 0x0000000000000246  cr2: 0x0000000000000000

Logical CPU:     0
Error Code:      0x02000148 
Trap Number:     133

On one machine, running php produced this possibly helpful diagnostic:

php(54330,0x7ff8522ffbc0) malloc: *** error for object 0x2cf607000: pointer being freed was not allocated
php(54330,0x7ff8522ffbc0) malloc: *** set a breakpoint in malloc_error_break to debug

PHP Version

8.1.30

Operating System

macOS 15.0

cmb69 commented 3 weeks ago

macOS 15 and Xcode 16 were recently released

I assume that you don't have this environment; otherwise a git bisect should find the commit(s).

The first issue might be related to OPcache JIT.

iluuu1994 commented 3 weeks ago

I ran a quick test-build in CI but couldn't reproduce the issue: https://github.com/iluuu1994/php-src/actions/runs/11144438755/job/30971704251 (ignore the failing last step)

A bisect would indeed be helpful. Usually, opcache is disabled in CLI by default, and JIT is disabled entirely. It also seems to occur during startup.

@ryandesign Are these debug builds? If not, can you try that? As usual, ASan/UBSan would be useful. Also, is this a problem that only appeared on macOS 15.0? This sounds like a compiler issue.

I don't have a Mac, but I think @kocsismate and @devnexen do. They might be able to help.

kocsismate commented 2 weeks ago

I still have MacOS 14.4, so I check also check what happens before and after an upgrade 😎

ryandesign commented 2 weeks ago

I assume that you don't have this environment; otherwise a git bisect should find the commit(s).

I have reproduced the issue locally on a 13-year-old dual core machine but before I have it spend a day bisecting I thought I'd ask if the issue is familiar to anyone.

Are these debug builds? If not, can you try that?

They are not. I can try that.

As usual, ASan/UBSan would be useful.

I don't know what that is.

Also, is this a problem that only appeared on macOS 15.0? This sounds like a compiler issue.

So far I am only aware of the issue occurring with macOS 15.0 + Xcode 16.0, not on earlier macOS versions with earlier Xcode versions. Xcode 16.0 is also compatible with macOS 14 ≥ 14.5 but I have not tested whether the issue occurs then. I could not say at this point whether the defect is in the compiler, the OS, or php's code.

kocsismate commented 2 weeks ago

I also managed to reproduce the issue with a debug build + ASAN, running on MacOS 14.4 + XCode 16:

/Users/kocsismate/dev/php-src/ext/pcre/pcre2lib/sljit/sljitNativeARM_64.c:1890:46: runtime error: left shift of 3 by 30 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Users/kocsismate/dev/php-src/ext/pcre/pcre2lib/sljit/sljitNativeARM_64.c:1890:46 in 
Assertion failed: ((zval_gc_flags((p)->u.type_info) & ((1<<7)|(1<<8))) != (1<<7)), function zend_gc_delref, file zend_types.h, line 1210.

==85256==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x00010d50baff at pc 0x000104d3e89c bp 0x00016b5a4ff0 sp 0x00016b5a4fe8
READ of size 16 at 0x00010d50baff thread T0
    #0 0x1026b6898 in ffcps_0 pcre2_jit_neon_inc.h:265
    #1 0x1073f74c4  (<unknown module>)
    #2 0x10262a7c4 in php_pcre2_jit_match pcre2_jit_match.c:165
    #3 0x102a4d7e8 in php_pcre_replace_impl php_pcre.c:1664
    #4 0x102a4cf10 in php_pcre_replace php_pcre.c:1606
    #5 0x102a76fd8 in php_replace_in_subject php_pcre.c:2176
    #6 0x102a55e38 in preg_replace_common php_pcre.c:2314
    #7 0x102a544a0 in zif_preg_replace php_pcre.c:2371
    #8 0x104c80890 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER zend_vm_execute.h:1297
    #9 0x10470a104 in execute_ex zend_vm_execute.h:55622
    #10 0x10470b8e0 in zend_execute zend_vm_execute.h:60188
    #11 0x104532ea4 in zend_execute_scripts zend.c:1857
    #12 0x103fa2328 in php_execute_script main.c:2551
    #13 0x105881e34 in do_cli php_cli.c:965
    #14 0x10587e444 in main php_cli.c:1367
    #15 0x19a43e0dc  (<unknown module>)

0x00010ae0c3ef is located 7 bytes after 104-byte region [0x00010ae0c380,0x00010ae0c3e8)
allocated by thread T0 here:
    #0 0x107d3f124 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x53124)
    #1 0x1042b75c0 in __zend_malloc zend_alloc.c:3088
    #2 0x1042ab068 in tracked_malloc zend_alloc.c:2799
    #3 0x1042b62c4 in _malloc_custom zend_alloc.c:2459
    #4 0x1042b5e7c in _emalloc zend_alloc.c:2578
    #5 0x103c7920c in zend_string_alloc zend_string.h:152
    #6 0x103c59e60 in zend_string_init zend_string.h:174
    #7 0x103c5c444 in php_trim_int string.c:824
    #8 0x103c5d384 in php_do_trim string.c:855
    #9 0x103c5c560 in zif_trim string.c:862
    #10 0x104c80890 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER zend_vm_execute.h:1297
    #11 0x10470a104 in execute_ex zend_vm_execute.h:55622
    #12 0x10470b8e0 in zend_execute zend_vm_execute.h:60188
    #13 0x104532ea4 in zend_execute_scripts zend.c:1857
    #14 0x103fa2328 in php_execute_script main.c:2551
    #15 0x105881e34 in do_cli php_cli.c:965
    #16 0x10587e444 in main php_cli.c:1367
    #17 0x19a43e0dc  (<unknown module>)

SUMMARY: AddressSanitizer: heap-buffer-overflow pcre2_jit_neon_inc.h:265 in ffcps_0
Shadow bytes around the buggy address:
  0x00010ae0c100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x00010ae0c180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x00010ae0c200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x00010ae0c280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x00010ae0c300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x00010ae0c380: 00 00 00 00 00 00 00 00 00 00 00 00 00[fa]fa fa
  0x00010ae0c400: fa fa fa fa fa fa 00 00 00 00 00 00 00 00 00 00
  0x00010ae0c480: 00 00 00 00 fa fa fa fa fa fa fa fa 00 00 00 00
  0x00010ae0c500: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
  0x00010ae0c580: fa fa 00 00 00 00 00 00 00 00 00 00 00 00 00 03
  0x00010ae0c600: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
    #0 0x1008b2898 in ffcps_0 pcre2_jit_neon_inc.h:265
    #1 0x1055f34c4  (<unknown module>)
    #2 0x1008267c4 in php_pcre2_jit_match pcre2_jit_match.c:165
    #3 0x100c497e8 in php_pcre_replace_impl php_pcre.c:1664
    #4 0x100c48f10 in php_pcre_replace php_pcre.c:1606
    #5 0x100c72fd8 in php_replace_in_subject php_pcre.c:2176
    #6 0x100c51e38 in preg_replace_common php_pcre.c:2314
    #7 0x100c504a0 in zif_preg_replace php_pcre.c:2371
    #8 0x102e7c890 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER zend_vm_execute.h:1297
    #9 0x102906104 in execute_ex zend_vm_execute.h:55622
    #10 0x1029078e0 in zend_execute zend_vm_execute.h:60188
    #11 0x10272eea4 in zend_execute_scripts zend.c:1857
    #12 0x10219e328 in php_execute_script main.c:2551
    #13 0x103a7de34 in do_cli php_cli.c:965
    #14 0x103a7a444 in main php_cli.c:1367
    #15 0x19a43e0dc  (<unknown module>)

Apparently, PCRE2 Lib is to blame... And I guess the difference between PHP 8.1- and PHP 8.2+ is Christoph's commit: https://github.com/kocsismate/php-src/commit/32cceb75bf5e42bebbf4d12bab4369924f1d6885 that updated the library.

iluuu1994 commented 2 weeks ago

@kocsismate Do you get no errors with --without-pcre-jit? If so, Ryan might try the same.

kocsismate commented 2 weeks ago

@iluuu1994 I have several extensions enabled, but I don't hava errors anymore.

ryandesign commented 2 weeks ago
/Users/kocsismate/dev/php-src/ext/pcre/pcre2lib/sljit/sljitNativeARM_64.c:1890:46: runtime error: left shift of 3 by 30 places cannot be represented in type 'int'

Based on the filename I assume this is arm64 code. My tests were on x86_64 so the problem is at least not specific to arm64.

Apparently, PCRE2 Lib is to blame... And I guess the difference between PHP 8.1- and PHP 8.2+ is Christoph's commit: kocsismate@32cceb7 that updated the library.

For php in MacPorts we use MacPorts pcre2, not the pcre2 bundled with php—at least that's what I intend for it to do. At least looking at the library linkage, it appears to have been successful:

% otool -L /opt/local/bin/php81
/opt/local/bin/php81:
    /usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
    /opt/local/lib/libncurses.6.dylib (compatibility version 6.0.0, current version 6.0.0)
    /opt/local/lib/libbz2.1.0.dylib (compatibility version 1.0.0, current version 1.0.8)
    /usr/lib/libnetwork.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1351.0.0)
    /opt/local/lib/libxml2.2.dylib (compatibility version 16.0.0, current version 16.4.0)
    /opt/local/lib/libpcre2-8.0.dylib (compatibility version 14.0.0, current version 14.0.0)
    /opt/local/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.3.1)
    /opt/local/lib/libedit.0.dylib (compatibility version 1.0.0, current version 1.74.0)
    /opt/local/lib/libargon2.1.dylib (compatibility version 0.0.0, current version 0.0.0)
% 

All MacPorts php versions from php73 onward link with the same MacPorts pcre2 library, so that would not appear to explain why php82 and later work and php81 and earlier don't, unless php still uses part of its bundled pcre2 even when told to use an external pcre2.

@kocsismate Do you get no errors with --without-pcre-jit? If so, Ryan might try the same.

MacPorts pcre2 is built configured with --enable-jit. Rebuilding it with --disable-jit causes all of the php versions—both those that crashed before and those that worked—to fail with the same error:

% php81 -v

Fatal error: Unable to start pcre module in Unknown on line 0
% php82 -v

Fatal error: Unable to start pcre module in Unknown on line 0
% php83 -v

Fatal error: Unable to start pcre module in Unknown on line 0
%

I have not yet tried rebuilding any of the phps with the JITless pcre2 nor have I tried php's --without-pcre-jit configure option.

cmb69 commented 2 weeks ago

@kocsismate, ASan may report false positives for PCRE JIT code.

NattyNarwhal commented 2 weeks ago

FWIW, I can't seem to reproduce this on my system (M1 Pro, macOS 14.7, Xcode 16, MacPorts). Were you using a specific set of ./configure options?

ryandesign commented 2 weeks ago

Yes I also had problems reproducing this when trying to run git bisect doing builds outside of MacPorts until I added in configure options that are used in MacPorts.

I was initially using these flags and could not reproduce the problem:

--disable-all \
--disable-cgi \
--with-external-pcre=/opt/local \

I'm now reproducing the problem using:

--disable-all \
--disable-cgi \
--disable-fpm \
--enable-bcmath \
--enable-ctype \
--enable-dom \
--enable-filter \
--enable-libxml \
--enable-pdo \
--enable-session \
--enable-simplexml \
--enable-tokenizer \
--enable-xml \
--enable-xmlreader \
--enable-xmlwriter \
--with-bz2=/opt/local \
--with-external-pcre=/opt/local \
--with-libxml \
--with-mhash=/opt/local \
--with-zlib=/opt/local \

I have not tried to figure out which of those flags make the problem appear because it takes so very long for this old machine to build php.

ryandesign commented 2 weeks ago

Also I am setting CFLAGS="-Os -Wno-error=implicit-function-declaration -Wno-error=implicit-int -Wno-error=incompatible-function-pointer-types -Wno-error=int-conversion". I tried --enable-debug initially but when I couldn't reproduce the problem I noticed it was setting -O0 so I removed it, thinking that might be hiding the problem. I used -Os because that is what MacPorts uses. After that, I found that I needed to add the additional configure flags mentioned above to reproduce the problem.

I found a faster machine to run git bisect on and it identified 6339938c7e289cb5a1c0bf8f37d96537301d5477 as the commit that fixed the problem.

ryandesign commented 2 weeks ago

Backporting that on top of 8.1.30, it no longer crashes:

index 1da3ce5248..ed1d0fc8f5 100644
--- a/Zend/zend_string.c
+++ b/Zend/zend_string.c
@@ -366,6 +366,7 @@ ZEND_API void zend_interned_strings_switch_storage(bool request)
        }
 }

+#if defined(__GNUC__) && (defined(__i386__) || (defined(__x86_64__) && !defined(__ILP32__)))
 /* Even if we don't build with valgrind support, include the symbol so that valgrind available
  * only at runtime will not result in false positives. */
 #ifndef I_REPLACE_SONAME_FNNAME_ZU
@@ -373,29 +374,20 @@ ZEND_API void zend_interned_strings_switch_storage(bool request)
 #endif

 /* See GH-9068 */
-#if defined(__GNUC__) && (__GNUC__ >= 11 || defined(__clang__)) && __has_attribute(no_caller_saved_registers)
-# define NO_CALLER_SAVED_REGISTERS __attribute__((no_caller_saved_registers))
-# ifndef __clang__
-#  pragma GCC push_options
-#  pragma GCC target ("general-regs-only")
-#  define POP_OPTIONS
-# endif
+#if __has_attribute(noipa)
+# define NOIPA __attribute__((noipa))
 #else
-# define NO_CALLER_SAVED_REGISTERS
+# define NOIPA
 #endif

-ZEND_API bool ZEND_FASTCALL NO_CALLER_SAVED_REGISTERS I_REPLACE_SONAME_FNNAME_ZU(NONE,zend_string_equal_val)(zend_string *s1, zend_string *s2)
+ZEND_API bool ZEND_FASTCALL I_REPLACE_SONAME_FNNAME_ZU(NONE,zend_string_equal_val)(zend_string *s1, zend_string *s2)
 {
        return !memcmp(ZSTR_VAL(s1), ZSTR_VAL(s2), ZSTR_LEN(s1));
 }
-
-#ifdef POP_OPTIONS
-# pragma GCC pop_options
-# undef POP_OPTIONS
 #endif

 #if defined(__GNUC__) && defined(__i386__)
-ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
+ZEND_API zend_never_inline NOIPA bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
 {
        char *ptr = ZSTR_VAL(s1);
        size_t delta = (char*)s2 - (char*)s1;
@@ -433,7 +425,7 @@ ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *
 }

 #elif defined(__GNUC__) && defined(__x86_64__) && !defined(__ILP32__)
-ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
+ZEND_API zend_never_inline NOIPA bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
 {
        char *ptr = ZSTR_VAL(s1);
        size_t delta = (char*)s2 - (char*)s1;
iluuu1994 commented 2 weeks ago

@ryandesign Thank you! I'm assuming the zend_never_inline NOIPA on zend_string_equal_val alone fixes the issue? The I_REPLACE_SONAME_FNNAME_ZU is used exclusively for Valgrind. Do you have LTO enabled by any chance? This does seems like a strange issue.

  1. The function is declared in the C file, so without LTO it should never be inlined or optimized from the caller (e.g. by omitting preservation of scratch registers).
  2. Even if it were, the __asm__ block indicates which registers need to be preserved. So unless the compiler is buggy, it shouldn't omit them.
ryandesign commented 2 weeks ago

I'm assuming the zend_never_inline NOIPA on zend_string_equal_val alone fixes the issue?

Yes, just this change to 8.1.30 fixes it:

diff --git a/Zend/zend_string.c b/Zend/zend_string.c
index 1da3ce5248..c65e880850 100644
--- a/Zend/zend_string.c
+++ b/Zend/zend_string.c
@@ -394,8 +394,14 @@ ZEND_API bool ZEND_FASTCALL NO_CALLER_SAVED_REGISTERS I_REPLACE_SONAME_FNNAME_ZU
 # undef POP_OPTIONS
 #endif

+#if __has_attribute(noipa)
+# define NOIPA __attribute__((noipa))
+#else
+# define NOIPA
+#endif
+
 #if defined(__GNUC__) && defined(__i386__)
-ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
+ZEND_API zend_never_inline NOIPA bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
 {
        char *ptr = ZSTR_VAL(s1);
        size_t delta = (char*)s2 - (char*)s1;
@@ -433,7 +439,7 @@ ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *
 }

 #elif defined(__GNUC__) && defined(__x86_64__) && !defined(__ILP32__)
-ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
+ZEND_API zend_never_inline NOIPA bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
 {
        char *ptr = ZSTR_VAL(s1);
        size_t delta = (char*)s2 - (char*)s1;

And that's easier to backport to earlier versions. This works for 8.0.30:

diff --git a/Zend/zend_string.c b/Zend/zend_string.c
index 570aeece61..4beb3050c6 100644
--- a/Zend/zend_string.c
+++ b/Zend/zend_string.c
@@ -324,8 +324,14 @@ ZEND_API void zend_interned_strings_switch_storage(zend_bool request)
    }
 }

+#if __has_attribute(noipa)
+# define NOIPA __attribute__((noipa))
+#else
+# define NOIPA
+#endif
+
 #if defined(__GNUC__) && defined(__i386__)
-ZEND_API zend_bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
+ZEND_API zend_never_inline NOIPA zend_bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
 {
    char *ptr = ZSTR_VAL(s1);
    size_t delta = (char*)s2 - (char*)s1;
@@ -363,7 +369,7 @@ ZEND_API zend_bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_str
 }

 #ifdef HAVE_VALGRIND
-ZEND_API zend_bool ZEND_FASTCALL I_WRAP_SONAME_FNNAME_ZU(NONE,zend_string_equal_val)(zend_string *s1, zend_string *s2)
+ZEND_API zend_never_inline NOIPA zend_bool ZEND_FASTCALL I_WRAP_SONAME_FNNAME_ZU(NONE,zend_string_equal_val)(zend_string *s1, zend_string *s2)
 {
    size_t len = ZSTR_LEN(s1);
    char *ptr1 = ZSTR_VAL(s1);
@@ -393,7 +399,7 @@ ZEND_API zend_bool ZEND_FASTCALL I_WRAP_SONAME_FNNAME_ZU(NONE,zend_string_equal_
 #endif

 #elif defined(__GNUC__) && defined(__x86_64__) && !defined(__ILP32__)
-ZEND_API zend_bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
+ZEND_API zend_never_inline NOIPA zend_bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *s2)
 {
    char *ptr = ZSTR_VAL(s1);
    size_t delta = (char*)s2 - (char*)s1;
@@ -431,7 +437,7 @@ ZEND_API zend_bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_str
 }

 #ifdef HAVE_VALGRIND
-ZEND_API zend_bool ZEND_FASTCALL I_WRAP_SONAME_FNNAME_ZU(NONE,zend_string_equal_val)(zend_string *s1, zend_string *s2)
+ZEND_API zend_never_inline NOIPA zend_bool ZEND_FASTCALL I_WRAP_SONAME_FNNAME_ZU(NONE,zend_string_equal_val)(zend_string *s1, zend_string *s2)
 {
    size_t len = ZSTR_LEN(s1);
    char *ptr1 = ZSTR_VAL(s1);

I haven't asked for LTO to be enabled but I don't know if LTO is enabled by default.

I cannot say whether the compiler is buggy. If you believe it is and can explain how, please file a bug report with Apple.

iluuu1994 commented 2 weeks ago

I can't really comment on this, unfortunately. I can't reproduce the issue since I don't have a Mac. It would be interesting to know how the assembly changes between the two versions. I do suspect a compiler issue, most likely due to register clobbering. But I can't say for sure. Now, what should we do with this issue? Since Sequoia was just released ~3 weeks ago, it seems quite likely that the issue is related to the new release and will be fixed in the future, at least if somebody else runs into it. Since I don't have a Mac, I cannot report it. I still wouldn't object to backporting this change, although it would only be released in the next security release, whenever that is.

NattyNarwhal commented 2 weeks ago

I wonder if it might be an issue with the x86 fast path that noipa is covering up. I was talking to a friend who deals with miscompile bugs frequently; size_t delta = (char*)s2 - (char*)s1; in particular looks kind of suspect (shouldn't that be a ptrdiff_t?).

iluuu1994 commented 2 weeks ago

I think size_t is perfectly appropriate. In fact, I believe the opposite is true, since ptrdiff_t is signed.

https://en.cppreference.com/w/c/types/ptrdiff_t

If an array is so large (greater than PTRDIFF_MAX elements, but equal to or less than SIZE_MAX bytes), that the difference between two pointers may not be representable as ptrdiff_t, the result of subtracting two such pointers is undefined.

cmb69 commented 2 weeks ago

From the same document:

Only pointers to elements of the same array (including the pointer one past the end of the array) may be subtracted from each other.

iluuu1994 commented 2 weeks ago

Ah, good catch. So this would be UB in any case. I suppose we should be using uintptr_t then. But I'm skeptical this is the actual cause of the issue.

cmb69 commented 2 weeks ago

I suppose we should be using uintptr_t then.

That wouldn't help. Subtracting pointers not belonging to the same struct/array is UB. That does not mean that it doesn't work, though (especially when guarded, as in this case).

iluuu1994 commented 2 weeks ago

@cmb69 uintptr_t are not pointers though. They are integer types capable of holding a pointer.

nielsdos commented 2 weeks ago

Good catch! The UB should be fixed in all active branches, adding NOIPA is not a solution.

cmb69 commented 2 weeks ago

@iluuu1994, the type doesn't matter; the problem (if it is in practice) are the subtractions. See https://wiki.sei.cmu.edu/confluence/display/c/CC.+Undefined+Behavior#CC.UndefinedBehavior-ub_48.

@nielsdos, but how? I assume the assembler implementations of zend_string_equal_val() have been added because memcmp() performs "poorly" for some standard libraries/compilers. And slowing down the engine performance will not be well-received, I suppose. On the other hand, it seems these implementations are almost seven years old, so perhaps memcmp() implementations are more advanced nowadays.

nielsdos commented 2 weeks ago

@cmb69 This assembly path exists to have a string comparison that uses not a lot of instruction cache, not per se for bad standard C library performance but because the standard C library implementation for memcmp & friends are huge. In fact, with all advances memcmp cache bloat is probably even worse now than it was 7 y ago.

It's easily fixable: just move the subtraction to the assembly code.

nielsdos commented 2 weeks ago

If you want I can have a look at updating the assembly?

cmb69 commented 2 weeks ago

If you want I can have a look at updating the assembly?

Yes, please!

nielsdos commented 2 weeks ago

This would do the job of moving the subtraction: https://gist.github.com/nielsdos/3f189f268de2bb03ecdc6a6c634fb03d Patch made on 8.1, only tested x86-64 though.

iluuu1994 commented 2 weeks ago

I stand corrected.

https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html

When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

We absolutely do violate this in various places. Edit: Actually, probably not. The ones I see should point to the same object, and/or are using void*.

That said, I don't this particular case is UB, since we're not converting it back to a pointer. Arithmetics are obviously allowed on integers, you're just not allowed to convert it back to a pointer it it doesn't point to the same object/array.

cmb69 commented 2 weeks ago

I don't know if it's UB, but I can imagine it is. For some obscure/legacy architectures, such address subtraction may not make sense at all (perhaps they move arrays in memory at their own leisure). I mean, for "common" architectures, converting negative signed to unsigned could be well defined behavior, but it is not even implementation-defined, but undefined.

So even @nielsdos' assembler improvement (thanks!) may not work there, but on the other hand the subtraction in C may well be supported for PHP's practical purposes. Thus I'm not sure if we should apply that to a security branch.

Anyhow, I wondered why this has apparently not been reported by UBSan, and found https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63303 (where someone claims that would not be "valid C"). Doesn't answer that question, though.

@ryandesign, could you please check whether https://gist.github.com/nielsdos/3f189f268de2bb03ecdc6a6c634fb03d would solve the issue (as alternative to the noinline NOIPA backport)?

iluuu1994 commented 2 weeks ago

Note: I'm engaging further in this discussion not to argue, but in the interest of learning more. Some C edge cases are gray and complicated, so I may very well be in the wrong.

For some obscure/legacy architectures

These zend_string_equal_val() implementations are specific to x86. Other platforms fall back to memcmp().

such address subtraction may not make sense at all

Maybe there's some confusion about what I was suggesting:

uintptr_t delta = (uintptr_t)s2 - (uintptr_t)s1;

Technically, uintptr_t is optional, but we do already depend on it in many places. If it is defined, it is guaranteed to be big enough to hold the content of a pointer. The subtraction is now performed on unsigned integers, which is not UB. Order is irrelevant, as unsigned integers have wrap-around behavior.

Converting uintptr_t back to a pointer is implementation-defined. For GCC, it is allowed (as noted above), as long as the resulting pointer points to the same object as the original pointer (and it's alignment and type are valid). I'm assuming this has a similar purpose as strict aliasing rules. This would be UB:

int main(void) {
    int x = 0;
    int y = 1;
    uintptr_t delta = (uintptr_t)&y - (uintptr_t)&x;
    int *yp = (int*)((uintptr_t)&x + delta); // <-- UB here
}

Sadly, I could not find a reference to this for either Clang nor MSVC. In any case, this shouldn't matter since we're not doing that here.

So even @nielsdos' assembler improvement (thanks!) may not work there, but on the other hand the subtraction in C may well be supported for PHP's practical purposes. Thus I'm not sure if we should apply that to a security branch.

FWIU x86 always wraps (even for addressing), and assembly is not subject to these pointer restrictions, so this shouldn't be a problem either.

At least that's my understanding. Niels' implementation looks good too.

ryandesign commented 2 weeks ago

an unnamed (inline?) function called from zend_hash_find

I had never seen a function identified like .LL31 before and thought it was a label automatically generated by the compiler. I see now that the beginning of these .LL labels are explicitly given in the php assembly source code while it looks like the last digit is assigned by the compiler/assembler:

https://github.com/php/php-src/blob/d5035a7042aca24c5a82be31a5bdc7e88aada116/Zend/zend_string.c#L426

I tried --enable-debug initially but when I couldn't reproduce the problem I noticed it was setting -O0 so I removed it, thinking that might be hiding the problem. I used -Os because that is what MacPorts uses.

With some more builds of unmodified 8.1.30 I can now confirm that the problem only shows up when using -Os,-O2, or -O3. There is no crash when using -O0 or -O1. --enable-debug hides the problem because it appears to replace my -O flag with -O0.

I can't reproduce the issue since I don't have a Mac.

So far I have no evidence that the problem is Mac-specific. It could be specific to the version of clang Apple ships in Xcode 16. According to Wikipedia, Xcode 16 has Apple clang 16.0.0 which is based on llvm.org clang 17.0.6. Apple clang is open source and could be compiled for other platforms. The problem may also be in llvm.org clang which is available for other platforms.

It would be interesting to know how the assembly changes between the two versions.

I am not knowledgeable about x86_64 assembly so forgive my minimal analysis.

Using objdump -Sd I produced listings of the good and bad 8.1.30 executables both compiled with -Os and pulled out the code for zend_string_equal_val.

Good zend_string_equal_val in 8.1.30 plus #16168 (comment) with -Os ```asm 000000010021a3bd <_zend_string_equal_val>: 10021a3bd: 55 pushq %rbp 10021a3be: 48 89 e5 movq %rsp, %rbp 10021a3c1: 48 8d 57 18 leaq 0x18(%rdi), %rdx 10021a3c5: 48 29 fe subq %rdi, %rsi 10021a3c8: 48 8b 4f 10 movq 0x10(%rdi), %rcx 000000010021a3cc <.LL00>: 10021a3cc: 48 8b 04 32 movq (%rdx,%rsi), %rax 10021a3d0: 48 33 02 xorq (%rdx), %rax 10021a3d3: 0f 85 16 00 00 00 jne 0x10021a3ef <.LL10> 10021a3d9: 48 83 c2 08 addq $0x8, %rdx 10021a3dd: 48 83 e9 08 subq $0x8, %rcx 10021a3e1: 77 e9 ja 0x10021a3cc <.LL00> 10021a3e3: 48 c7 c0 01 00 00 00 movq $0x1, %rax 10021a3ea: e9 27 00 00 00 jmp 0x10021a416 <.LL30> 000000010021a3ef <.LL10>: 10021a3ef: 48 83 f9 08 cmpq $0x8, %rcx 10021a3f3: 0f 82 08 00 00 00 jb 0x10021a401 <.LL20> 10021a3f9: 48 31 c0 xorq %rax, %rax 10021a3fc: e9 15 00 00 00 jmp 0x10021a416 <.LL30> 000000010021a401 <.LL20>: 10021a401: 48 f7 d9 negq %rcx 10021a404: 48 8d 0c cd 40 00 00 00 leaq 0x40(,%rcx,8), %rcx 10021a40c: 48 d3 e0 shlq %cl, %rax 10021a40f: 0f 94 c0 sete %al 10021a412: 48 0f b6 c0 movzbq %al, %rax 000000010021a416 <.LL30>: 10021a416: 48 85 c0 testq %rax, %rax 10021a419: 0f 95 c0 setne %al 10021a41c: 5d popq %rbp 10021a41d: c3 retq ```
Bad zend_string_equal_val in unmodified 8.1.30 with -Os ```asm 000000010021a3b9 <_zend_string_equal_val>: 10021a3b9: 55 pushq %rbp 10021a3ba: 48 89 e5 movq %rsp, %rbp 10021a3bd: 48 8d 57 18 leaq 0x18(%rdi), %rdx 10021a3c1: 48 29 fe subq %rdi, %rsi 10021a3c4: 48 8b 4f 10 movq 0x10(%rdi), %rcx 000000010021a3c8 <.LL04>: 10021a3c8: 48 8b 04 32 movq (%rdx,%rsi), %rax 10021a3cc: 48 33 02 xorq (%rdx), %rax 10021a3cf: 0f 85 16 00 00 00 jne 0x10021a3eb <.LL14> 10021a3d5: 48 83 c2 08 addq $0x8, %rdx 10021a3d9: 48 83 e9 08 subq $0x8, %rcx 10021a3dd: 77 e9 ja 0x10021a3c8 <.LL04> 10021a3df: 48 c7 c0 01 00 00 00 movq $0x1, %rax 10021a3e6: e9 12 00 00 00 jmp 0x10021a3fd <.LL34> 000000010021a3eb <.LL14>: 10021a3eb: 48 83 f9 08 cmpq $0x8, %rcx 10021a3ef: 0f 82 6d f0 ff ff jb 0x100219462 <.LL24> 10021a3f5: 48 31 c0 xorq %rax, %rax 10021a3f8: e9 00 00 00 00 jmp 0x10021a3fd <.LL34> 000000010021a3fd <.LL34>: 10021a3fd: 48 85 c0 testq %rax, %rax 10021a400: 0f 95 c0 setne %al 10021a403: 5d popq %rbp 10021a404: c3 retq ```

The label numbers change between the good and bad versions, but the bad version is missing the next-to-the-last section which should have come from this code:

https://github.com/php/php-src/blob/d5035a7042aca24c5a82be31a5bdc7e88aada116/Zend/zend_string.c#L420-L425

In the good version it's labeled .LL20 in the disassembly. The section before that, labeled .LL10 in the good version and .LL14 in the bad version, has a jb jump in both versions:

https://github.com/php/php-src/blob/d5035a7042aca24c5a82be31a5bdc7e88aada116/Zend/zend_string.c#L417

In the good version, this jumps to .LL20. In the bad version, it jumps to .LL24 which is not in this function. Although the code at .LL24 appears to be the right code, it appears to have been moved into the middle of zend_new_interned_string_request:

zend_new_interned_string_request in unmodified 8.1.30 with -Os ```asm 00000001002193a2 <_zend_new_interned_string_request>: 1002193a2: 55 pushq %rbp 1002193a3: 48 89 e5 movq %rsp, %rbp 1002193a6: 41 57 pushq %r15 1002193a8: 41 56 pushq %r14 1002193aa: 41 55 pushq %r13 1002193ac: 41 54 pushq %r12 1002193ae: 53 pushq %rbx 1002193af: 48 83 ec 18 subq $0x18, %rsp 1002193b3: 48 89 fb movq %rdi, %rbx 1002193b6: 44 8b 6f 04 movl 0x4(%rdi), %r13d 1002193ba: 41 f6 c5 40 testb $0x40, %r13b 1002193be: 0f 85 ec 01 00 00 jne 0x1002195b0 <.LL31+0x9e> 1002193c4: 4c 8b 73 08 movq 0x8(%rbx), %r14 1002193c8: 4d 85 f6 testq %r14, %r14 1002193cb: 75 14 jne 0x1002193e1 <_zend_new_interned_string_request+0x3f> 1002193cd: 48 8d 7b 18 leaq 0x18(%rbx), %rdi 1002193d1: 48 8b 73 10 movq 0x10(%rbx), %rsi 1002193d5: e8 ca fc ff ff callq 0x1002190a4 <_zend_hash_func> 1002193da: 49 89 c6 movq %rax, %r14 1002193dd: 48 89 43 08 movq %rax, 0x8(%rbx) 1002193e1: 8b 05 cd b4 1c 00 movl 0x1cb4cd(%rip), %eax ## 0x1003e48b4 <_interned_strings_permanent+0xc> 1002193e7: 44 09 f0 orl %r14d, %eax 1002193ea: 48 98 cltq 1002193ec: 48 8b 15 c5 b4 1c 00 movq 0x1cb4c5(%rip), %rdx ## 0x1003e48b8 <_interned_strings_permanent+0x10> 1002193f3: 8b 04 82 movl (%rdx,%rax,4), %eax 1002193f6: 83 f8 ff cmpl $-0x1, %eax 1002193f9: 0f 84 8a 00 00 00 je 0x100219489 <.LL30+0x12> 1002193ff: 89 c6 movl %eax, %esi 100219401: 48 c1 e6 05 shlq $0x5, %rsi 100219405: 4c 39 74 32 10 cmpq %r14, 0x10(%rdx,%rsi) 10021940a: 0f 85 70 00 00 00 jne 0x100219480 <.LL30+0x9> 100219410: 4c 8b 7c 32 18 movq 0x18(%rdx,%rsi), %r15 100219415: 49 8b 4f 10 movq 0x10(%r15), %rcx 100219419: 48 3b 4b 10 cmpq 0x10(%rbx), %rcx 10021941d: 0f 85 5d 00 00 00 jne 0x100219480 <.LL30+0x9> 100219423: 49 8d 7f 18 leaq 0x18(%r15), %rdi 100219427: 49 89 d8 movq %rbx, %r8 10021942a: 4d 29 f8 subq %r15, %r8 000000010021942d <.LL00>: 10021942d: 4a 8b 04 07 movq (%rdi,%r8), %rax 100219431: 48 33 07 xorq (%rdi), %rax 100219434: 0f 85 16 00 00 00 jne 0x100219450 <.LL10> 10021943a: 48 83 c7 08 addq $0x8, %rdi 10021943e: 48 83 e9 08 subq $0x8, %rcx 100219442: 77 e9 ja 0x10021942d <.LL00> 100219444: 48 c7 c0 01 00 00 00 movq $0x1, %rax 10021944b: e9 27 00 00 00 jmp 0x100219477 <.LL30> 0000000100219450 <.LL10>: 100219450: 48 83 f9 08 cmpq $0x8, %rcx 100219454: 0f 82 08 00 00 00 jb 0x100219462 <.LL24> 10021945a: 48 31 c0 xorq %rax, %rax 10021945d: e9 15 00 00 00 jmp 0x100219477 <.LL30> 0000000100219462 <.LL24>: 100219462: 48 f7 d9 negq %rcx 100219465: 48 8d 0c cd 40 00 00 00 leaq 0x40(,%rcx,8), %rcx 10021946d: 48 d3 e0 shlq %cl, %rax 100219470: 0f 94 c0 sete %al 100219473: 48 0f b6 c0 movzbq %al, %rax 0000000100219477 <.LL30>: 100219477: 48 85 c0 testq %rax, %rax 10021947a: 0f 85 45 01 00 00 jne 0x1002195c5 <.LL31+0xb3> 100219480: 8b 44 32 0c movl 0xc(%rdx,%rsi), %eax 100219484: e9 6d ff ff ff jmp 0x1002193f6 <_zend_new_interned_string_request+0x54> 100219489: 48 8d 05 18 33 1b 00 leaq 0x1b3318(%rip), %rax ## 0x1003cc7a8 <_compiler_globals> 100219490: 8b 88 5c 01 00 00 movl 0x15c(%rax), %ecx 100219496: 44 09 f1 orl %r14d, %ecx 100219499: 48 63 c9 movslq %ecx, %rcx 10021949c: 48 8b 90 60 01 00 00 movq 0x160(%rax), %rdx 1002194a3: 8b 04 8a movl (%rdx,%rcx,4), %eax 1002194a6: 83 f8 ff cmpl $-0x1, %eax 1002194a9: 0f 84 75 00 00 00 je 0x100219524 <.LL31+0x12> 1002194af: 89 c6 movl %eax, %esi 1002194b1: 48 c1 e6 05 shlq $0x5, %rsi 1002194b5: 4c 39 74 32 10 cmpq %r14, 0x10(%rdx,%rsi) 1002194ba: 0f 85 5b 00 00 00 jne 0x10021951b <.LL31+0x9> 1002194c0: 4c 8b 7c 32 18 movq 0x18(%rdx,%rsi), %r15 1002194c5: 49 8b 4f 10 movq 0x10(%r15), %rcx 1002194c9: 48 3b 4b 10 cmpq 0x10(%rbx), %rcx 1002194cd: 0f 85 48 00 00 00 jne 0x10021951b <.LL31+0x9> 1002194d3: 49 8d 7f 18 leaq 0x18(%r15), %rdi 1002194d7: 49 89 d8 movq %rbx, %r8 1002194da: 4d 29 f8 subq %r15, %r8 00000001002194dd <.LL01>: 1002194dd: 4a 8b 04 07 movq (%rdi,%r8), %rax 1002194e1: 48 33 07 xorq (%rdi), %rax 1002194e4: 0f 85 16 00 00 00 jne 0x100219500 <.LL11> 1002194ea: 48 83 c7 08 addq $0x8, %rdi 1002194ee: 48 83 e9 08 subq $0x8, %rcx 1002194f2: 77 e9 ja 0x1002194dd <.LL01> 1002194f4: 48 c7 c0 01 00 00 00 movq $0x1, %rax 1002194fb: e9 12 00 00 00 jmp 0x100219512 <.LL31> 0000000100219500 <.LL11>: 100219500: 48 83 f9 08 cmpq $0x8, %rcx 100219504: 0f 82 58 ff ff ff jb 0x100219462 <.LL24> 10021950a: 48 31 c0 xorq %rax, %rax 10021950d: e9 00 00 00 00 jmp 0x100219512 <.LL31> 0000000100219512 <.LL31>: 100219512: 48 85 c0 testq %rax, %rax 100219515: 0f 85 aa 00 00 00 jne 0x1002195c5 <.LL31+0xb3> 10021951b: 8b 44 32 0c movl 0xc(%rdx,%rsi), %eax 10021951f: e9 82 ff ff ff jmp 0x1002194a6 <.LL30+0x2f> 100219524: 8b 03 movl (%rbx), %eax 100219526: 83 f8 02 cmpl $0x2, %eax 100219529: 72 55 jb 0x100219580 <.LL31+0x6e> 10021952b: ff c8 decl %eax 10021952d: 89 03 movl %eax, (%rbx) 10021952f: 4c 8b 7b 10 movq 0x10(%rbx), %r15 100219533: 48 83 c3 18 addq $0x18, %rbx 100219537: 4c 89 ff movq %r15, %rdi 10021953a: 48 83 e7 f8 andq $-0x8, %rdi 10021953e: 48 83 c7 20 addq $0x20, %rdi 100219542: e8 3d 16 f6 ff callq 0x10017ab84 <__emalloc> 100219547: 49 89 c4 movq %rax, %r12 10021954a: 48 b8 01 00 00 00 16 00 00 00 movabsq $0x1600000001, %rax ## imm = 0x1600000001 100219554: 49 89 04 24 movq %rax, (%r12) 100219558: 4d 89 7c 24 10 movq %r15, 0x10(%r12) 10021955d: 49 8d 7c 24 18 leaq 0x18(%r12), %rdi 100219562: 48 89 de movq %rbx, %rsi 100219565: 4c 89 fa movq %r15, %rdx 100219568: e8 ff 5a 06 00 callq 0x10027f06c <_zlibVersion+0x10027f06c> 10021956d: 43 c6 44 3c 18 00 movb $0x0, 0x18(%r12,%r15) 100219573: 4d 89 74 24 08 movq %r14, 0x8(%r12) 100219578: 45 8b 6c 24 04 movl 0x4(%r12), %r13d 10021957d: 4c 89 e3 movq %r12, %rbx 100219580: c7 03 01 00 00 00 movl $0x1, (%rbx) 100219586: 41 83 cd 40 orl $0x40, %r13d 10021958a: 44 89 6b 04 movl %r13d, 0x4(%rbx) 10021958e: 48 8d 55 c8 leaq -0x38(%rbp), %rdx 100219592: 48 89 1a movq %rbx, (%rdx) 100219595: c7 42 08 06 00 00 00 movl $0x6, 0x8(%rdx) 10021959c: bf 50 01 00 00 movl $0x150, %edi ## imm = 0x150 1002195a1: 48 03 3d a0 4c 12 00 addq 0x124ca0(%rip), %rdi ## 0x10033e248 <_zlibVersion+0x10033e248> 1002195a8: 48 89 de movq %rbx, %rsi 1002195ab: e8 0b 3f f9 ff callq 0x1001ad4bb <_zend_hash_add_new> 1002195b0: 49 89 df movq %rbx, %r15 1002195b3: 4c 89 f8 movq %r15, %rax 1002195b6: 48 83 c4 18 addq $0x18, %rsp 1002195ba: 5b popq %rbx 1002195bb: 41 5c popq %r12 1002195bd: 41 5d popq %r13 1002195bf: 41 5e popq %r14 1002195c1: 41 5f popq %r15 1002195c3: 5d popq %rbp 1002195c4: c3 retq 1002195c5: ff 0b decl (%rbx) 1002195c7: 75 ea jne 0x1002195b3 <.LL31+0xa1> 1002195c9: 48 89 df movq %rbx, %rdi 1002195cc: 45 84 ed testb %r13b, %r13b 1002195cf: 78 07 js 0x1002195d8 <.LL31+0xc6> 1002195d1: e8 57 16 f6 ff callq 0x10017ac2d <__efree> 1002195d6: eb db jmp 0x1002195b3 <.LL31+0xa1> 1002195d8: e8 85 58 06 00 callq 0x10027ee62 <_zlibVersion+0x10027ee62> 1002195dd: eb d4 jmp 0x1002195b3 <.LL31+0xa1> ```

zend_new_interned_string_request is also where we find the .LL31 label where we eventually crash.

It doesn't seem to me like the compiler should have moved this section into another function but I have often heard it said that compilers are allowed to do whatever they want if the code relies on undefined behavior.

@ryandesign, could you please check whether https://gist.github.com/nielsdos/3f189f268de2bb03ecdc6a6c634fb03d would solve the issue (as alternative to the noinline NOIPA backport)?

This works. The new disassembly is quite a bit longer than either the previous good or bad version. Maybe the extra code is saving and restoring something.

Good zend_string_equal_val in 8.1.30 plus https://gist.github.com/nielsdos/3f189f268de2bb03ecdc6a6c634fb03d with -Os ```asm 00000001004d5490 <_zend_string_equal_val>: 1004d5490: 55 pushq %rbp 1004d5491: 48 89 e5 movq %rsp, %rbp 1004d5494: 48 89 7d f8 movq %rdi, -0x8(%rbp) 1004d5498: 48 89 75 f0 movq %rsi, -0x10(%rbp) 1004d549c: 48 8b 45 f8 movq -0x8(%rbp), %rax 1004d54a0: 48 83 c0 18 addq $0x18, %rax 1004d54a4: 48 89 45 e8 movq %rax, -0x18(%rbp) 1004d54a8: 48 8b 45 f0 movq -0x10(%rbp), %rax 1004d54ac: 48 89 45 e0 movq %rax, -0x20(%rbp) 1004d54b0: 48 8b 45 f8 movq -0x8(%rbp), %rax 1004d54b4: 48 8b 40 10 movq 0x10(%rax), %rax 1004d54b8: 48 89 45 d8 movq %rax, -0x28(%rbp) 1004d54bc: 48 8b 4d d8 movq -0x28(%rbp), %rcx 1004d54c0: 48 8b 55 e8 movq -0x18(%rbp), %rdx 1004d54c4: 48 8b 75 e0 movq -0x20(%rbp), %rsi 1004d54c8: 48 8b 7d f8 movq -0x8(%rbp), %rdi 1004d54cc: 48 29 fe subq %rdi, %rsi 00000001004d54cf <.LL00>: 1004d54cf: 48 8b 04 32 movq (%rdx,%rsi), %rax 1004d54d3: 48 33 02 xorq (%rdx), %rax 1004d54d6: 0f 85 1a 00 00 00 jne 0x1004d54f6 <.LL10> 1004d54dc: 48 83 c2 08 addq $0x8, %rdx 1004d54e0: 48 83 e9 08 subq $0x8, %rcx 1004d54e4: 0f 87 e5 ff ff ff ja 0x1004d54cf <.LL00> 1004d54ea: 48 c7 c0 01 00 00 00 movq $0x1, %rax 1004d54f1: e9 27 00 00 00 jmp 0x1004d551d <.LL30> 00000001004d54f6 <.LL10>: 1004d54f6: 48 83 f9 08 cmpq $0x8, %rcx 1004d54fa: 0f 82 08 00 00 00 jb 0x1004d5508 <.LL20> 1004d5500: 48 31 c0 xorq %rax, %rax 1004d5503: e9 15 00 00 00 jmp 0x1004d551d <.LL30> 00000001004d5508 <.LL20>: 1004d5508: 48 f7 d9 negq %rcx 1004d550b: 48 8d 0c cd 40 00 00 00 leaq 0x40(,%rcx,8), %rcx 1004d5513: 48 d3 e0 shlq %cl, %rax 1004d5516: 0f 94 c0 sete %al 1004d5519: 48 0f b6 c0 movzbq %al, %rax 00000001004d551d <.LL30>: 1004d551d: 48 89 75 c0 movq %rsi, -0x40(%rbp) 1004d5521: 48 89 55 c8 movq %rdx, -0x38(%rbp) 1004d5525: 48 89 c6 movq %rax, %rsi 1004d5528: 48 8b 45 c0 movq -0x40(%rbp), %rax 1004d552c: 48 89 ca movq %rcx, %rdx 1004d552f: 48 8b 4d c8 movq -0x38(%rbp), %rcx 1004d5533: 48 89 75 d0 movq %rsi, -0x30(%rbp) 1004d5537: 48 89 55 d8 movq %rdx, -0x28(%rbp) 1004d553b: 48 89 4d e8 movq %rcx, -0x18(%rbp) 1004d553f: 48 89 45 e0 movq %rax, -0x20(%rbp) 1004d5543: 48 83 7d d0 00 cmpq $0x0, -0x30(%rbp) 1004d5548: 0f 95 c0 setne %al 1004d554b: 24 01 andb $0x1, %al 1004d554d: 0f b6 c0 movzbl %al, %eax 1004d5550: 5d popq %rbp 1004d5551: c3 retq 1004d5552: 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax) ```
ryandesign commented 2 weeks ago

@ryandesign, could you please check whether https://gist.github.com/nielsdos/3f189f268de2bb03ecdc6a6c634fb03d would solve the issue (as alternative to the noinline NOIPA backport)?

This works. The new disassembly is quite a bit longer than either the previous good or bad version. Maybe the extra code is saving and restoring something.

Correction. This does not work. This part of my test was erroneous because I still had --enable-debug in my build script so -O0 was being used.

Re-testing without --enable-debug, the result is the same crash as unmodified 8.1.30.

Bad zend_string_equal_val in 8.1.30 plus https://gist.github.com/nielsdos/3f189f268de2bb03ecdc6a6c634fb03d with -Os ```asm 000000010021a3b9 <_zend_string_equal_val>: 10021a3b9: 55 pushq %rbp 10021a3ba: 48 89 e5 movq %rsp, %rbp 10021a3bd: 48 8d 57 18 leaq 0x18(%rdi), %rdx 10021a3c1: 48 8b 4f 10 movq 0x10(%rdi), %rcx 10021a3c5: 48 29 fe subq %rdi, %rsi 000000010021a3c8 <.LL04>: 10021a3c8: 48 8b 04 32 movq (%rdx,%rsi), %rax 10021a3cc: 48 33 02 xorq (%rdx), %rax 10021a3cf: 0f 85 16 00 00 00 jne 0x10021a3eb <.LL14> 10021a3d5: 48 83 c2 08 addq $0x8, %rdx 10021a3d9: 48 83 e9 08 subq $0x8, %rcx 10021a3dd: 77 e9 ja 0x10021a3c8 <.LL04> 10021a3df: 48 c7 c0 01 00 00 00 movq $0x1, %rax 10021a3e6: e9 12 00 00 00 jmp 0x10021a3fd <.LL34> 000000010021a3eb <.LL14>: 10021a3eb: 48 83 f9 08 cmpq $0x8, %rcx 10021a3ef: 0f 82 6d f0 ff ff jb 0x100219462 <.LL24> 10021a3f5: 48 31 c0 xorq %rax, %rax 10021a3f8: e9 00 00 00 00 jmp 0x10021a3fd <.LL34> 000000010021a3fd <.LL34>: 10021a3fd: 48 85 c0 testq %rax, %rax 10021a400: 0f 95 c0 setne %al 10021a403: 5d popq %rbp 10021a404: c3 retq ```
nielsdos commented 2 weeks ago

I wonder if the inliner gets confused by the label names, may be worth trying to use 1b/1f type of labels instead. I have to go work now but I can try give a patch this evening.

iluuu1994 commented 2 weeks ago

Xcode 16 has Apple clang 16.0.0 which is based on llvm.org clang 17.0.6. Apple clang is open source and could be compiled for other platforms. The problem may also be in llvm.org clang which is available for other platforms.

I have just tried a release build with Clang 17.0.6 on Linux which doesn't reproduce the issue. I don't know how to compile (or even where to find) Apple Clang.

ryandesign commented 2 weeks ago

I don't know how to compile (or even where to find) Apple Clang.

You know, it may have been wishful thinking on my part. I assumed it would be at https://github.com/apple or https://github.com/apple-oss-distributions or https://opensource.apple.com but maybe they keep their llvm/clang changes private.

nielsdos commented 2 weeks ago

This is a modified version of my previous patch that uses another type of label style, I wonder if this works for you: https://gist.github.com/nielsdos/2805ca8ddfa27655f837b5660150312e

ryandesign commented 1 week ago

Although the code at .LL24 appears to be the right code, it appears to have been moved into the middle of zend_new_interned_string_request:

zend_new_interned_string_request is also where we find the .LL31 label where we eventually crash.

To document my further understanding of this, zend_string_equal_val is the only place that the assembly code with the labels .LL0%= through .LL3%= exists. zend_new_interned_string_request doesn't contain any assembly code, but it apparently calls zend_string_equal_val (or something that calls it) twice and so in the "bad" version from above, that assembly code gets inlined into zend_new_interned_string_request twice. Additional copies end up in zend_new_interned_string_permanent and zend_interned_string_find_permanent. %= is supposed to get replaced with a unique number each time which happens for .LL0%=, .LL1%=, and .LL3%=, but for some reason all five instances of .LL2%= get optimized down to a single .LL24 instance. In the "good" version from above, zend_string_equal_val is not inlined so there is only one copy of each label in the executable.

I have read that there is no documentation for the llvm assembler and that one should consult the documentation for the GNU assembler of which the llvm assembler implements a subset. A Stack Overflow post pointed me to this part of the GNU assembler docs on symbol names which says:

By default, the local label prefix is .L for ELF systems or L for traditional a.out systems, but each target may have its own set of local label prefixes. […] Local symbols are defined and used within the assembler, but they are normally not saved in object files. Thus, they are not visible when debugging.

Searching the llvm code, they do set PrivateGlobalPrefix and PrivateLabelPrefix (I'm not clear on the distinction) to .L in MCAsmInfoELF.cpp which is presumably for ELF systems like Linux. I don't see a file specific to Mach-O which would be used on macOS, but I see it uses L in MCAsmInfo.cpp; maybe that is the fallback for any systems that don't have specific settings overrides.

Based on this, I tried simply changing all label name prefixes from .L to L:

diff --git a/Zend/zend_string.c b/Zend/zend_string.c
index 1da3ce5248..619055e9dc 100644
--- a/Zend/zend_string.c
+++ b/Zend/zend_string.c
@@ -403,27 +403,27 @@ ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *
    zend_ulong ret;

    __asm__ (
-       ".LL0%=:\n\t"
+       "LL0%=:\n\t"
        "movl (%2,%3), %0\n\t"
        "xorl (%2), %0\n\t"
-       "jne .LL1%=\n\t"
+       "jne LL1%=\n\t"
        "addl $0x4, %2\n\t"
        "subl $0x4, %1\n\t"
-       "ja .LL0%=\n\t"
+       "ja LL0%=\n\t"
        "movl $0x1, %0\n\t"
-       "jmp .LL3%=\n\t"
-       ".LL1%=:\n\t"
+       "jmp LL3%=\n\t"
+       "LL1%=:\n\t"
        "cmpl $0x4,%1\n\t"
-       "jb .LL2%=\n\t"
+       "jb LL2%=\n\t"
        "xorl %0, %0\n\t"
-       "jmp .LL3%=\n\t"
-       ".LL2%=:\n\t"
+       "jmp LL3%=\n\t"
+       "LL2%=:\n\t"
        "negl %1\n\t"
        "lea 0x20(,%1,8), %1\n\t"
        "shll %b1, %0\n\t"
        "sete %b0\n\t"
        "movzbl %b0, %0\n\t"
-       ".LL3%=:\n"
+       "LL3%=:\n"
        : "=&a"(ret),
          "+c"(len),
          "+r"(ptr)
@@ -441,27 +441,27 @@ ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *
    zend_ulong ret;

    __asm__ (
-       ".LL0%=:\n\t"
+       "LL0%=:\n\t"
        "movq (%2,%3), %0\n\t"
        "xorq (%2), %0\n\t"
-       "jne .LL1%=\n\t"
+       "jne LL1%=\n\t"
        "addq $0x8, %2\n\t"
        "subq $0x8, %1\n\t"
-       "ja .LL0%=\n\t"
+       "ja LL0%=\n\t"
        "movq $0x1, %0\n\t"
-       "jmp .LL3%=\n\t"
-       ".LL1%=:\n\t"
+       "jmp LL3%=\n\t"
+       "LL1%=:\n\t"
        "cmpq $0x8,%1\n\t"
-       "jb .LL2%=\n\t"
+       "jb LL2%=\n\t"
        "xorq %0, %0\n\t"
-       "jmp .LL3%=\n\t"
-       ".LL2%=:\n\t"
+       "jmp LL3%=\n\t"
+       "LL2%=:\n\t"
        "negq %1\n\t"
        "lea 0x40(,%1,8), %1\n\t"
        "shlq %b1, %0\n\t"
        "sete %b0\n\t"
        "movzbq %b0, %0\n\t"
-       ".LL3%=:\n"
+       "LL3%=:\n"
        : "=&a"(ret),
          "+c"(len),
          "+r"(ptr)

This works on macOS 15. No more crash. But this would not be a portable fix since it could just cause the same problem to occur on ELF systems.

may be worth trying to use 1b/1f type of labels instead.

I see that the GNU assembler docs describe these as local labels, distinct from local symbols. It says:

Local label names […] are immediately transformed into more conventional symbol names before the assembler uses them. […] All local symbols begin with the system-specific local label prefix.

This does sound like the portable fix for the problem since these local labels will get converted to symbols with the .L prefix on ELF systems and the L prefix on macOS and other systems.

This is a modified version of my previous patch that uses another type of label style, I wonder if this works for you: https://gist.github.com/nielsdos/2805ca8ddfa27655f837b5660150312e

Thanks. This also works. Looking at the disassembly of both my nonportable fix and your portable one, they are identical (except for the swapping of the order of the subq and movq instructions as already shown previously, due to your changes beyond the label names), and neither contains any evidence of the local symbols, consistent with the documentation's statement that they are not saved in object files.

So I think that the problem all along was using the .L prefix with the expectation that this made it local when that is not true on all systems, and just changing to the 1b/1f notation is enough to fix the problem. I think you don't need to use numbers as high as 10-13; numbers 0-3 work for me too.

ryandesign commented 1 week ago

just changing to the 1b/1f notation is enough to fix the problem. I think you don't need to use numbers as high as 10-13; numbers 0-3 work for me too.

So the fix I'm recommending is:

diff --git a/Zend/zend_string.c b/Zend/zend_string.c
index 1da3ce5248..01b346a280 100644
--- a/Zend/zend_string.c
+++ b/Zend/zend_string.c
@@ -403,27 +403,27 @@ ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *
    zend_ulong ret;

    __asm__ (
-       ".LL0%=:\n\t"
+       "0:\n\t"
        "movl (%2,%3), %0\n\t"
        "xorl (%2), %0\n\t"
-       "jne .LL1%=\n\t"
+       "jne 1f\n\t"
        "addl $0x4, %2\n\t"
        "subl $0x4, %1\n\t"
-       "ja .LL0%=\n\t"
+       "ja 0b\n\t"
        "movl $0x1, %0\n\t"
-       "jmp .LL3%=\n\t"
-       ".LL1%=:\n\t"
+       "jmp 3f\n\t"
+       "1:\n\t"
        "cmpl $0x4,%1\n\t"
-       "jb .LL2%=\n\t"
+       "jb 2f\n\t"
        "xorl %0, %0\n\t"
-       "jmp .LL3%=\n\t"
-       ".LL2%=:\n\t"
+       "jmp 3f\n\t"
+       "2:\n\t"
        "negl %1\n\t"
        "lea 0x20(,%1,8), %1\n\t"
        "shll %b1, %0\n\t"
        "sete %b0\n\t"
        "movzbl %b0, %0\n\t"
-       ".LL3%=:\n"
+       "3:\n"
        : "=&a"(ret),
          "+c"(len),
          "+r"(ptr)
@@ -441,27 +441,27 @@ ZEND_API bool ZEND_FASTCALL zend_string_equal_val(zend_string *s1, zend_string *
    zend_ulong ret;

    __asm__ (
-       ".LL0%=:\n\t"
+       "0:\n\t"
        "movq (%2,%3), %0\n\t"
        "xorq (%2), %0\n\t"
-       "jne .LL1%=\n\t"
+       "jne 1f\n\t"
        "addq $0x8, %2\n\t"
        "subq $0x8, %1\n\t"
-       "ja .LL0%=\n\t"
+       "ja 0b\n\t"
        "movq $0x1, %0\n\t"
-       "jmp .LL3%=\n\t"
-       ".LL1%=:\n\t"
+       "jmp 3f\n\t"
+       "1:\n\t"
        "cmpq $0x8,%1\n\t"
-       "jb .LL2%=\n\t"
+       "jb 2f\n\t"
        "xorq %0, %0\n\t"
-       "jmp .LL3%=\n\t"
-       ".LL2%=:\n\t"
+       "jmp 3f\n\t"
+       "2:\n\t"
        "negq %1\n\t"
        "lea 0x40(,%1,8), %1\n\t"
        "shlq %b1, %0\n\t"
        "sete %b0\n\t"
        "movzbq %b0, %0\n\t"
-       ".LL3%=:\n"
+       "3:\n"
        : "=&a"(ret),
          "+c"(len),
          "+r"(ptr)

This was for 8.1.30 but it applies cleanly on master.

nielsdos commented 1 week ago

@ryandesign thanks for the research, and glad to see my idea worked. I think the potential UB should be fixed too though, either via uintptr_t cast or via my assembly trick. Do you make a PR or should I?

nielsdos commented 1 week ago

Btw, I used 10-13 in my patch because I just did a lazy find+replace, no particular reason

ryandesign commented 1 week ago

Changing to the 1b/1f label style was your idea and you're better able to explain the other changes you made, so I'll let you do the PR.

I will want to backport the fix to EOL versions of PHP in MacPorts, as requested by my user, so that's why I suggested the simplest, most minimal fix, one which I could fully understand, in case I have to change the patch for older PHP versions.