Open uberbrady opened 2 years ago
Looking at the backtrace, it looks like the process segfaults during curl extension shutdown, while calling an OpenSSL cleanup function. Could you please provide the libcurl version and the OpenSSL version you're using?
Sure thing!
rpm -qa |grep openssl
openssl-1.0.2k-24.amzn2.0.2.x86_64
openssl-libs-1.0.2k-24.amzn2.0.2.x86_64
openssl-debuginfo-1.0.2k-24.amzn2.0.2.x86_64
and
python-pycurl-7.19.0-19.amzn2.0.2.x86_64
curl-debuginfo-7.79.1-1.amzn2.0.1.x86_64
libcurl-7.79.1-1.amzn2.0.1.x86_64
curl-7.79.1-1.amzn2.0.1.x86_64
And, just for good measure (in case this is some kind of openldap library thing) -
openldap-2.4.44-23.amzn2.0.3.x86_64
openldap-debuginfo-2.4.44-23.amzn2.0.3.x86_64
Ooooh! Interesting - if I take the s off of the ldaps:// name, the segfault stops happening... And if I put in ldap_start_tls($connection); then it starts happening again. So I think you're onto something!
This might not be a php-src issue at all, but rather a conflict between libcurl and openldap (libsasl?)
Anyhow, please focus on PHP 8.0, since PHP 7.4 is no longer actively supported, so wouldn't receive regular bug fixes anyway.
And a question: are you using a threaded SAPI?
If I add php_sapi_name() I get cli - I'm not sure if that's what you're asking though. This is a distribution-default Amazon Linux 2 build (which is pretty much just CentOS), so I don't imagine there is too much funny going on there. I'll attach php -i just in case that helps though. Here's a gist of the output: https://gist.github.com/uberbrady/4e43f38e45ff3762e617715ca0996894
As for my focus, I help to run a rather large open source project, so my users often trail on their PHP releases, often for reasons out of their own control. We're dragging them forward as fast as we can. We only just pulled our minimum PHP version up to 7.4 with our latest release :/
Thanks for the further info. I was looking for
Thread Safety => disabled
Since this is a non thread-safe build, there shouldn't be any multi-threading issues.
From looking at the stack backtrace, libcurl calls ENGINE_cleanup(), and some other library may already have done this (or something related). But even if that would be something caused by PHP, I'm not sure it's worth spending time on that, since the OpenSSL initialization/cleanup is reworked as of OpenSSL 1.1 where this happens automatically behind the scenes. Can't you use some newer OpenSSL version?
Our open-source users can't in most cases, and in the cases where they could, they wouldn't know how to do that. In probably quite a few other cases, they might actually be able to, and know how, but not be allowed by their management.
Most of them are just going to use the distro's default version of PHP against their distro's default version of openssl.
libcurl no longer calls ENGINE_cleanup in recent versions, provided you are not using arcane OpenSSL versions. This is not a PHP bug; you just need to build libcurl against OpenSSL 1.1 and this will go away. Bugs like this will continue to pop up, though, as long as there is shared global state.
It's honestly something I talked about with my VP of tech - custom-building our own PHP and curl and various other extensions to avoid this. He's loath to do it because it means every point release of PHP means yet another build. But if we need to, we can do that.
But what do I do for the rest of my open-source users? We guesstimate (it's impossible to know) that we have probably 10x or 100x the number of free, self-hosted, open-source users as we do hosted-by-us users. I don't think they'll have the technical acumen to be able to compile custom builds. As I mentioned before, we also have people who aren't permitted to custom-compile stuff by policy, as well. And in some cases, we have people running on shared infrastructure where they simply don't have access to be able to modify local installations of PHP.
Amazon Linux is basically just a slightly tweaked version of CentOS, so this is a pretty large install base. I totally get that openssl 1.0 seems to have been a bit of a nightmare (and a nightmare you folks are trying to forget, for good reason), but I still do believe that folks doing yum install php should end up with a version of PHP that doesn't segfault.
Okay, I've checked our OpenSSL requirements: ≥ 1.0.1 for PHP-8.0, ≥ 1.0.2 for PHP-8.1 and master. As such, it is our liability to do something about this. The question is, though, whether we can do something about it (except for documenting this issue).
@bukka, thoughts?
I mean, I have an idea - but I suspect you're not gonna like it :)
Is there a way for PHP to not clean up SSL stuff as it's tearing itself down? I mean, once the binary actually terminates, isn't everything effectively gone? If it weren't, it would mean every time you control-C a PHP executable it would be leaving stuff around, which I suspect doesn't happen.
Anyways, I don't know the codebase and I don't know how the internals work, but just thought I'd throw a janky idea out there...
Previously we had a similar issue with the locking callback, but that was freed by PHP so we could fix it. However, this seems to be an issue with freeing engines, which PHP does not do, as you can see in https://github.com/php/php-src/blob/PHP-8.0.16/ext/openssl/openssl.c#L1346-L1352 . I checked your version of LDAP https://github.com/openldap/openldap/blob/1c9416493bd219b08d839cd9e93fc64daa89b752/libraries/libldap/tls_o.c#L226-L235 and it does not free engines either, so it is interesting that it is crashing there. I might need to do a bit of digging later to see what is going on. I'm currently busy with some other things, so it might take me a little while to get to it.
I had a proper look into this. I'm able to see the segfault on Amazon Linux 2 for all PHP versions. I used the remi repo for 8.1 and 8.2 as documented here: https://computingforgeeks.com/how-to-install-php-8-on-amazon-linux/ . It just requires installing
sudo yum install php82 php82-php-ldap
and then executing the script above with ldap_connect and ldap_bind.
Considering that Amazon Linux 2 is going to be supported till 2024 and there are probably lots of PHP users on it, it is something worth looking into.
When debugging I noticed that the OpenSSL ext adds the rdrand engine, so I created PR #9767 with a potential fix. I tried quite hard to recreate the crash on Amazon Linux 2 with my own compiled PHP, even with shared curl and ldap, but it wasn't crashing, so I cannot verify the fix. But I asked @remicollet to help with that, so hopefully he manages to do that.
I just pushed the potential fix to 8.2 only for now and will verify once the next RC is released and Remi's package is available. If all is good, I will backport it to the lower branch(es).
So I just tested that OpenSSL change with the last RC and still see the segfault. After more debugging the added engine is not actually the issue.
The problem is different. What happens is that OpenLDAP sets locking callback as can be seen in this backtrace:
Breakpoint 2, CRYPTO_set_locking_callback (func=0x7fffe9e3c2b0 <tlso_locking_cb>) at cryptlib.c:407
407 {
(gdb) bt
#0 CRYPTO_set_locking_callback (func=0x7fffe9e3c2b0 <tlso_locking_cb>) at cryptlib.c:407
#1 0x00007fffe9e3c281 in tlso_thr_init () from /lib64/libldap_r-2.4.so.2
#2 0x00007fffe9e3a040 in tls_init () from /lib64/libldap_r-2.4.so.2
#3 0x00007fffe9e3b3fe in ldap_int_tls_start () from /lib64/libldap_r-2.4.so.2
#4 0x00007fffe9e164e7 in ldap_int_open_connection () from /lib64/libldap_r-2.4.so.2
#5 0x00007fffe9e2901b in ldap_new_connection () from /lib64/libldap_r-2.4.so.2
#6 0x00007fffe9e159fa in ldap_open_defconn () from /lib64/libldap_r-2.4.so.2
#7 0x00007fffe9e2a46f in ldap_send_initial_request () from /lib64/libldap_r-2.4.so.2
#8 0x00007fffe9e1b4f3 in ldap_extended_operation () from /lib64/libldap_r-2.4.so.2
#9 0x00007fffe9e1b9e9 in ldap_extended_operation_s () from /lib64/libldap_r-2.4.so.2
#10 0x00007fffe9e3b7d2 in ldap_start_tls_s () from /lib64/libldap_r-2.4.so.2
#11 0x00007ffff7f12fe5 in zif_ldap_start_tls () at /usr/src/debug/php-8.2.0RC7/ext/ldap/ldap.c:3591
#12 0x00005555558d466f in ZEND_DO_ICALL_SPEC_RETVAL_UNUSED_HANDLER () at /usr/src/debug/php-8.2.0RC7/Zend/zend_vm_execute.h:1250
#13 execute_ex () at /usr/src/debug/php-8.2.0RC7/Zend/zend_vm_execute.h:56012
#14 0x00005555558d9921 in zend_execute (op_array=0x7ffff3885100, return_value=0x0) at /usr/src/debug/php-8.2.0RC7/Zend/zend_vm_execute.h:60380
#15 0x0000555555869aa0 in zend_execute_scripts () at /usr/src/debug/php-8.2.0RC7/Zend/zend.c:1780
#16 0x0000555555803abe in php_execute_script () at /usr/src/debug/php-8.2.0RC7/main/main.c:2540
#17 0x000055555594e516 in do_cli (argc=2, argv=0x555555e33120) at /usr/src/debug/php-8.2.0RC7/sapi/cli/php_cli.c:964
#18 0x000055555563bb0c in main (argc=2, argv=0x555555e33120) at /usr/src/debug/php-8.2.0RC7/sapi/cli/php_cli.c:1333
Note that I use ldap_start_tls which results in the same issue as sasl bind because it also calls tls_init. The problem is that this callback is never uninitialized (set to NULL) by OpenLDAP. We actually do that in the OpenSSL extension, but in this packaging setup the OpenSSL extension is linked statically while OpenLDAP and Curl are dynamically linked. This means that OpenLDAP is unloaded first (descending alphabetical order is used for unloading), which leaves the callback registered even though it was never reset, and then Curl is unloaded, which crashes because the callback is no longer there. If there weren't a crash, the OpenSSL extension would run its shutdown hook only after that, so its cleanup comes too late and this cannot be fixed in the OpenSSL extension.
I'm not really an expert on OpenLDAP, but from looking into it I haven't seen a way to even trigger a library destroy that would call tlso_destroy, so I'm not sure whether some new function would need to be added to OpenLDAP that could then be called in the ldap extension shutdown. In any case, tlso_destroy doesn't clear the callback, so some changes would be required there anyway. Alternatively the extension could call CRYPTO_set_locking_callback(NULL); directly, but that's a bit messy as it assumes that OpenLDAP uses OpenSSL in some way.
I think a bug report should be opened with OpenLDAP. I will leave this issue open in case an OpenLDAP change is applied or the maintainer of the ldap extension ( @MCMic ) wants to consider a direct cleanup of the callback.
@uberbrady I have been thinking about this a bit more and realised that you can also fix it by prioritising ldap extension in the load process. It means if it loads before curl, then it will unload after curl so the thing that you can do is to just rename PHP ldap ext ini file. For default distro (Amazon Linux 2 in this case) packages it would be:
mv /etc/php.d/20-ldap.ini /etc/php.d/10-ldap.ini
If you are going to use Remi repo, it would be (this is for PHP 8.2 but if you want PHP 8.1, just s/php82/php81/g):
mv /etc/opt/remi/php82/php.d/20-ldap.ini /etc/opt/remi/php82/php.d/10-ldap.ini
I just tested it and there's no segfault if you do that.
It might be worth if @remicollet could change it in the distributed packages. Not sure however how to change it for the main distro package or where to ask for it. Btw this bug impacts all distributions using OpenSSL 1.0.2 and loading ldap after curl - there's nothing Amazon Linux 2 specific here.
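To check whether a given install is affected by the ordering described above, something like the following rough sketch may help. It assumes the ini scan directory uses file names ending in ldap.ini and curl.ini (as in the paths above) and that files are read in plain alphabetical order; check_ldap_before_curl is a made-up helper name, not part of PHP or any package.

```shell
# Hypothetical helper: prints "ok" if the ldap ini file sorts (and so is
# read) before the curl one, "reorder" if curl comes first, and "n/a" if
# either file is missing from the directory.
check_ldap_before_curl() {
    # $1: ini scan directory, e.g. /etc/php.d (an assumption, adjust as needed)
    ls "$1" | LC_ALL=C sort | awk '
        /ldap\.ini$/ { ldap = NR }   # position of the ldap ini file
        /curl\.ini$/ { curl = NR }   # position of the curl ini file
        END {
            if (!ldap || !curl)   print "n/a"
            else if (ldap < curl) print "ok"
            else                  print "reorder"
        }'
}
```

For example, `check_ldap_before_curl /etc/php.d` should print "reorder" with the default 20-ldap.ini / 20-curl.ini names and "ok" after the rename to 10-ldap.ini.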
I have been thinking about this a bit more and realised that you can also fix it by prioritising ldap extension in the load process.
It might make sense to mention that in our php.ini-(development|production) files. There is already a respective comment regarding exif and mbstring.
Good point. How about #9995 ?
I just merged the INI changes so now it should be considered as a bug in the distribution as well if it doesn't follow the required order (it should be a bug if distribution loads curl before ldap and uses OpenSSL 1.0.2).
Problem is that libcurl is built with libldap while the PHP extensions are properly built with libldap_r.
Another workaround is to define LD_PRELOAD=/usr/lib64/libldap_r-2.4.so.2
This doesn't affect mod_php (used by default on EL-7), but only cli.
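To see which libldap flavour a given library is linked against, the output of ldd can be filtered. The following is only a sketch: ldap_flavour is a hypothetical helper name, and the libcurl path in the usage comment is an assumption for an EL-7 style layout.

```shell
# Hypothetical helper: reads `ldd` output on stdin and reports which
# libldap flavour(s) it mentions: "libldap_r", "libldap", "both" or "none".
ldap_flavour() {
    awk '
        /libldap_r[-.]/ { r = 1; next }   # thread-safe variant
        /libldap[-.]/   { plain = 1 }     # plain variant
        END {
            if (r && plain) print "both"
            else if (r)     print "libldap_r"
            else if (plain) print "libldap"
            else            print "none"
        }'
}
# Usage (path is an assumption):
#   ldd /usr/lib64/libcurl.so.4 | ldap_flavour
```

If libcurl reports "libldap" while the PHP ldap extension reports "libldap_r", that matches the mismatch described above.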
BTW, I can only reproduce with PHP 8.2 (not with older PHP versions) on CentOS 7, and as CentOS 7 is >8 years old and close to its EOL, I usually recommend using a modern distro for modern features.
Segfault with PHP 8.2.0RC7 on CentOS 7:
(gdb) bt
#0 0x00007fffe7fa62f0 in ?? ()
#1 0x00007ffff568a8e8 in ENGINE_remove () from /lib64/libcrypto.so.10
#2 0x00007ffff568aa35 in engine_list_cleanup () from /lib64/libcrypto.so.10
#3 0x00007ffff5689f76 in engine_cleanup_cb_free () from /lib64/libcrypto.so.10
#4 0x00007ffff569a360 in sk_pop_free () from /lib64/libcrypto.so.10
#5 0x00007ffff568a2fc in ENGINE_cleanup () from /lib64/libcrypto.so.10
#6 0x00005555557c91b6 in zm_shutdown_openssl (type=1, module_number=4) at /usr/src/debug/php-8.2.0RC7/ext/openssl/openssl.c:1329
#7 0x00005555559c39bb in module_destructor (module=0x555555f54300 <openssl_module_entry>) at /usr/src/debug/php-8.2.0RC7/Zend/zend_API.c:3043
#8 0x00005555559cfc35 in _zend_hash_del_el_ex (prev=<optimized out>, p=0x555555f83560, idx=3, ht=0x555555f6db60 <module_registry>) at /usr/src/debug/php-8.2.0RC7/Zend/zend_hash.c:1408
#9 _zend_hash_del_el (p=0x555555f83560, idx=3, ht=0x555555f6db60 <module_registry>) at /usr/src/debug/php-8.2.0RC7/Zend/zend_hash.c:1435
#10 zend_hash_graceful_reverse_destroy (ht=0x555555f6db60 <module_registry>) at /usr/src/debug/php-8.2.0RC7/Zend/zend_hash.c:1960
#11 0x00005555559c215c in zend_destroy_modules () at /usr/src/debug/php-8.2.0RC7/Zend/zend_API.c:2369
#12 0x00005555559bd423 in zend_shutdown () at /usr/src/debug/php-8.2.0RC7/Zend/zend.c:1088
#13 0x00005555559578ba in php_module_shutdown () at /usr/src/debug/php-8.2.0RC7/main/main.c:2415
#14 php_module_shutdown () at /usr/src/debug/php-8.2.0RC7/main/main.c:2392
#15 0x000055555578fae5 in main (argc=2, argv=0x555555f74f60) at /usr/src/debug/php-8.2.0RC7/sapi/cli/php_cli.c:1346
So this is caused by 1ef65c1cf030ac5173fb388795f82e3d14a70c6b. Using a test build with this reverted, there is no more segfault.
Notice: this is on CentOS 7, which includes fixes related to this issue, so it may still be an issue on amzn2, which was based on EL-7 but has diverged a lot since.
For example, with apr: CentOS 7 has apr 1.4.8-7, which includes fixes related to RTLD_DEEPBIND (reported to fix this issue, at least for mod_php), while amzn2 has 1.7.0-9.amzn2...
Problem is that libcurl is build with libldap while php ext are properly build with libldap_r
Huh, that's ugly... indeed that's gonna be a problem.
On my system libldap_r is just a symlink to libldap.. isn't that the case of this problematic target ?
On my system libldap_r is just a symlink to libldap..
Yes on recent distro
isn't that the case of this problematic target ?
Not in this 8-year-old distro.
That CentOS 7 segfault should be fixed by https://github.com/php/php-src/commit/3d90a24e9349ea17e5467de7b1d7bfa17ec2c650 . It is not exactly the same thing because the order seems to be different there so it's not triggered by curl but openssl ext. Might be due to using different lib as you noted but not sure. I see segfaults for all versions there.
I think Amazon Linux 2 should be still important for us to support as it's default for AWS EC2 and it's heavily used and people most likely want to run latest PHP there. At least it's default when you create EC2 instance in console so we might need to support it (meaning support OpenSSL 1.0.2) for some time - think they extended support to June 2024.
I confirm 8.2.0RC7 with 3d90a24e9349ea17e5467de7b1d7bfa17ec2c650 doesn't raise segfault anymore (build 22 in my testing repo) on CentOS 7
Trying on amzn2 using https://git.remirepo.net/cgit/tools/docker.git/tree/amzn2-php82.dockerfile
$ podman build -t foo -f amzn2-php82.dockerfile .
...
STEP 6/9: RUN php /tmp/ldap.php
object(LDAP\Connection)#1 (0) {
}
bool(false)
PHP Warning: ldap_bind(): Unable to bind to server: Protocol error in /tmp/ldap.php on line 4
container exited on segmentation fault
...
Yes, that's expected: that commit was not meant to fix the amzn2 issue, because the problem is not in the OpenSSL extension but in OpenLDAP, as I mentioned above - it basically doesn't offer any way to clean up the locking callback that it introduces... :disappointed:
For what it's worth, our amzn2 issue (running 8.0) was fixed with @bukka 's import re-ordering suggestion: https://github.com/php/php-src/issues/8620#issuecomment-1326263441
Thanks for that!
Can confirm that this resolved apache php segfaults for me too when using ldap.
the ordering changed in 3de3e137bf7415fadcde873ca030436998f5b526
Is there a way to affect the loading order for built-in (static) extensions? In Gentoo all extensions are controlled by use flags, and unfortunately there is no way to make an extension dynamic without altering the .ebuild file, so this bug still exists in that situation... The only workaround that I've found is to build php through portage without the curl and ldap extensions, then compile and install these extensions manually and load them dynamically via php.ini in the proper order, but this way doesn't feel right :(
think they extended support to June 2024.
Seems they have extended to June 2025 in the meantime.
Is there a way to affect the loading order for built-in (static) extensions?
No.
Description
(I'd like to note that while I'm reporting this against PHP 7.4.28, I also saw the problem in PHP 8.0.16)
The following code:
Resulted in this output:
But I expected this output instead:
Removing the ldap_bind() statement allows the script to run without segfaulting.
When run under GDB, the following backtrace occurs (PHP 7.4):
Here's the backtrace from PHP 8.0:
AWS Linux 2 Hints (if you need them, of course!)
amazon-linux-extras install php7.4
php-ldap by doing yum install php-ldap
yum remove php-pdo php-ldap php-debuginfo php-common php-mysqlnd php-fpm php-json php-cli then amazon-linux-extras disable php7.4 - then you can enable php8.0.
gdb to install debuginfo; if you want symbols in your backtraces you'll want to do that.
Please don't hesitate to reach out if there are any further details I can get for you. Thank you!
PHP Version
PHP 7.4.28
Operating System
Amazon Linux 2