libvips / php-vips-ext

Low-level libvips binding for PHP
MIT License
101 stars 11 forks source link

SIGABRT only on php-fpm - php8 on nginx - macOS Catalina and Big Sur #42

Closed inod closed 3 years ago

inod commented 3 years ago

I'm only having this issue on macOS Catalina and Big Sur. Linux works as expected.

✅ Vips cli works as expected (vips-8.11.0 from homebrew). ✅ Calling php-vips-ext functions like vips_call() from php cli works as expected.

❌ Loading a local browser page (homebrew nginx with php-fpm) that calls vips_call() returns 502 Bad Gateway and logs: WARNING: [pool www] child ... exited on signal 6 (SIGABRT) ... (running the same php from php cli works as expected)

Also I've found that if vips.so is enabled, any php code that calls openssl (Google cloud apis, stripe api, etc) causes the same 502 Bad Gateway error, even if there are no calls to php-vips-ext functions, but ONLY on php-fpm and ONLY when vips.so is enabled. If I run the same php code from php CLI, everything works as expected. If I disable vips.so, all the normal calls to openssl work fine.

I've tried latest php 8.0.7 from homebrew and also php 8.0.6 and 8.0.7 from phpbrew. I've tried installing php-vips-ext from pecl and also from phpbrew and pickle, also versions 1.0.12 and 1.0.11. I've tried running php-fpm and nginx as sudo (for permission errors), it didn't work but maybe I messed up something.

Everything worked as expected until last weekend or so, I don't know what changed or what update caused this to start failing. I've since updated from Catalina to Big Sur thinking it might fix it but... no. 😵

I'd love to fix this or help fix this, but I don't know what my next steps should be.

By the way, thank you very much for vips and php-vips-ext.

jcupitt commented 3 years ago

Hi @inod,

That's really odd. I'm sorry it's been so painful.

Could you try vips-8.11.1? Homebrew updated today with the new version and it has a couple of fixes to module finding which might possibly help.

inod commented 3 years ago

Could you try vips-8.11.1? Homebrew updated today with the new version and it has a couple of fixes to module finding which might possibly help.

Thank you for your quick reply, I've upgraded vips and I have the same error, even after restarting the computer and again reinstalling nginx, php and all :(

That's really odd. I'm sorry it's been so painful.

Yes, I don't get it either. If I had the time, I'd debug it more, but I think I'll only be able to try again in a few weeks. I can always use exec('vipsthumbnail') and friends if I even start thinking about recycling the OS.

mstaack commented 3 years ago

this might also be related to #41

edit: vips-8.11.1 solves this for me

inod commented 3 years ago

this might also be related to #41

Sounds very similar, and I'll add that I'm on an intel mac with case-sensitive fs.

inod commented 3 years ago

MacOS Crash report: php-fpm_2021-07-06-17.log

kleisauke commented 3 years ago

Looks like this crash happens in libintl, presumably due to some kind of incompatibility. What's the output of these commands?

$ otool -L /usr/local/opt/gettext/lib/libintl.8.dylib
$ otool -L /usr/local/opt/gettext/bin/gettext
$ /usr/local/opt/gettext/bin/gettext --version
$ otool -L /usr/local/opt/vips/lib/libvips.42.dylib

(I unfortunately don't have any macOS system to test this on)

inod commented 3 years ago

@kleisauke thank you. Here is the output:

$ otool -L /usr/local/opt/gettext/lib/libintl.8.dylib
/usr/local/opt/gettext/lib/libintl.8.dylib:
    /usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 11.0.0, current version 11.0.0)
    /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1770.106.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0)
$ otool -L /usr/local/opt/gettext/bin/gettext
/usr/local/opt/gettext/bin/gettext:
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1770.106.0)
    /usr/local/Cellar/gettext/0.21/lib/libintl.8.dylib (compatibility version 11.0.0, current version 11.0.0)
    /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0)
$ /usr/local/opt/gettext/bin/gettext --version
gettext (GNU gettext-runtime) 0.21
Copyright (C) 1995-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Ulrich Drepper.
$ otool -L /usr/local/opt/vips/lib/libvips.42.dylib
/usr/local/opt/vips/lib/libvips.42.dylib:
    /usr/local/opt/vips/lib/libvips.42.dylib (compatibility version 56.0.0, current version 56.1.0)
    /usr/lib/libexpat.1.dylib (compatibility version 7.0.0, current version 8.0.0)
    /usr/local/opt/libgsf/lib/libgsf-1.114.dylib (compatibility version 115.0.0, current version 115.47.0)
    /usr/local/opt/libxml2/lib/libxml2.2.dylib (compatibility version 12.0.0, current version 12.12.0)
    /usr/local/opt/fftw/lib/libfftw3.3.dylib (compatibility version 10.0.0, current version 10.9.0)
    /usr/local/opt/imagemagick/lib/libMagickCore-7.Q16HDRI.10.dylib (compatibility version 11.0.0, current version 11.0.0)
    /usr/local/opt/orc/lib/liborc-0.4.0.dylib (compatibility version 33.0.0, current version 33.0.0)
    /usr/local/opt/little-cms2/lib/liblcms2.2.dylib (compatibility version 3.0.0, current version 3.12.0)
    /usr/local/opt/openexr/lib/libOpenEXR-3_0.29.dylib (compatibility version 29.0.0, current version 29.0.0)
    /usr/local/opt/openexr/lib/libOpenEXRUtil-3_0.29.dylib (compatibility version 29.0.0, current version 29.0.0)
    /usr/local/opt/openexr/lib/libIex-3_0.29.dylib (compatibility version 29.0.0, current version 29.0.0)
    /usr/local/opt/openexr/lib/libIlmThread-3_0.29.dylib (compatibility version 29.0.0, current version 29.0.0)
    /usr/local/opt/imath/lib/libImath-3_0.28.dylib (compatibility version 28.0.0, current version 28.0.2)
    /usr/local/opt/openjpeg/lib/libopenjp2.7.dylib (compatibility version 7.0.0, current version 2.4.0)
    /usr/local/opt/libheif/lib/libheif.1.dylib (compatibility version 14.0.0, current version 14.0.0)
    /usr/local/opt/poppler/lib/libpoppler-glib.8.dylib (compatibility version 8.0.0, current version 8.21.0)
    /usr/local/opt/librsvg/lib/librsvg-2.2.dylib (compatibility version 50.0.0, current version 50.0.0)
    /usr/local/opt/gdk-pixbuf/lib/libgdk_pixbuf-2.0.0.dylib (compatibility version 4201.0.0, current version 4201.6.0)
    /usr/local/opt/openslide/lib/libopenslide.0.dylib (compatibility version 5.0.0, current version 5.1.0)
    /usr/local/opt/libmatio/lib/libmatio.11.dylib (compatibility version 12.0.0, current version 12.3.0)
    /usr/local/opt/hdf5/lib/libhdf5.200.dylib (compatibility version 201.0.0, current version 201.0.0)
    /usr/local/opt/cfitsio/lib/libcfitsio.9.dylib (compatibility version 9.0.0, current version 9.3.49)
    /usr/local/opt/webp/lib/libwebpmux.3.dylib (compatibility version 4.0.0, current version 4.6.0)
    /usr/local/opt/webp/lib/libwebpdemux.2.dylib (compatibility version 3.0.0, current version 3.7.0)
    /usr/local/opt/webp/lib/libwebp.7.dylib (compatibility version 9.0.0, current version 9.1.0)
    /usr/local/opt/pango/lib/libpangocairo-1.0.0.dylib (compatibility version 4801.0.0, current version 4801.6.0)
    /usr/local/opt/pango/lib/libpango-1.0.0.dylib (compatibility version 4801.0.0, current version 4801.6.0)
    /usr/local/opt/harfbuzz/lib/libharfbuzz.0.dylib (compatibility version 20801.0.0, current version 20801.0.0)
    /usr/local/opt/cairo/lib/libcairo.2.dylib (compatibility version 11603.0.0, current version 11603.0.0)
    /usr/local/opt/fontconfig/lib/libfontconfig.1.dylib (compatibility version 14.0.0, current version 14.0.0)
    /usr/local/opt/freetype/lib/libfreetype.6.dylib (compatibility version 24.0.0, current version 24.4.0)
    /usr/local/opt/libtiff/lib/libtiff.5.dylib (compatibility version 13.0.0, current version 13.0.0)
    /usr/local/opt/libspng/lib/libspng.0.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/local/opt/libpng/lib/libpng16.16.dylib (compatibility version 54.0.0, current version 54.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    /usr/local/opt/libimagequant/lib/libimagequant.0.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/local/opt/mozjpeg/lib/libjpeg.62.dylib (compatibility version 62.0.0, current version 62.3.0)
    /usr/local/opt/libexif/lib/libexif.12.dylib (compatibility version 16.0.0, current version 16.4.0)
    /usr/local/opt/glib/lib/libgmodule-2.0.0.dylib (compatibility version 6801.0.0, current version 6801.3.0)
    /usr/local/opt/glib/lib/libgio-2.0.0.dylib (compatibility version 6801.0.0, current version 6801.3.0)
    /usr/local/opt/glib/lib/libgobject-2.0.0.dylib (compatibility version 6801.0.0, current version 6801.3.0)
    /usr/local/opt/glib/lib/libglib-2.0.0.dylib (compatibility version 6801.0.0, current version 6801.3.0)
    /usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 11.0.0, current version 11.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 905.6.0)
inod commented 3 years ago

Here is log of a different but related php-fpm crash that only happens when I enable vips.so in php.ini. php-fpm_2021-07-07-135223.log

This happens when I load a browser page that only does a normal curl_exec(). Php-fpm crashes even if I don't call any php-vips-ext functions. Invoking the script from cli works as expected.

kleisauke commented 3 years ago

Output looks OK, gettext is keg_only, which means that only /usr/local/opt/gettext/bin/gettext should link to Homebrew's gettext (i.e. /usr/local/Cellar/gettext/0.21/lib/libintl.8.dylib) and not to its system library (i.e. /usr/local/opt/gettext/lib/libintl.8.dylib).

What's the output of running brew doctor?

kleisauke commented 3 years ago

After further inspection of the first log, I noticed this:

objc[24552]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.
+[NSNumber initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

This SO post is probably relevant here: https://stackoverflow.com/a/52230415.

Perhaps you could try running php-fpm with the OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES environment variable set?

kleisauke commented 3 years ago

In case the /usr/local/Cellar/php/8.0.8/homebrew.mxcl.php.plist property list file exists, you could disable fork safety in php-fpm as follows:

#!/bin/bash

plist="/usr/local/Cellar/php/8.0.8/homebrew.mxcl.php.plist"
defaults read $plist EnvironmentVariables &> /dev/null

if [[ $? -ne 0 ]]; then
  plutil -insert EnvironmentVariables -xml '<dict/>' $plist
fi

plutil -replace EnvironmentVariables.OBJC_DISABLE_INITIALIZE_FORK_SAFETY -string YES $plist

echo "Disabled fork safety for PHP-FPM."

(untested)

inod commented 3 years ago

What's the output of running brew doctor?


$ brew doctor
Please note that these warnings are just used to help the Homebrew maintainers
with debugging if you file an issue. If everything you use Homebrew for is
working fine: please don't worry or file an issue; just ignore this. Thanks!

Warning: The filesystem on /System/Volumes/Data appears to be case-sensitive. The default macOS filesystem is case-insensitive. Please report any apparent problems.



> Perhaps you could try running php-fpm with the OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES environment variable set?

**Thank you!** I added `export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES`, restarted terminal and confirmed the variable is set.

Now when I run manually `php-fpm --nodaemonize` and load the script in a browser, the curl calls now work, but now the `vips_call('thumbnail', null, ...)` times out with **504 Gateway Time-out**. The same php script runs ok in cli.

This is getting weirder and I don't know what to do next. (tried also running nginx and php as sudo)

How can I know what's causing the **504 Gateway Time-out**?

Thank you all for your time.
inod commented 3 years ago

In case the /usr/local/Cellar/php/8.0.8/homebrew.mxcl.php.plist property list file exists, you could disable fork safety in php-fpm as follows:

Thank you for your script, I've run it, but when I execute brew services start php and load the browser tab, php-fpm crashes with the same error as before, as if it is not reading or ignoring the EnvironmentVariables.

This is my new /usr/local/Cellar/php/8.0.8/homebrew.mxcl.php.plist file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>EnvironmentVariables</key>
    <dict>
        <key>OBJC_DISABLE_INITIALIZE_FORK_SAFETY</key>
        <string>YES</string>
    </dict>
    <key>KeepAlive</key>
    <true/>
    <key>Label</key>
    <string>homebrew.mxcl.php</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/opt/php/sbin/php-fpm</string>
        <string>--nodaemonize</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardErrorPath</key>
    <string>/usr/local/var/log/php-fpm.log</string>
    <key>WorkingDirectory</key>
    <string>/usr/local/var</string>
</dict>
</plist>
mstaack commented 3 years ago

Just to note, I am seeing crashes again related to vips extension. So this here is probably the major cause.

Happy to help debug this on my machine also.

kleisauke commented 3 years ago

php-fpm crashes with the same error as before, as if it is not reading or ignoring the EnvironmentVariables.

I think this can probably be fixed by using clear_env = no in /usr/local/etc/php/8.0/php-fpm.d/www.conf or by setting the specified environment variable explicitly in the same config file, for example:

env[OBJC_DISABLE_INITIALIZE_FORK_SAFETY] = YES

Now when I run manually php-fpm --nodaemonize and load the script in a browser, the curl calls now work, but now the vips_call('thumbnail', null, ...) times out with 504 Gateway Time-out. The same php script runs ok in cli.

I'm not sure what's causing this timeout. Perhaps another php-fpm daemon was running when you were trying this?

Just to note, I am seeing crashes again related to vips extension. So this here is probably the major cause.

Ouch, sorry to hear that. I wonder what's causing that, are you able to share a stack trace? If you are using php-fpm and encounter such errors:

objc[24552]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.

Then it's probably because of Apple's enforced rule checks that disallows forking a multithreaded application. In these cases, the process will SIGABRT. It can be workaround by setting the OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES environment variable. This blog post contains some more details about this behavior.

kleisauke commented 3 years ago

It might also be worth passing this environment variable through nginx, to avoid using clear_env = no in the php-fpm configuration file:

location / {
    ...
    fastcgi_param OBJC_DISABLE_INITIALIZE_FORK_SAFETY YES; 
    ...
}

(untested, so not sure if that works)

inod commented 3 years ago

@kleisauke using your advice I was able to pass the variable using env[OBJC_DISABLE_INITIALIZE_FORK_SAFETY] = YES and I confirmed it in phpinfo (value appears as a 1 in both "environment" and "PHP variables") but it still crashes with the same error. (I also tried using clear_env=no just to check, and the same crash happens)

I'm not sure what's causing this timeout. Perhaps another php-fpm daemon was running when you were trying this?

Thanks for the suggestion, I checked and there weren't any.

kleisauke commented 3 years ago

Curious, to summarize:

Did you mean by "the same crash happens" that it produces a SIGABRT with the following message (just like your first log)?:

objc[24552]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.

Also, could you try to downgrade libvips to check whether this issue occurs there as well? This is to rule things out.

I had a further look into php-fpm, it seems that the child php-fpm processes are forked from a parent process after vips_init has been called. This could be dangerous, as explained here, but since libvips no longer starts threads during vips_init (since commit https://github.com/libvips/libvips/commit/99423649b90dd3b47b33b044f7baa6f28cbf6457), it's presumably to be safe now.

What we could try is to move vips_init from the PHP_MINIT function, which is being called when PHP loads the extension, and happens at the parent process, to the PHP_RINIT function, which is being called before every single PHP request (just like grpc, see https://github.com/grpc/grpc/pull/11814). I think this is safe to do, because vips_init contains logic to avoid initialization if it's already initialized, so performance doesn't suffer. But I'm not sure if it fixes this issue.

inod commented 3 years ago

@kleisauke your summary is correct, just to clarify and add more datapoints:

Running php-fpm --nodaemonize with that environment variable set doesn't crash curl anymore (It is fixed running this way, and I can live with it).

Now the timeouts: vips_call('thumbnail', null, $in, 200, ['height' => 300, 'crop' => 'centre']) results in a timeout. vips_image_write_to_file(...) creates an empty file with the correct name and times out. vips_image_new_from_file(...) and vips_image_get(...) work OK (no timeout and return GObject and correct loader).

Did you mean by "the same crash happens" that it produces a SIGABRT with the following message (just like your first log)?:

This is the new crash log on brew services start php-fpm calling curl_exec and the fork variable confirmed in phpinfo: php-fpm_2021-07-08.log

kleisauke commented 3 years ago

In the above example, is $in the result of vips_image_new_from_file('in.jpg')['out']? The vips_thumbnail function operates on filenames instead of VipsImage *, hence the correct usage would be (off the top of my head):

<?php

$image = vips_call('thumbnail', null, 'in.jpg', 200, ['height' => 300, 'crop' => 'centre'])['out'];
vips_image_write_to_file($image, __DIR__ . '/out.jpg');

php-fpm_2021-07-08.log

Does this crash only manifest when the vips extension is enabled? Given that stack trace, I think issue https://github.com/Homebrew/homebrew-core/issues/80460 is similar here.

inod commented 3 years ago

In the above example, is $in the result of vips_image_new_from_file('in.jpg')['out']? The vips_thumbnail function operates on filenames instead of VipsImage *, hence the correct usage would be (off the top of my head):

You are correct, I didn't paste the full code, my mistake. It worked before and still works on linux, so I tested your code and same issue: Works on command line, times out on brew services start php-fpm, SIGABRT on php-fpm --nodaemonize.

php-fpm_2021-07-08.log

Does this crash only manifest when the vips extension is enabled? Given that stack trace, I think issue Homebrew/homebrew-core#80460 is similar here.

Yes, it crashes only when vips.so is enabled. Without vips.so everything works, and I don't have any other extensions. That issue sure looks similar.

kleisauke commented 3 years ago

I'm running out of ideas. :(

What's the output of running brew services list? It could be that Homebrew's PHP background service uses a different property list file than I initially thought in comment https://github.com/libvips/php-vips-ext/issues/42#issuecomment-875581277. For example, if you see this:

$ brew services list
...
php        started root         /Library/LaunchDaemons/homebrew.mxcl.php.plist
...

Then check the contents of /Library/LaunchDaemons/homebrew.mxcl.php.plist and use that bash script on that file if necessary. It's also possible that ~/​Library/LaunchAgents/homebrew.mxcl.php.plist is used.

inod commented 3 years ago
$ brew services list
Name  Status  User Plist
mysql started ino  /Users/ino/Library/LaunchAgents/homebrew.mxcl.mysql.plist
nginx started ino  /Users/ino/Library/LaunchAgents/homebrew.mxcl.nginx.plist
php   started ino  /Users/ino/Library/LaunchAgents/homebrew.mxcl.php.plist
redis started ino  /Users/ino/Library/LaunchAgents/homebrew.mxcl.redis.plist

$ cat /Users/ino/Library/LaunchAgents/homebrew.mxcl.php.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>EnvironmentVariables</key>
    <dict>
        <key>OBJC_DISABLE_INITIALIZE_FORK_SAFETY</key>
        <string>YES</string>
    </dict>
    <key>KeepAlive</key>
    <true/>
    <key>Label</key>
    <string>homebrew.mxcl.php</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/opt/php/sbin/php-fpm</string>
        <string>--nodaemonize</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardErrorPath</key>
    <string>/usr/local/var/log/php-fpm.log</string>
    <key>WorkingDirectory</key>
    <string>/usr/local/var</string>
</dict>
</plist>

Today I upgraded php to 8.0.8_1 and it still times out on vips_call.

kleisauke commented 3 years ago

Ah yes, ignore my previous comment, I just found out that the /usr/local/Cellar/php/<VERSION>/homebrew.mxcl.php.plist file is copied to ~/Library/LaunchAgents/ whenever the service is (re-)started.

Are there any errors recorded in /usr/local/var/log/php-fpm.log or /usr/local/var/log/nginx/error.log when this timeout occurs? What's the output of restarting php-fpm with verbose logging enabled (brew services restart php -vv)? Do you get the 504 Gateway Time-out immediately? If so, does increasing nginx's fastcgi_read_timeout setting help?

kleisauke commented 3 years ago

It could also be a permission issue (since you mentioned that vips_image_write_to_file gives a timeout). You could check with ps aux | grep php which user is running the php(-fpm) processes.

It seems that php-fpm, provided by Homebrew, sets the user and group to _www by default. https://github.com/Homebrew/homebrew-core/blob/683fe0bddadbbc45833677277a6982ba606d1834/Formula/php.rb#L162-L163

This can be changed by modifying the /usr/local/etc/php/8.0/php-fpm.d/www.conf file like this:

user = YOUR_USERNAME
group = staff
inod commented 3 years ago

Are there any errors recorded in /usr/local/var/log/php-fpm.log or /usr/local/var/log/nginx/error.log when this timeout occurs?

$tail -f /usr/local/var/log/php-fpm.log

[13-Jul-2021 20:20:49] NOTICE: fpm is running, pid 5412
[13-Jul-2021 20:20:49] NOTICE: ready to handle connections
[13-Jul-2021 23:03:12] NOTICE: Terminating ...
[13-Jul-2021 23:03:12] NOTICE: exiting, bye-bye!
[13-Jul-2021 23:03:16] NOTICE: [pool www] 'user' directive is ignored when FPM is not running as root
[13-Jul-2021 23:03:16] NOTICE: [pool www] 'user' directive is ignored when FPM is not running as root
[13-Jul-2021 23:03:16] NOTICE: [pool www] 'group' directive is ignored when FPM is not running as root
[13-Jul-2021 23:03:16] NOTICE: [pool www] 'group' directive is ignored when FPM is not running as root
[13-Jul-2021 23:03:16] NOTICE: fpm is running, pid 5937
[13-Jul-2021 23:03:16] NOTICE: ready to handle connections

and nothing gets logged when timeout happens. Server log:

tail -f ../../var/log/example.test.*

==> ../../var/log/example.test.access.log <==
127.0.0.1 - - [13/Jul/2021:23:09:05 +0200] "GET / HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:90.0) Gecko/20100101 Firefox/90.0"

==> ../../var/log/example.test.error.log <==
2021/07/13 23:10:07 [error] 930#0: *659 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 127.0.0.1, server: example.test, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "example.test"

==> ../../var/log/example.test.access.log <==
127.0.0.1 - - [13/Jul/2021:23:10:07 +0200] "GET / HTTP/1.1" 504 160 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:90.0) Gecko/20100101 Firefox/90.0"

What's the output of restarting php-fpm with verbose logging enabled (brew services restart php -vv)?

Nothing gets logged. It's exactly the same as the first log above.

Do you get the 504 Gateway Time-out immediately? If so, does increasing nginx's fastcgi_read_timeout setting help?

I've tried fastcgi_read_timeout 120; and keepalive_timeout 120; and I get the 504 after the 120s. Running the same script in command line finishes in a fraction of a second.

It could also be a permission issue (since you mentioned that vips_image_write_to_file gives a timeout). You could check with ps aux | grep php which user is running the php(-fpm) processes.

It is possible, and I've tried finding files in /usr/local/ with wrong permissions and didn't find any blatantly wrong but I could have missed some. But because vips_image_write_to_file is able to write an empty file with the correct name before timing out, it seemed to me that it shouldn't be the permissions, but my knowledge stops there.

$ps aux | grep php
ino               6179   0.0  0.0  4286840    756 s002  S+   11:15PM   0:00.01 grep php
ino               6177   0.0  0.0  4614212    756   ??  S    11:15PM   0:00.00 /usr/local/opt/php/sbin/php-fpm --nodaemonize
ino               6176   0.0  0.1  4617108  11772   ??  S    11:15PM   0:00.05 /usr/local/opt/php/sbin/php-fpm --nodaemonize
ino               6174   0.0  0.3  4613980  23264   ??  S    11:15PM   0:00.20 /usr/local/opt/php/sbin/php-fpm --nodaemonize
ino               5942   0.0  9.6  7932608 808764   ??  S    11:03PM   3:05.06 /Applications/PhpStorm.app/Contents/MacOS/phpstorm
ino               5850   0.0  0.0  4277516    596 s001  S+   11:02PM   0:00.01 tail -f php-fpm.log

This can be changed by modifying the /usr/local/etc/php/8.0/php-fpm.d/www.conf file like this:

user = YOUR_USERNAME
group = staff

I already have this in place, and I've also tried running sudo brew services start for nginx and php a few weeks ago and it didn't help, and I had to chown to fix some file and dir permissions that homebrew changed and helpfully printed, though.

I thank you very much for your effort, and I'm sorry that this isn't improving much, but at least I've learned some things.

kleisauke commented 3 years ago

I think I might have found the culprit, in comment https://github.com/libvips/php-vips-ext/issues/42#issuecomment-876516325 I made the assumption that libvips no longer starts threads during vips_init. Although this is the case on Linux and Windows, it seems that this isn't true on macOS.

The issue is caused because g_system_thread_get_scheduler_settings always returns false on macOS, since the sched_getattr system call (to fetch the scheduling policy) isn't available there.

...
Header <sys/syscall.h> has symbol "SYS_sched_getattr" : NO
...

As a result, when g_thread_pool_new is called it falls back to this g_thread_new call. https://github.com/GNOME/glib/blob/1c673dd0781b623f3f6e3040cb9ea3605b1613c9/glib/gthreadpool.c#L652

I tried to reproduce this on macOS with GitHub Actions, see commit https://github.com/kleisauke/libvips/commit/253b7699d2a486ef82f726f02d99e25cb55a9d82. The test case does the following (similar to php-fpm):

This does indeed seem to deadlock and produce an empty file, see:

$ clang -g -Wall php-vips-ext-issue-42.c `pkg-config vips --cflags --libs` -o php-vips-ext-issue-42
$ gtimeout 60s ./php-vips-ext-issue-42 || (ls -l && exit 1)
vips_init
fork
PID 78211 (parent) waiting for child PID 78212
PID 78212 (child) doing work
<TIMES OUT AFTER 60s>
...
-rw-r--r--   1 runner  staff       0 Jul 14 11:18 x.jpg

(Relevant log: https://github.com/kleisauke/libvips/runs/3065988766)

I think there are a few ways to resolve this issue:

  1. Move vips_init from the PHP_MINIT function to the PHP_RINIT function to ensure that vips_init is called after php-fpm forks the process (as explained here).
  2. Implement g_system_thread_get_scheduler_settings in GLib for macOS (see this FIXME)
  3. Construct g_thread_pool_new on first use (perhaps in vips__thread_execute?), similar to commit https://github.com/libvips/libvips/commit/99423649b90dd3b47b33b044f7baa6f28cbf6457.

My preference is for the last two options. @jcupitt WDYT?

@inod You can downgrade libvips to 8.10.6 as a possible workaround (for now).

jcupitt commented 3 years ago

Oh well done @kleisauke, congrats on your persistence.

Yes, delaying threading init as long as we can is probably best, since this could fix issues on ruby as well.

Yes, a GOnce in vips__thread_execute() would be simple.

Are there any other threading things we need to delay? What about mutexes / cond vars / etc.?

jcupitt commented 3 years ago

Thinking about this a little more, just delaying the threadpool build should be enough.

I made this change to the head of 8.11, I hope that's OK (it might fix a ruby-vips issue too).

https://github.com/libvips/libvips/commit/6946c3b9d9a047fbef5014341ced1ef4ad4d4be3

mstaack commented 3 years ago

okay my fpm process also dies when using a curl method within php:

[21-Jul-2021 09:15:24] WARNING: [pool valet] child 47683 exited on signal 11 (SIGSEGV) after 17.719142 seconds from start
[21-Jul-2021 09:15:24] NOTICE: [pool valet] child 47725 started

disabling ext-vips solves this.

have not tried the latest edits @jcupitt mentioned above, please consider tagging a new release :)

versions:

vips -v
vips-8.11.2-Sat Jul  3 14:17:10 UTC 2021
vips

vips property => value
Vips headers version => 8.11.0
Vips library version => 8.11.2
Vips ABI version => 55.1.13
kleisauke commented 3 years ago

fwiw, by rebasing the above reproducer on top of the 8.11 branch (see https://github.com/kleisauke/libvips/commit/56ab2119112918159068e2dbd4f4a1f67f0a8d7e), I can confirm that commit https://github.com/libvips/libvips/commit/6946c3b9d9a047fbef5014341ced1ef4ad4d4be3 resolves this issue.

jcupitt commented 3 years ago

8.11.3 is out now and should fix this.

https://github.com/libvips/libvips/releases/tag/v8.11.3

jcupitt commented 3 years ago

(sorry for the trouble this has caused)

inod commented 3 years ago

A big thank you to everyone involved! In a couple of days I will upgrade and close this issue.

Great work! ⭐️⭐️⭐️⭐️⭐️