revoltphp / event-loop

Revolt is a rock-solid event loop for concurrent PHP applications.
MIT License

Process Hangs with activated opcache #95

Closed unglaublicherdude closed 2 months ago

unglaublicherdude commented 2 months ago

Hi,

tbh, I don't know if I am in the right place here, but my research turned up earlier issues with interactions between opcache and revolt/event-loop, so I hope you might be able to help us figure out this issue.

We have developed an AV SDK, and at one point we decided to go async because of connection issues (basically using two connections: one WebSocket and one HTTPS connection for an upload). So the first step was replacing our old Guzzle HTTP client with the amphp one for our upload feature.

With this SDK we built a Nextcloud app and ran into an issue that (for now, because it's not released with that feature) only exists in our pipeline.

The local setup and the CI setup are almost the same (the tests that fail are run in a container), but the behaviour differs. Locally everything works fine, but in the CI we always get the "Empty reply from server" error, which basically means the server did not answer. My assumption was that something (like a repeat or a delay) might still be active in the event loop, but that would also have caused the issue on our local setup.

We could mitigate the issue by excluding amphp from opcache: setting opcache.blacklist_filename and adding the whole amphp path to the blacklist. But that is hardly an option for the Nextcloud app, because the blacklist needs full paths and the customer would have to write it themselves.
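For reference, a blacklist-based mitigation of this kind could look roughly like the following. This is only a sketch: the blacklist file location is a hypothetical choice, and the amphp path is taken from the backtrace posted later in this thread, so it will differ per installation.

```ini
; php.ini (or a conf.d snippet): tell opcache where the blacklist file lives.
; The location below is a hypothetical example.
opcache.blacklist_filename=/etc/php/opcache-blacklist.txt
```

```ini
; /etc/php/opcache-blacklist.txt: one entry per line, lines starting with ";" are comments.
; Per the PHP manual an entry may be a path prefix, so this should cover everything under vendor/amphp.
/var/www/html/apps/gdatavaas/vendor/amphp/
```

Because the entry is an absolute path, it has to match wherever the app is actually installed, which is the deployment problem described above.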

I am basically helpless right now. Do you have any idea, maybe in the SDK code where we use the HTTP client, or some runtime trick to prevent opcache from caching the amphp directory completely?

I also opened this issue for the amphp-httpclient.

unglaublicherdude commented 2 months ago

The image we are using has php version 8.2.21 installed.

Bilge commented 2 months ago

1) what

unglaublicherdude commented 2 months ago

Ok. To explain further: this CI runs a Nextcloud container with an app that we developed.

The test does a PUT against the Nextcloud instance to upload a file. During the upload, our app checks against our backend whether the file contains a virus; this is done via HTTP PUT. Previously we used Guzzle for the HTTP PUT, but we replaced it with amphp, which has Revolt as a dependency.

The code in question is this

        $cancellation = new DeferredCancellation();
        $connection = $this->_vaasConnection->GetAuthenticatedWebsocket();
        $pingTimer = EventLoop::repeat(5, function () use ($connection) {
            if ($connection->isConnected()) {
                $connection->ping();
            }
        });

        try {
            $httpClient = (new HttpClientBuilder())
                ->skipAutomaticCompression()
                ->skipDefaultAcceptHeader()
                ->skipDefaultUserAgent()
                ->build();

            $request = new Request($url, 'PUT');
            $request->setProtocolVersions(["1.1"]);
            $request->setTransferTimeout($this->_uploadTimeoutInSeconds);
            $request->setInactivityTimeout($this->_uploadTimeoutInSeconds);
            $request->setBody(StreamedContent::fromStream($fileStream, $fileSize));
            $request->addHeader("Authorization", $uploadToken);

            $response = $httpClient->request(
                $request, new TimeoutCancellation($this->_uploadTimeoutInSeconds), $cancellation->getCancellation());
            if ($response->getStatus() > 399) {
                $reason = $response->getBody()->buffer($cancellation->getCancellation());
                throw new UploadFailedException($reason, $response->getStatus());
            }
        } catch (\Exception $e) {
            if ($e instanceof HttpException) {
                throw new UploadFailedException($e->getMessage(), $e->getCode());
            }
            throw new VaasClientException($e->getMessage());
        } finally {
            if (EventLoop::isEnabled($pingTimer)) {
                EventLoop::cancel($pingTimer);
            }
            if (!$cancellation->isCancelled()) {
                $cancellation->cancel();
            }
        }

Looking at the CI, we can see that the request runs into a segmentation fault: [screenshot]

This only happens when opcache is activated, and it does not happen when using Guzzle as the HTTP client. However, I did not manage to reproduce this bug locally.

I did another test, removing the EventLoop::repeat. The segmentation fault still occurs.

unglaublicherdude commented 2 months ago

Here you can find the coredumps from the apache child processes: https://github.com/GDATASoftwareAG/nextcloud-gdata-antivirus/pull/98#issuecomment-2291117317

And in our Dockerfile we have an automated way to get the apache2 debug-symbols for our version: https://github.com/GDATASoftwareAG/nextcloud-gdata-antivirus/blob/renovate/major-all-major-patch/Dockerfile.Nextcloud#L8-L15

unglaublicherdude commented 2 months ago

With dump_bt we get this

(gdb) dump_bt executor_globals.current_execute_data
[0x7efca1750180] version_compare(false, "0.3.0", "<") [internal function]
[0x7ffd3f49eeb0] Amp\File\Driver\UvFilesystemDriver::__construct() /var/www/html/apps/gdatavaas/vendor/amphp/file/src/Driver/UvFilesystemDriver.php:0

Bilge commented 2 months ago

The fact that \phpversion('uv') returned false but still ended up being passed to version_compare appears to be impossible given the current version of UvFilesystemDriver. Are you sure you're not using some ancient version?

In any case, I suggest you dump the full output of composer info on your CI server.
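To make the point about \phpversion('uv') concrete: phpversion() returns false for an extension that is not loaded, so a caller can check for that before handing the value to version_compare(). The following is a minimal sketch of such a guard. The helper name extensionIsOlderThan is hypothetical, and this is not the code from amphp/file.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of amphp/file): reports whether a loaded
 * extension is older than $minimum, or null when it is not loaded at all.
 */
function extensionIsOlderThan(string $extension, string $minimum): ?bool
{
    // phpversion() returns false when the named extension is not loaded.
    $version = \phpversion($extension);
    if ($version === false) {
        return null;
    }

    // Only a real version string ever reaches version_compare().
    return \version_compare($version, $minimum, '<');
}

// Mirrors the comparison seen in the backtrace above.
var_dump(extensionIsOlderThan('uv', '0.3.0'));
```

Running this in the CI container would show whether ext-uv is loaded there at all, which is one of the things the backtrace leaves open.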

unglaublicherdude commented 2 months ago

[screenshot: Screenshot_20240815-230321]

OK dude. Chill.

unglaublicherdude commented 2 months ago

This might just have been a bug in amphp/file (https://github.com/amphp/http-client/issues/365#issuecomment-2291491563), but I will still get the output of composer info for you.

Bilge commented 2 months ago

> [screenshot: Screenshot_20240815-230321]
>
> OK dude. Chill.

Sure. I'm unsubscribed from this thread. Good luck.

trowski commented 2 months ago

Since this was some code in amphp/file triggering some opcache bug, which has been resolved, I'm going to close this issue.