neos / flow-development-collection

The unified repository containing the Flow core packages, used for Flow development.
https://flow.neos.io/
MIT License
139 stars 188 forks source link

neos/cache 5.3.25...5.3.26 Cannot add or modify cache entry because the affected keys are being modified by another process ("Flow_Session_Storage") #2483

Closed booooza closed 3 years ago

booooza commented 3 years ago

Description

Since neos/cache upgrade from 5.3.25 to 5.3.26 we (randomly?) got many of the following exception:

Neos\Cache\Exception: Cannot add or modify cache entry because the affected keys are being modified by another process ("Flow_Session_Storage") (1594725688)

#0 Packages/Libraries/neos/cache/Classes/Frontend/VariableFrontend.php(68): Neos\Cache\Backend\RedisBackend->set('...', '...', Array, 0)
#1 Data/Temporary/Production/Cache/Code/Flow_Object_Classes/Neos_Flow_Session_Session.php(479): Neos\Cache\Frontend\VariableFrontend->set('...', Array, Array, 0)
#2 Data/Temporary/Production/Cache/Code/Flow_Object_Classes/Neos_Flow_Session_Session.php(751): Neos\Flow\Session\Session_Original->putData('Neos_Flow_Secur...', Array)
#3 Data/Temporary/Production/Cache/Code/Flow_Object_Classes/Neos_Flow_Session_Session.php(689): Neos\Flow\Session\Session_Original->storeAuthenticatedAccountsInfo(Array)
#4 Packages/Framework/Neos.Flow/Classes/ObjectManagement/ObjectManager.php(561): Neos\Flow\Session\Session_Original->shutdownObject()
#5 Packages/Framework/Neos.Flow/Classes/ObjectManagement/ObjectManager.php(464): Neos\Flow\ObjectManagement\ObjectManager->callShutdownMethods(Object(SplObjectStorage))
#6 Data/Temporary/Production/Cache/Code/Flow_Object_Classes/Neos_Flow_Security_Context.php(219): Neos\Flow\ObjectManagement\ObjectManager->Neos\Flow\ObjectManagement\{closure}()
#7 Data/Temporary/Production/Cache/Code/Flow_Object_Classes/Neos_Flow_Security_Context.php(219): Closure->__invoke()
#8 Packages/Framework/Neos.Flow/Classes/ObjectManagement/ObjectManager.php(465): Neos\Flow\Security\Context_Original->withoutAuthorizationChecks(Object(Closure))
#9 [internal function]: Neos\Flow\ObjectManagement\ObjectManager->shutdown('Runtime', 'Neos\\Flow\\Core\\...')
#10 Packages/Framework/Neos.Flow/Classes/SignalSlot/Dispatcher.php(140): call_user_func_array(Array, Array)
#11 Packages/Framework/Neos.Flow/Classes/Core/Bootstrap.php(467): Neos\Flow\SignalSlot\Dispatcher->dispatch('Neos\\Flow\\Core\\...', 'bootstrapShutti...', Array)
#12 Packages/Framework/Neos.Flow/Classes/Core/Bootstrap.php(137): Neos\Flow\Core\Bootstrap->emitBootstrapShuttingDown('Runtime')
#13 Packages/Framework/Neos.Flow/Classes/Http/RequestHandler.php(114): Neos\Flow\Core\Bootstrap->shutdown('Runtime')
#14 Packages/Framework/Neos.Flow/Classes/Core/Bootstrap.php(112): Neos\Flow\Http\RequestHandler->handleRequest()
#15 Web/index.php(27): Neos\Flow\Core\Bootstrap->run()
#16 {main}

We can't relate the exception with a specific action, it seems to happen across multiple different controllers. Our Caches.yaml sets the flow session caches to redis as follows:

Flow_Session_MetaData:
  persistent: true
  backend: 'Neos\Cache\Backend\RedisBackend'
  backendOptions:
    defaultLifetime: 0
    hostname: '%env:REDIS_HOST%'
    password: '%env:REDIS_PASSWORD%'
    port: '%env:REDIS_PORT%'
    database: 0

Flow_Session_Storage:
  persistent: true
  backend: 'Neos\Cache\Backend\RedisBackend'
  backendOptions:
    defaultLifetime: 0
    hostname: '%env:REDIS_HOST%'
    password: '%env:REDIS_PASSWORD%'
    port: '%env:REDIS_PORT%'
    database: 0 

The upgrade from neos/cache 5.3.25 to 5.3.26 also seems to increase the number of requests to redis. I've attached a list of commands that redis sees on a single web request (opening the home page in our case) for the two versions:

5.3.25.txt 5.3.26.txt

5.3.25 causes 32 redis commands, with 5.3.26 it's 56 (+3 WATCH, +10 TTL, +10 PERSIST). The number also seemed pretty high before, is this to be expected or are we doing something wrong?

Steps to Reproduce

We are not able to consistently reproduce the issue. What we can say is that we run multiple instances of the app using the same Redis backend. We've also seen the exception mostly on production systems with higher user traffic. Since we've downgraded neos/cache back to 5.3.25 we didn't see the exception anymore.

Expected behavior

No exceptions

Actual behavior

Random exceptions

Affected Versions

neos/cache 5.3.26 (did not happen with 5.3.25) neos/flow 5.3.26

albe commented 3 years ago

This is an effect of https://github.com/neos/flow-development-collection/pull/2052 - the exception is thrown here https://github.com/neos/flow-development-collection/pull/2052/files#diff-544a10c061f371387bb5d71d50c3fd3498cf736e2aa16e59fff83e1854760eacR186

This effectively means, that the load in your case seems to be so high, that the retryAttemts+Interval do not fit your case and lead to the exception: https://github.com/neos/flow-development-collection/pull/2052/files#diff-544a10c061f371387bb5d71d50c3fd3498cf736e2aa16e59fff83e1854760eacR150-R151

If you do not see any bad effects of using 5.3.25 (redis running full of expired tag entries), I'd suggest you stay at that version. Regarding a future-proof solution, we might need to be able to configure the parameters and/or a way to more gracefully handle this exception - in essence, it states that "I could not set this specific cache entry, because an entry with the same tag is set too frequently by other processes already". Depending on the tag, this might be okay or not. Specifically for the SessionStorage, we set a tag [$this->storageIdentifier] - so this effectively leads to serializing all session writes. Maybe we need to reconsider the tags we set and/or distinguish tags that are "safe" for concurrent access from the unsafe ones.

/cc @nlx-lars @kaystrobach @kdambekalns

kaystrobach commented 3 years ago

I wonder if we can determine if there is a real unitOfWork with changes for the sessionCache. If so, this can be fixed. Another thing, that is imho problematic is, that we have many expeensive writing operations (tags) per entry instead of seldom expensive read (iteration over all items) to collect all items with specific tags. I think we should discuss the optimization criteria for redis in the near furure.

kaystrobach commented 3 years ago

Ps: i wonder how often we need to find all items for tags in flow … i do not have use case except logout for that.

kdambekalns commented 3 years ago

Seeing this while searching for a link target while inline editing, on a rather normal website. So… it's not an edge-case, it seems. Dang.

paxuclus commented 3 years ago

@kdambekalns 😞 Does this happen to you with the Flow_Session_Storage cache or something else? Can you please check which tag is being modified?

kdambekalns commented 3 years ago

The exception says: Cannot add or modify cache entry because the affected keys are being modified by another process ("Flow_Mvc_Routing_Resolve")

The request is agains the "nodes service" in this case:

Exception #1594725688 in line 72 of /application/Packages/Libraries/neos/cache/Classes/Frontend/VariableFrontend.php: Cannot add or modify cache entry because the affected keys are being modified by another process ("Flow_Mvc_Routing_Resolve")

42 Neos\Cache\Backend\RedisBackend::set("5943dde59406707d49f2b05ca1746bc1", "O:40:"Neos\Flow\Mvc\Routing\Dto\UriConstraints":1:…anguage%3Dde_DE";s:4:"path";s:12:"neos/preview";}}", array|12|, 0)
41 Neos\Cache\Frontend\VariableFrontend::set("5943dde59406707d49f2b05ca1746bc1", Neos\Flow\Mvc\Routing\Dto\UriConstraints, array|12|)
40 Neos\Flow\Mvc\Routing\RouterCachingService_Original::storeResolvedUriConstraints(Neos\Flow\Mvc\Routing\Dto\ResolveContext, Neos\Flow\Mvc\Routing\Dto\UriConstraints, Neos\Flow\Mvc\Routing\Dto\RouteTags)
39 Neos\Flow\Mvc\Routing\Router_Original::resolve(Neos\Flow\Mvc\Routing\Dto\ResolveContext)
38 Neos\Flow\Mvc\Routing\UriBuilder_Original::build(array|5|)
37 Neos\Flow\Mvc\Routing\UriBuilder_Original::uriFor("preview", array|5|, "Frontend\Node", "Neos.Neos", NULL)
36 Neos\Flow\Mvc\Routing\UriBuilder::uriFor("preview", array|1|, "Frontend\Node", "Neos.Neos", NULL)
35 Neos\Flow\Mvc\Routing\UriBuilder::Flow_Aop_Proxy_invokeJoinPoint(Neos\Flow\Aop\JoinPoint)
34 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
33 Neos\Neos\Aspects\PluginUriAspect_Original::rewritePluginViewUris(Neos\Flow\Aop\JoinPoint)
32 Neos\Flow\Aop\Advice\AroundAdvice::invoke(Neos\Flow\Aop\JoinPoint)
31 Neos\Flow\Aop\Advice\AdviceChain::proceed(Neos\Flow\Aop\JoinPoint)
30 Neos\Flow\Mvc\Routing\UriBuilder::uriFor("preview", array|1|, "Frontend\Node", "Neos.Neos")
29 Neos\Neos\Service\LinkingService_Original::createNodeUri(Neos\Flow\Mvc\Controller\ControllerContext, Neos\ContentRepository\Domain\Model\Node, NULL, NULL, true, array|0|, "", false, array|0|, false)
28 Neos\Neos\ViewHelpers\Uri\NodeViewHelper_Original::render()
27 call_user_func(array|2|)
26 TYPO3Fluid\Fluid\Core\ViewHelper\AbstractViewHelper::callRenderMethod()
25 TYPO3Fluid\Fluid\Core\ViewHelper\AbstractViewHelper::initializeArgumentsAndRender()
24 TYPO3Fluid\Fluid\Core\ViewHelper\ViewHelperInvoker_Original::invoke("Neos\Neos\ViewHelpers\Uri\NodeViewHelper", array|9|, Neos\FluidAdaptor\Core\Rendering\RenderingContext, Closure)
23 TYPO3Fluid\Fluid\Core\ViewHelper\AbstractViewHelper::renderStatic(array|9|, Closure, Neos\FluidAdaptor\Core\Rendering\RenderingContext)
22 Service_Nodes_action_index_cee5d8a04270941ca14db08724a0eb8530a1acd4::{closure}()
21 TYPO3Fluid\Fluid\Core\ViewHelper\AbstractConditionViewHelper::renderStatic(array|5|, Closure, Neos\FluidAdaptor\Core\Rendering\RenderingContext)
20 Service_Nodes_action_index_cee5d8a04270941ca14db08724a0eb8530a1acd4::{closure}()
19 TYPO3Fluid\Fluid\ViewHelpers\AliasViewHelper_Original::renderStatic(array|1|, Closure, Neos\FluidAdaptor\Core\Rendering\RenderingContext)
18 Service_Nodes_action_index_cee5d8a04270941ca14db08724a0eb8530a1acd4::{closure}()
17 TYPO3Fluid\Fluid\ViewHelpers\ForViewHelper_Original::renderStatic(array|5|, Closure, Neos\FluidAdaptor\Core\Rendering\RenderingContext)
16 Service_Nodes_action_index_cee5d8a04270941ca14db08724a0eb8530a1acd4::render(Neos\FluidAdaptor\Core\Rendering\RenderingContext)
15 TYPO3Fluid\Fluid\View\AbstractTemplateView::render()
14 Neos\Flow\Mvc\Controller\ActionController_Original::renderView()
13 Neos\Flow\Mvc\Controller\ActionController_Original::callActionMethod()
12 Neos\Flow\Mvc\Controller\ActionController_Original::processRequest(Neos\Flow\Mvc\ActionRequest, Neos\Flow\Mvc\ActionResponse)
11 Neos\Flow\Mvc\Dispatcher_Original::initiateDispatchLoop(Neos\Flow\Mvc\ActionRequest, Neos\Flow\Mvc\ActionResponse)
10 Neos\Flow\Mvc\Dispatcher_Original::dispatch(Neos\Flow\Mvc\ActionRequest, Neos\Flow\Mvc\ActionResponse)
9 Neos\Flow\ObjectManagement\DependencyInjection\DependencyProxy::__call("dispatch", array|2|)
8 Neos\Flow\Mvc\DispatchComponent_Original::handle(Neos\Flow\Http\Component\ComponentContext)
7 Neos\Flow\Http\Component\ComponentChain_Original::handle(Neos\Flow\Http\Component\ComponentContext)
6 Neos\Flow\Http\Component\ComponentChain_Original::handle(Neos\Flow\Http\Component\ComponentContext)
5 Neos\Flow\Http\Middleware\ComponentChainMiddleware_Original::process(GuzzleHttp\Psr7\ServerRequest, Neos\Flow\Http\Middleware\MiddlewaresChain)
4 Neos\Flow\Http\Middleware\MiddlewaresChain_Original::process(GuzzleHttp\Psr7\ServerRequest, Neos\Flow\Http\Middleware\MiddlewaresChain)
3 Neos\Flow\Http\Middleware\MiddlewaresChain_Original::handle(GuzzleHttp\Psr7\ServerRequest)
2 Neos\Flow\Http\RequestHandler::handleRequest()
1 Neos\Flow\Core\Bootstrap::run()

HTTP REQUEST:
Neos_Session=…
https://www.acme.com/neos/content?node=%2Fsites%2Facme%2Fsomething%40user-username%3Blanguage%3Dde_DE
gzip, deflate, br
en,lv;q=0.8,de-DE;q=0.5,de;q=0.3
*/*
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:88.0) Gecko/20100101 Firefox/88.0
https
/neos/service/nodes?workspaceName=user-username&contextNode=%2Fsites%2Facme%40user-username%3Blanguage%3Dde_DE&dimensions%5Blanguage%5D%5B0%5D=de_DE&searchTerm=s&nodeTypes%5B0%5D=Neos.Neos%3ADocument
…

HTTP RESPONSE:
200

PHP PROCESS:
Inode: 688840
PID: 3828526
UID: 1000
GID: 1000
User: beach
albe commented 3 years ago

Good Morning, since a couple of weeks I randomly get this error from the Redis cache backend: Cannot add or modify cache entry because the affected keys are being modified by another process ("Flow_Mvc_Routing_Route")

The routing cache adds tags for each URI path prefix, i.e. an URI /some/path/to/action will share tags with all routes below /some. Assuming most of a sites pages live under a common path prefix that's only slightly less worse than the Session cache with the global shared tag, but something we can't simply avoid, since those tags are used to reliably flush route caches for all affected sub-paths (though I couldn't find a usage of RouterCachingService::flushCachesForUriPath($uriPath) at least in Flow core - so maybe that is actually unused? @bwaidelich do you know anything more about that?).

Maybe we need to revert #2052 in 5.3-6.2 and create a better solution for 6.3+? Not sure yet how that should look, but it would at least need to somehow prevent the watching of tags so broadly. First thing I notice: what about unlimited TTL tags? Can't those be skipped for watching because they will effectively not "change"?

Another attempt would likely be a whole different redis cache implementation as suggested by Kay.

bwaidelich commented 3 years ago

I couldn't find a usage of RouterCachingService::flushCachesForUriPath($uriPath) at least in Flow core so maybe that is actually unused? @bwaidelich do you know anything more about that?).

I don't.. I think this (IMO rather quirky) caching part was added for the RedirectHandler package.

kaystrobach commented 3 years ago

We are currently in discussion with redis experts, hope to provide some more input soon

frozenstupidity commented 3 years ago

We encountered this issue on the sessions too, we managed to patch it by overriding the redis backend to exclude the 'Flow_Session_MetaData:tag:session' tag from the watch. I suspect allowing a definition of safe and unsafe to watch tags may be the best solution.

kaystrobach commented 3 years ago

can you provide tha patch file?

frozenstupidity commented 3 years ago

@kaystrobach We just monkey patched it with an update to the line:

$keysToWatch = array_diff(
    array_unique(array_merge(array_column($redisTags, 'key'), [$key])),
    ['Flow_Session_MetaData:tag:session']
);

By overriding the redis class we use with the following:

<?php

namespace App\Site\Cache\Backend;

use Neos\Cache\Backend\RedisBackend as NeosRedisBackend;
use Neos\Cache\Exception as CacheException;

class RedisBackend extends NeosRedisBackend
{
    /**
     * Saves data in the cache.
     *
     * @param string $entryIdentifier An identifier for this specific cache entry
     * @param string $data The data to be stored
     * @param array $tags Tags to associate with this cache entry. If the backend does not support tags,
     *                    this option can be ignored.
     * @param integer $lifetime Lifetime of this cache entry in seconds. If NULL is specified, the default lifetime
     *                          is used. "0" means unlimited lifetime.
     * @throws \RuntimeException
     * @throws CacheException
     * @return void
     * @api
     * @SuppressWarnings(PHPMD.CyclomaticComplexity)
     * @SuppressWarnings(PHPMD.NPathComplexity)
     */
    public function set(string $entryIdentifier, string $data, array $tags = [], int $lifetime = null): void
    {
        if ($this->isFrozen()) {
            throw new \RuntimeException(sprintf('Cannot add or modify cache entry because the backend of cache "%s" is frozen.',$this->cacheIdentifier), //phpcs:ignore
                1323344192
            );
        }

        if ($lifetime === null) {
            $lifetime = $this->defaultLifetime;
        }

        $key = $this->buildKey('entry:' . $entryIdentifier);
        $setOptions = [];
        if ($lifetime > 0) {
            $setOptions['ex'] = $lifetime;
        }

        $redisTags = array_reduce($tags, function (array $map, string $tag) use ($entryIdentifier) {
            $map[] = ['key' => $this->buildKey('tag:' . $tag), 'value' => $entryIdentifier];
            $map[] = ['key' => $this->buildKey('tags:' . $entryIdentifier), 'value' => $tag];

            return $map;
        }, []);
        $keysToWatch = array_diff(
            array_unique(array_merge(array_column($redisTags, 'key'), [$key])),
            ['Flow_Session_MetaData:tag:session']
        );

        // @see http://backoffcalculator.com/?attempts=4&rate=2&interval=0.005
        $retryWaitInterval = 0.005;
        $maxRetryAttempts = 4;
        $retryAttempt = 0;

        do {
            // Watch the given keys for changes so if the TTL is changed during the transaction it can be retried
            $this->redis->watch(...$keysToWatch);

            foreach ($redisTags as $i => $tag) {
                $expire = $this->calculateRemainingLifetimeForKey($tag['key'], $lifetime);
                $redisTags[$i]['ttl'] = $expire;
            }

            $this->redis->multi();
            $result = $this->redis->set($key, $this->compress($data), $setOptions);
            if ($result === false) {
                $this->verifyRedisVersionIsSupported();
            }
            $this->redis->lRem($this->buildKey('entries'), $entryIdentifier, 0);
            $this->redis->rPush($this->buildKey('entries'), $entryIdentifier);

            foreach ($redisTags as $tag) {
                $this->redis->sAdd($tag['key'], $tag['value']);

                if ($tag['ttl'] > self::UNLIMITED_LIFETIME) {
                    $this->redis->expire($tag['key'], $tag['ttl']);
                } else {
                    $this->redis->persist($tag['key']);
                }
            }

            /**
             * @var array|bool|void $transactionResult
             */
            $transactionResult = $this->redis->exec();
            if ($transactionResult === false) {
                $this->redis->discard();

                if ($retryAttempt >= $maxRetryAttempts) {
                    throw new CacheException(sprintf('Cannot add or modify cache entry because the affected keys are being modified by another process ("%s")', $this->cacheIdentifier), //phpcs:ignore
                        1594725688
                    );
                }

                usleep((int)($retryWaitInterval * 1E6));
                $retryAttempt++;
                $retryWaitInterval *= 2;
            }
        } while ($transactionResult === false);
    }

    /**
     * @param string $identifier
     * @return string
     */
    private function buildKey(string $identifier): string
    {
        return $this->cacheIdentifier . ':' . $identifier;
    }

    /**
     * @param string $value
     * @return string
     */
    private function compress(string $value): string
    {
        return $this->useCompression() ? gzencode($value, $this->compressionLevel) : $value;
    }

    /**
     * @return boolean
     */
    private function useCompression(): bool
    {
        return $this->compressionLevel > 0;
    }

    /**
     * Calculate the max lifetime for a key
     *
     * @see https://redis.io/commands/ttl
     *
     * @param string $key The key of the key
     * @param int $lifetime The lifetime that should be used if no ttl is found
     * @return int
     */
    private function calculateRemainingLifetimeForKey(string $key, int $lifetime)
    {
        $ttl = $this->redis->ttl($key);
        if ($ttl === -2) {
            // key does not exist
            return $lifetime;
        } elseif ($ttl < 0 || $lifetime === self::UNLIMITED_LIFETIME) {
            return self::UNLIMITED_LIFETIME;
        }
        return max($ttl, $lifetime);
    }
}
frozenstupidity commented 3 years ago

@kaystrobach there are a couple of ways we contemplated fixing this:

  1. Making non blocking keys configurable in the Caches.yaml
  2. Making non blocking keys configurable in the set call

In the caches it makes things easy to oversee, however there may be cases where a tag is blocking for one entry but non blocking for another entry. As such the second options may be the better of the two going forward.

2case commented 3 years ago

make this

$retryWaitInterval = 0.005;
$maxRetryAttempts = 4;

configratable in the backendOptions and we could tweak around for each cache.

albe commented 3 years ago

So we discussed this today in our @neos/unicorn weekly and finally concluded we will have to revert #2052 unfortunately. The first idea was to just remove the exception, since if the setting of the cache fails, that should not be too problematic, but we do have persistent caches and places where a failing set operation is something the caller should know. We also can not simply strip away tags from this backend as there are too many things depending on the tags in the one or the other way.

Given the initial issue was that tags are not evicted and pile up, setting a TTL on the tags would still be useful. But watching them for changes to make the update atomic is apparently an issue in our usage of caches and tags. Maybe it's okay to have inconsistent tag entries and we need to deal with them. Also, we'd like to investigate how symfony redis cache adapter behaves and if it works well maybe support an adapter to symfony caches. @kitsunet want's to take a look

frozenstupidity commented 3 years ago

@albe @kitsunet if you need any help with this or want insights on our issues then feel free to contact me. I am happy to help.

bwaidelich commented 3 years ago

Any updates on this one? Seems to be a blocker for some folks

kaystrobach commented 3 years ago

We are solving it with a custom cache for sessions now and filtering the session tag

albe commented 3 years ago

If someone can prepare the revert PR we can merge it quickly. I'll try to do a release over the weekend, but can't work on anything today

albe commented 3 years ago

Reverting the PR now and will do upmerges and a release to be available on monday.

Note: If anyone is up to creating a new PR that adds TTL to the redis backend, but without the watch(), that would be ❤️-some!

albe commented 3 years ago

Release has been tagged, so this issue should be resolved. Still if someone is up to create an issue/PR to add TTL that is highly appreciated. Will close this issue for now though. Thanks everyone for their input and patience!