pradosoft / prado

Prado - Component Framework for PHP
Other
186 stars 70 forks source link

TDbCache::setValue collisions - TApplicationComponent::getClassFxEvents thrashing cache #940

Open majuca opened 1 year ago

majuca commented 1 year ago

HI, since the version 4.2.2, we have a lot error like this:

image

An idee?

ctrlaltca commented 1 year ago

Possibly related to issue #820 and this change in PR #841: https://github.com/pradosoft/prado/pull/841/files#diff-e89e3dcbd35b5518b965e802e9aa3d2d3ecb051a7c6fe6feaafd0928c17bc277 Is the database mysql/innodb? You should be able to get some more information on the last deadlock with SHOW ENGINE INNODB STATUS; Also, did you flush (wipe) the cache when upgrading prado version? prado-cli includes a "cache" action to perform the flush

majuca commented 1 year ago

Indeed, I would confirm that problem is related to the issue #820. If I change the application mode to Normal instead of Performance, it seems to be ok. Yes, I'am with mariadb/innodb 10.11.2.

The cache was removed completly.

Maybe the folloing code here in the function TApplicationComponent::getClassFxEvents:

if ($mode === TApplicationMode::Performance || isset(Prado::$classMap[$baseClassName])) { $_classfx[$className] = $fx; $cache->set(self::APP_COMPONENT_FX_CACHE, $_classfx); }

belisoful commented 1 year ago

It is difficult to debug that function. I have three comments:

1) Getting this function right is critical. 2) My apologies. 3) When mode is Performance the cache should always be set because it's not found, so something else is going on

The whole point is to load all the class data once and put it in one place for loading once.

belisoful commented 1 year ago

@majuca The issue is that the code "shouldn't" be getting to that point. The TApplicationSTatePersister should have had the GetClassFxEvents loaded from cache already. That code only runs once to load the cache, and not a second time, unless the cache is manually busted.

If that code is running multiple times (as it seems to be doing in your instance), then the classFxEvents are not being cached properly.

belisoful commented 1 year ago

PS. The TApplicationStatePersister (being loaded) is in the Class map, and would set the cache regardless of being in performance mode. That's the weird part.

belisoful commented 1 year ago

I think I got it. The scope of static $_classfx = null; is not correct. It should be outside the IF statement in the body of the function rather than in the if statement block. put static $_classfx = null; on line 54 rather than line 57.

When $_classfx[$className] = $fx; happens we are working on $_classfx out of scope.

belisoful commented 1 year ago

The bug was there before #820, #820 unlocked the bug, like a Fortnite achievement award. lol.

majuca commented 1 year ago

@belisoful Many thanks for your realy fast fix. I will test it today, I have several server where I can check this.

belisoful commented 1 year ago

Mathematically (technically), this is the issue. and God bless you for having servers to test the correction.

majuca commented 1 year ago

Bad news, I applied the fix on one server and the error still happens, some examples:

#0 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(552): Prado\Data\TDbCommand->execute()
#1 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(495): Prado\Caching\TDbCache->deleteValue('46f3d3389421db0...')
#2 /var/www/html/vendor/pradosoft/prado/framework/Caching/TCache.php(155): Prado\Caching\TDbCache->setValue('46f3d3389421db0...', Array, 0)
#3 /var/www/html/vendor/pradosoft/prado/framework/TApplicationComponent.php(76): Prado\Caching\TCache->set('prado:applicati...', Array)
#4 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(503): Prado\TApplicationComponent->getClassFxEvents(Object(Prado\Exceptions\TErrorHandler))
#5 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(381): Prado\TComponent->listen()
#6 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(812): Prado\TComponent->__construct()
#7 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1162): Prado\TApplication->getErrorHandler()
#8 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(390): Prado\TApplication->onError(Object(Prado\Exceptions\TDbException))
#9 /var/www/html/index.php(19): Prado\TApplication->run()
#10 {main}
#0 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(552): Prado\Data\TDbCommand->execute()
#1 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(495): Prado\Caching\TDbCache->deleteValue('46f3d3389421db0...')
#2 /var/www/html/vendor/pradosoft/prado/framework/Caching/TCache.php(155): Prado\Caching\TDbCache->setValue('46f3d3389421db0...', Array, 0)
#3 /var/www/html/vendor/pradosoft/prado/framework/TApplicationComponent.php(76): Prado\Caching\TCache->set('prado:applicati...', Array)
#4 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(503): Prado\TApplicationComponent->getClassFxEvents(Object(TWebSocketModule))
#5 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(381): Prado\TComponent->listen()
#6 /var/www/html/vendor/pradosoft/prado/framework/Prado.php(354): Prado\TComponent->__construct()
#7 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(982): Prado\Prado::createComponent('Application.mod...')
#8 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1047): Prado\TApplication->internalLoadModule('webSocket')
#9 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1072): Prado\TApplication->applyConfiguration(Object(Prado\TApplicationConfiguration), false)
#10 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1072): Prado\TApplication->applyConfiguration(Object(Prado\TApplicationConfiguration), false)
#11 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1100): Prado\TApplication->applyConfiguration(Object(Prado\TApplicationConfiguration), false)
#12 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(373): Prado\TApplication->initApplication()
#13 /var/www/html/index.php(19): Prado\TApplication->run()
#14 {main}
#0 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(552): Prado\Data\TDbCommand->execute()
#1 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(495): Prado\Caching\TDbCache->deleteValue('46f3d3389421db0...')
#2 /var/www/html/vendor/pradosoft/prado/framework/Caching/TCache.php(155): Prado\Caching\TDbCache->setValue('46f3d3389421db0...', Array, 0)
#3 /var/www/html/vendor/pradosoft/prado/framework/TApplicationComponent.php(76): Prado\Caching\TCache->set('prado:applicati...', Array)
#4 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(503): Prado\TApplicationComponent->getClassFxEvents(Object(Prado\TApplicationStatePersister))
#5 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(381): Prado\TComponent->listen()
#6 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(872): Prado\TComponent->__construct()
#7 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(469): Prado\TApplication->getApplicationStatePersister()
#8 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1234): Prado\TApplication->loadGlobals()
#9 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(386): Prado\TApplication->onLoadState()
#10 /var/www/html/index.php(19): Prado\TApplication->run()
#11 {main}
#0 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(552): Prado\Data\TDbCommand->execute()
#1 /var/www/html/vendor/pradosoft/prado/framework/Caching/TDbCache.php(495): Prado\Caching\TDbCache->deleteValue('46f3d3389421db0...')
#2 /var/www/html/vendor/pradosoft/prado/framework/Caching/TCache.php(155): Prado\Caching\TDbCache->setValue('46f3d3389421db0...', Array, 0)
#3 /var/www/html/vendor/pradosoft/prado/framework/TApplicationComponent.php(76): Prado\Caching\TCache->set('prado:applicati...', Array)
#4 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(503): Prado\TApplicationComponent->getClassFxEvents(Object(Prado\Web\TUrlManager))
#5 /var/www/html/vendor/pradosoft/prado/framework/TComponent.php(381): Prado\TComponent->listen()
#6 /var/www/html/vendor/pradosoft/prado/framework/Web/THttpRequest.php(324): Prado\TComponent->__construct()
#7 /var/www/html/vendor/pradosoft/prado/framework/Web/THttpRequest.php(762): Prado\Web\THttpRequest->getUrlManagerModule()
#8 /var/www/html/vendor/pradosoft/prado/framework/Web/THttpRequest.php(778): Prado\Web\THttpRequest->parseUrl()
#9 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(1103): Prado\Web\THttpRequest->resolveRequest(Array)
#10 /var/www/html/vendor/pradosoft/prado/framework/TApplication.php(373): Prado\TApplication->initApplication()
#11 /var/www/html/index.php(19): Prado\TApplication->run()
#12 {main}
belisoful commented 1 year ago

It still seems that the cache is not getting set properly and is setting the cache much more than needed. I can imagine the first second of a web server going online, loading all its classes, and there being a collision here or maybe there, but after loading the first time, the code being executed should never be hit, especially in a performance mode.

That's the part that is boggling my mind. Are you running a shared DB cache system?

Could it be that the other instances are clobbering the cache causing the correction to not work properly? If other instances are incorrectly invalidating the shared DB cache, the new code is going to have the same issue.

belisoful commented 1 year ago

I can see how your situation would happen in a shared DB cache environment where other instances are not behaving nicely like the new fix. The other misbehaving servers will thrash the cache despite the fix.

In such a case, the solution is to deploy the micro-fix to all servers in the shared environment. It'll speed things up too.

belisoful commented 1 year ago

such suspense... lol. 😅 (j/k. more accurately: 🕵️)

majuca commented 1 year ago

I doing some tests this week, last end of wee was very busy. But, in our application, the database is not shared. But when this problem happens, it seams on servers where we have a lot of request by seconds.

belisoful commented 1 year ago

Especially in performance mode, that code shouldn't be touched after initial load. So somehow, the cache is still being thrashed.

majuca commented 1 year ago

Mmm, if I put a debug in the function getClassFxEvents, this one is not called only one time, or perhaps I misunderstood your remark concerning the initial load. Indeed, each time a new class is used (like a TPage), this the APP_COMPONENT_FX_CACHE cache is updated.

I would like to say that the problem is coming from the function setValue of TDbCache. Indeed, this function is not thread safe (atomic):

protected function setValue($key, $value, $expire)
{
   $this->deleteValue($key);
   return $this->addValue($key, $value, $expire);
}

As we delete the value and create it just after, two threads could have conflict and specialy if the two thread started a transaction. This explain the message :

TDbCommand failed to execute the SQL statement "DELETE FROM pradocache WHERE itemkey=:key": SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

To have the function setValue as an atomic function, I would propose the following code:

protected function setValue($key, $value, $expire)
{
        $isCurrentTransaction = $this->getDbConnection()->getCurrentTransaction();
        $transaction = $this->getDbConnection()->getCurrentTransaction() ?? $this->getDbConnection()->beginTransaction();

    $this->deleteValue($key);
    $res = $this->addValue($key, $value, $expire);

        if (!$isCurrentTransaction) {
            $transaction->commit();
        }

        return $res;
}

I currently trying this code on one server to see if it is better.

belisoful commented 1 year ago

There is SQL for adding or updating a key if it exists, which may be a better route than transaction with deleting then re-adding. The "Add but update if exists" may be more efficient than a transaction as I believe that would be atomic.

What I am saying is that the first half of getClassFxEvents should be called with each new instance. but once the cache is loaded, the second half shouldn't be run, ever, especially in performance mode. your code keep running the second half and, again, it shouldn't be running the second half once you are cached and loaded. Somehow, your cache is getting busted... leading to "setValue" when it shouldn't be called.

the fact that you are even having a "setValue" is the problem. That it isn't atomic is a second problem.

To explain further, the first half gets the data, but all the data should be cached and shouldn't have to setValue any more. In performance mode, it shouldn't get to that point once the cache is loaded.

I hope this isn't lost in translation.

belisoful commented 1 year ago

I think the non-transaction atomic SQL we are looking for is INSERT INTO TABLE (id, name, age) VALUES (1, "A", 19) ON DUPLICATE KEY UPDATE name = VALUES (name), ...

belisoful commented 1 year ago

Put simply.... if you are getting a a butch of requests at the same time doing a "setValue" (and even remotely possibly colliding) then the caching isn't working right!!!! SetValue is the least of the problems.

majuca commented 1 year ago

Sure, you are right, your sql INSERT is for sure better than a SQL transaction and should fix the atomic problem of the funtion setValue.

And concerning the function getClassFxEvents, for me is it now ok and the only problem is see is on the setValue.

belisoful commented 1 year ago

Thank you for that. The getClassFxEvents working right was keeping me up.

belisoful commented 1 year ago

This should work for you:

protected function setValue($key, $value, $expire)
    {
        if (!$this->_cacheInitialized) {
            $this->initializeCache();
        }
        $expire = ($expire <= 0) ? 0 : time() + $expire;
        $sql = "INSERT INTO {$this->_cacheTable} (itemkey,value,expire) VALUES (:key,:value,$expire) ON DUPLICATE KEY UPDATE value=VALUES(value), expire=VALUES(expire)";
        $command = $this->getDbConnection()->createCommand($sql);
        $command->bindValue(':key', $key, \PDO::PARAM_STR);
        $command->bindValue(':value', serialize($value), \PDO::PARAM_LOB);

        try {
            $command->execute();
            return true;
        } catch (\Exception $e) {
            try {
                $this->initializeCache(true);
                $command->execute();
                return true;
            } catch (\Exception $e) {
                return false;
            }
        }
    }

NM on the addition stuff.

belisoful commented 1 year ago

The getClassFxEvents still isn't working quite right because TDBCache isn't loaded for most of its use. It's used mainly in loading modules (and their global fx event handlers; speeding it up) but TDBCache is a module that isn't installed until after the modules are loaded.

As such, getClassFXEvents shouldn't be using TDBCache. So this update to TDBCache is important for the DB Cache to work correctly in heavily concurrent environments, but it will be removed from this context. The TDBCache won't be any further issue in this function because it's being totally removed.

The solution for getClassFXEvents is to do an exclusive lock write to a new FXEvents cache file in the "runtime" folder. When loading, it reads that file. If there are any new classes to be cached by any instance, they do an exclusive lock write to the file. If the file cannot be locked, the writing is (properly) skipped. this will stop any concurrency issues. This way, the cache is built and restored immediately and properly without TDBCache.

I have a new function for setValue that takes into account most things. This SQL has been tested in SQLite3 and simulated Oracle and MSSQL (using ChatGPT as if it were an Oracle and MSSQL database). It also includes your proposed transaction fix for any unsupported DBs. Thank you for that.

    protected function setValue($key, $value, $expire)
    {
        if (!$this->_cacheInitialized) {
            $this->initializeCache();
        }
        $db = $this->getDbConnection();
        $driver = $db->getDriverName();
        if (in_array($driver, ['mysql', 'mysqli', 'sqlite', 'ibm', 'oci', 'sqlsrv', 'mssql', 'dblib', 'pgsql'])) {
            $expire = ($expire <= 0) ? 0 : time() + $expire;
            if (in_array($driver, ['mysql', 'mysqli', 'sqlite']))
                $sql = "REPLACE INTO {$this->_cacheTable} (itemkey,value,expire) VALUES (:key,:value,$expire)";
            elseif ($driver === 'pgsql') {
                $sql = "INSERT INTO {$this->_cacheTable} (itemkey, value, expire) VALUES (:key, :value, :expire) ".
                    "ON CONFLICT (itemkey) DO UPDATE SET value = EXCLUDED.value, expire = EXCLUDED.expire";
            } else {
                $sql = "MERGE INTO {$this->_cacheTable} AS c " .
                "USING (SELECT :key AS itemkey, :value AS value, $expire AS expire) AS data " .
                "ON c.itemkey = data.itemkey " .
                "WHEN MATCHED THEN " .
                    "UPDATE SET c.value = data.value, c.expire = data.expire " .
                "WHEN NOT MATCHED THEN " .
                    "INSERT (itemkey, value, expire) " .
                    "VALUES (data.itemkey, data.value, data.expire)";
            }
            $command = $db->createCommand($sql);
            $command->bindValue(':key', $key, \PDO::PARAM_STR);
            $command->bindValue(':value', serialize($value), \PDO::PARAM_LOB);

            try {
                $command->execute();
                return true;
            } catch (\Exception $e) {
                try {
                    $this->initializeCache(true);
                    $command->execute();
                    return true;
                } catch (\Exception $e) {
                    return false;
                }
            }
        } else {
            $isCurrentTransaction = $this->getDbConnection()->getCurrentTransaction();
            $transaction = $this->getDbConnection()->getCurrentTransaction() ?? $this->getDbConnection()->beginTransaction();

            $this->deleteValue($key);
            $return = $this->addValue($key, $value, $expire);

            if (!$isCurrentTransaction) {
                $transaction->commit();
            }

            return $return;
        }
    }
belisoful commented 1 year ago

Alright. I have a new getClassFxEvents method that doesn't use TDBCache. So the DB collisions won't be a problem at that point any more. In essence, due to TDBCache being a Module, it wasn't loading the cache properly for when the modules load... which is the point this method is designed to alleviate.

TDBCache loads too late.

This uses a file to store the cache as the cache is loaded and from then on its only read only, so no further writes when things are loaded up. Normal mode only stores the events of the framework, and performance mode stores the events of all classes.

Regarding TDBCache, that isn't a problem but it still should be updated to handle concurrency better. I have included an interesting solution: Driver specific SQL for 3 different kinds of systems and a fallback to @majuca beautifully simple transaction based "set" (delete/insert, "upsert").

(I mean "simple" in the mathematical simple solutions that solve things like E=mc^2; where simplicity is the beauty of the thing)

Others have run into this problem of "universal DB Set row" and reason why I think we should include a good solution in PRADO.

Regarding testing the Oracle, DB2, MSSQL, and Postgres SQL.... Interestingly, I had ChatGPT pretend to be these databases in turn, "...accept only SQL statements that the DB would accept, and output only what the DB would output. Do not output anything but what the DB would output"

I then entered the DB specific code (with prep work for the db having data to "set"). I tested with both existing data (update) and new data (insert). It worked properly. I believe these to be valid tests, I entered the SQL for the other DB systems and only got errors... as expected.

I think the new code on both fronts will resolve your issues. I didn't expect this bug to expose the faults of getClassFXEvents. We found the faults, and the PR should fix it.

majuca commented 1 year ago

Nice solution.

I could try it on a server having a lot of requests and give you a feed back.

belisoful commented 1 year ago

Indeed. That would be lovely. This should speed things up a little bit too.

belisoful commented 1 year ago

No news is good news.

belisoful commented 1 year ago

Are we ready to close this?

belisoful commented 1 year ago

Also, I'd like to reiterate that this bug is severe enough and the fix important enough as to maybe warrant releasing 4.2.3 ASAP (sooner rather than later).

I think it's also important to get closures as PRADO event handlers out there too.

majuca commented 1 year ago

Sorry for my delay, I was not able to give you a feedback these last months for personal reasons. I retry the patch this week and give you a feedback at least friday.

belisoful commented 1 year ago

@majuca I hope things are well for you and/or moving in the right direction.

majuca commented 1 year ago

I try the patch on two server with a lot of traffic. I does't have any problem or error this last days. I would confirm that it is working as expected.

majuca commented 11 months ago

Could we imagine making a new version of PRADO with just this one change? The difference is truly incredible.

belisoful commented 11 months ago

That's what I was suggesting.... 4.2.3 w/ all the event updates 'n stuff. I saw immediately how critical this update was when I put it in place.

ctrlaltca commented 11 months ago

I can prepare a release, but since we bumped up requirements from php 7.4 to php 8.1 I'd go directly for a 4.3.0 if you don't mind

ctrlaltca commented 11 months ago

Done, 4.3.0 is released. Updated prado-demos, prado-app, the pradoframework.net and api docs websites.

belisoful commented 11 months ago

yeeeeeesssss..... I entirely agree. and thank you. I hope people start contributing PRADO extensions in composer/packagist. That's a huge one. It was such a small change, with two functions, to unlock so much potential.

majuca commented 11 months ago

Good news. A big thank you to both of you for the incredible job you're doing.