silverstripe / silverstripe-session-manager

Allow users to manage and revoke access to multiple login sessions across devices.
BSD 3-Clause "New" or "Revised" License

Garbage collector does not work on large tables #153

Closed lekoala closed 1 year ago

lekoala commented 1 year ago

I know it's probably something that is not supposed to happen, but if you are like me and you realize that you never cleaned the LoginSession table, running the garbage collector is going to be a nightmare.

This is because it uses the ORM to remove every single record one by one.

Even on a regular website this is probably not great, and a couple of clean DELETE queries would be so much better.

I would like to propose using direct DB queries instead, if that is something that would be considered, since there isn't much expectation of ORM extensions interfering with the login session anyway (side quest: don't use the ORM at all for the login session queries).

Using the invalidate-all-sessions task has similar issues.
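
For context, what makes this slow is roughly the following pattern: the ORM fetches the matching records and deletes them one at a time, each delete running its own query plus any onBeforeDelete/onAfterDelete hooks. A simplified sketch (not the exact framework code):

<?php

use SilverStripe\SessionManager\Models\LoginSession;

// Simplified sketch of an ORM-based "remove all": one DELETE query per record.
// With a large LoginSession table this means hundreds of thousands of queries
// in a single run.
$sessions = LoginSession::get()->filter(['Persistent' => 0]);
foreach ($sessions as $session) {
    $session->delete(); // fires extension hooks and runs its own DELETE
}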


lekoala commented 1 year ago

Here is an example class; it also actually shows what the class is doing instead of having a black box.

<?php

namespace SilverStripe\SessionManager\Services;

use SilverStripe\Core\Injector\Injectable;
use SilverStripe\ORM\DataObject;
use SilverStripe\ORM\DB;
use SilverStripe\ORM\FieldType\DBDatetime;
use SilverStripe\Security\RememberLoginHash;
use SilverStripe\SessionManager\Models\LoginSession;

class GarbageCollectionService
{
    use Injectable;

    /**
     * Delete expired LoginSession and RememberLoginHash records
     */
    public function collect(): array
    {
        $expiredSessions = $this->collectExpiredSessions();
        $implicitlyExpiredSessions = $this->collectImplicitlyExpiredSessions();
        $loginHashes = $this->collectExpiredLoginHashes();
        return [
            'expiredSessions' => $expiredSessions,
            'implicitlyExpiredSessions' => $implicitlyExpiredSessions,
            'loginHashes' => $loginHashes,
        ];
    }

    /**
     * Collect all non-persistent LoginSession records that are older than the session lifetime
     */
    private function collectExpiredSessions(): int
    {
        $maxAge = LoginSession::getMaxAge();

        $schema = DataObject::getSchema();
        $sessionTableName = $schema->tableName(LoginSession::class);

        $query = sprintf(
            'DELETE FROM "%s" WHERE "Persistent" = 0 AND "LastAccessed" < ?',
            $sessionTableName,
        );
        DB::prepared_query($query, [
            $maxAge
        ]);
        return DB::get_conn()->affectedRows() ?? 0;
    }

    /**
     * Collect all persistent LoginSession records where the associated RememberLoginHash has expired
     */
    private function collectImplicitlyExpiredSessions(): int
    {
        $now = date('Y-m-d H:i:s', DBDatetime::now()->getTimestamp());
        $maxAge = LoginSession::getMaxAge();

        $schema = DataObject::getSchema();
        $sessionTableName = $schema->tableName(LoginSession::class);
        $rememberTableName = $schema->tableName(RememberLoginHash::class);

        $join = sprintf(
            'LEFT JOIN "%s" ON "%s"."ID" = "%s"."LoginSessionID"',
            $rememberTableName,
            $sessionTableName,
            $rememberTableName
        );

        // If a persistent session has no login hash, use LastAccessed
        $noHashExpired = sprintf(
            '"%s"."ExpiryDate" IS NULL AND "%s"."LastAccessed" < ?',
            $rememberTableName,
            $sessionTableName
        );
        $where = sprintf(
            'WHERE ("%s"."Persistent" = 1) AND ((%s) OR ("%s"."ExpiryDate" < ?))',
            $sessionTableName,
            $noHashExpired,
            $rememberTableName
        );
        $subquery = sprintf(
            'SELECT "%s"."ID" FROM "%s" %s %s',
            $sessionTableName,
            $sessionTableName,
            $join,
            $where
        );
        // Wrap the subquery in a derived table so MySQL allows deleting from the
        // same table the subquery reads from (avoids error 1093)
        $query = sprintf(
            'DELETE FROM "%s" WHERE "Persistent" = 1 AND "ID" IN (SELECT "ID" FROM (%s) AS "expired_sessions")',
            $sessionTableName,
            $subquery
        );
        DB::prepared_query($query, [
            $maxAge,
            $now
        ]);
        return DB::get_conn()->affectedRows() ?? 0;
    }

    /**
     * Collect all RememberLoginHash records that have expired
     */
    private function collectExpiredLoginHashes(): int
    {
        $now = date('Y-m-d H:i:s', DBDatetime::now()->getTimestamp());
        $schema = DataObject::getSchema();
        $rememberTableName = $schema->tableName(RememberLoginHash::class);

        $query = sprintf(
            'DELETE FROM "%s" WHERE "ExpiryDate" < ?',
            $rememberTableName,
        );
        DB::prepared_query($query, [
            $now
        ]);
        return DB::get_conn()->affectedRows() ?? 0;
    }
}
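
For reference, a sketch of how such a service would be invoked (the module already calls the garbage collection service from its task/job in roughly this way; the reporting below is illustrative only):

<?php

use SilverStripe\SessionManager\Services\GarbageCollectionService;

// Hypothetical caller: run garbage collection and report what was removed
$counts = GarbageCollectionService::singleton()->collect();
printf(
    "Removed %d expired sessions, %d implicitly expired sessions and %d login hashes\n",
    $counts['expiredSessions'],
    $counts['implicitlyExpiredSessions'],
    $counts['loginHashes']
);
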
lekoala commented 1 year ago

Maybe I could use SQLDelete, now that I think about it.
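
For instance, something roughly like this (a sketch only, using the same table and cutoff as the example above):

<?php

use SilverStripe\ORM\DataObject;
use SilverStripe\ORM\Queries\SQLDelete;
use SilverStripe\SessionManager\Models\LoginSession;

// Sketch: remove expired non-persistent sessions via SQLDelete rather than raw SQL
$table = DataObject::getSchema()->tableName(LoginSession::class);
SQLDelete::create("\"$table\"")
    ->addWhere([
        '"Persistent" = ?' => 0,
        '"LastAccessed" < ?' => LoginSession::getMaxAge(),
    ])
    ->execute();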

Another quick "fix" would be to ->limit() the current ORM calls and wrap them in a transaction; at least it wouldn't fail, it would just slowly clean everything up.

Ideally, I do think there should be a way to batch delete these (either with removeAll, knowing that it's OK to do so, or by implementing a custom batchRemoveAll). This led me to think about https://github.com/silverstripe/silverstripe-framework/issues/10850

GuySartorelli commented 1 year ago

Out of all of these recommendations, I'd be most in favour of a batchRemoveAll() method being added in framework and used here.

I would definitely avoid raw SQL queries - partly because there is an expectation that before/after delete hooks used by projects or modules should be respected, and partly because removing items matching a simple filter shouldn't require a raw SQL query - that's an indication of a limitation in the ORM that we should work to resolve instead.

lekoala commented 1 year ago

Here is another take with a limited list wrapped in a transaction. At least it doesn't crash, and if you run it multiple times it ends up cleaning everything.

<?php

namespace SilverStripe\SessionManager\Services;

use SilverStripe\Core\Injector\Injectable;
use SilverStripe\ORM\DB;
use SilverStripe\ORM\FieldType\DBDatetime;
use SilverStripe\Security\RememberLoginHash;
use SilverStripe\SessionManager\Models\LoginSession;

class GarbageCollectionService
{
    use Injectable;

    /**
     * Delete expired LoginSession and RememberLoginHash records
     */
    public function collect()
    {
        $expiredSessions = $this->collectExpiredSessions();
        $implicitlyExpiredSessions = $this->collectImplicitlyExpiredSessions();
        $loginHashes = $this->collectExpiredLoginHashes();
        return [
            'expiredSessions' => $expiredSessions,
            'implicitlyExpiredSessions' => $implicitlyExpiredSessions,
            'loginHashes' => $loginHashes,
        ];
    }

    /**
     * Delete up to 1000 records from the given list one by one inside a transaction,
     * returning the number of records deleted
     */
    private function batchRemove($datalist)
    {
        $i = 0;
        DB::get_conn()->transactionStart();
        foreach ($datalist->limit(1000) as $record) {
            $record->delete();
            $i++;
        }
        DB::get_conn()->transactionEnd();
        return $i;
    }

    /**
     * Collect all non-persistent LoginSession records that are older than the session lifetime
     */
    private function collectExpiredSessions()
    {
        $maxAge = LoginSession::getMaxAge();
        $sessions = LoginSession::get()->filter([
            'LastAccessed:LessThan' => $maxAge,
            'Persistent' => 0
        ]);
        return $this->batchRemove($sessions);
    }

    /**
     * Collect all persistent LoginSession records where the associated RememberLoginHash has expired
     */
    private function collectImplicitlyExpiredSessions()
    {
        $now = DBDatetime::now()->getTimestamp();
        $sessions = LoginSession::get()->filter([
            'Persistent' => 1,
            'LoginHash.ExpiryDate:LessThan' => date('Y-m-d H:i:s', $now)
        ]);
        return $this->batchRemove($sessions);
    }

    /**
     * Collect all RememberLoginHash records that have expired
     */
    private function collectExpiredLoginHashes()
    {
        $now = DBDatetime::now()->getTimestamp();
        $sessions = RememberLoginHash::get()->filter([
            'ExpiryDate:LessThan' => date('Y-m-d H:i:s', $now)
        ]);
        return $this->batchRemove($sessions);
    }
}
GuySartorelli commented 1 year ago

I don't have time to look at that code in depth right now, much less test it, but the idea of running a batch remove within a transaction seems eminently sensible for this use case. In general this seems like a good way forward.

GuySartorelli commented 1 year ago

@lekoala that batchRemove method seems like a sensible way forward at a glance - though I'd rename it batchRemoveAll, and I'd recommend using chunkedFetch() instead of limit(), since the limit will require it to be run multiple times if there are more than 1000 items, whereas chunked fetch will keep working through the list in chunks (see the sketch at the end of this comment).

If you'd like to submit a PR with this implementation targeting CMS 4, I'd be happy to review it.

There's an argument to be made for using chunkedFetch in DataList::removeAll() as well - but I feel like that would need to be something people can opt into, and the nicest way to do that would be with an argument - which we can't very well add outside of a major release. Perhaps something for the CMS 6 wishlist. But I digress.
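
To illustrate the suggestion, a hypothetical sketch of the helper from the previous comment, renamed and switched to chunkedFetch() (untested, and shown as a standalone function for brevity):

<?php

use SilverStripe\ORM\DataList;
use SilverStripe\ORM\DB;

/**
 * Sketch only: delete every record in the list, fetching it in chunks so the
 * full result set is never held in memory at once.
 */
function batchRemoveAll(DataList $datalist, int $chunkSize = 1000): int
{
    $i = 0;
    DB::get_conn()->transactionStart();
    foreach ($datalist->chunkedFetch($chunkSize) as $record) {
        $record->delete();
        $i++;
    }
    DB::get_conn()->transactionEnd();
    return $i;
}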

lekoala commented 1 year ago

@GuySartorelli I've chosen "limit" because, from my experience, running more than 1000 queries in one PHP script often ends badly. Let's say you need to delete 10k records because you have a very busy website: chunkedFetch will run 10 batches of 1000 queries, with potential additional queries due to onBefore/onAfterDelete cleanup. I tend to try to keep my cron job scripts running in less than 5 minutes (first, because I'm not using a job queue^^, but second, it's also lighter on the database).

I'd say the issue is not so bad if you run the cleanup on a regular basis (i.e. via the task), but if you use the GarbageCollectionJob on a daily basis, you can end up deleting a lot of stuff in one go. When using the "limit", and provided that you run your script as often as possible, you would probably end up cleaning everything at some point, when the website is less busy.

I'm happy to make a PR with the code above and rename the method. I'm not sure that using "chunkedFetch" really helps deal with the actual issue (i.e. making too many queries).

Would you be OK if I also added, like I did above, some data about what's being collected?

GuySartorelli commented 1 year ago

If we're changing the behaviour to explicitly not try to collect everything at once, I think we have to consider that a breaking change unless it's behind an opt-in flag. I'd be okay with a PR that uses limit only if a developer has set some config (which defaults to false) to true.
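
For example, the opt-in could look roughly like this (the flag name here is purely illustrative and not the one used in the eventual PR; projects would flip it via YAML config as usual):

<?php

namespace SilverStripe\SessionManager\Services;

use SilverStripe\Core\Config\Configurable;
use SilverStripe\ORM\DB;

// Illustrative sketch only: a default-false config flag that opts in to
// limited (batched) garbage collection
class GarbageCollectionService
{
    use Configurable;

    private static $limit_garbage_collection = false;

    private function batchRemove($datalist)
    {
        if (static::config()->get('limit_garbage_collection')) {
            // Opt-in behaviour: only process the first 1000 matching records per run
            $datalist = $datalist->limit(1000);
        }
        $i = 0;
        DB::get_conn()->transactionStart();
        foreach ($datalist as $record) {
            $record->delete();
            $i++;
        }
        DB::get_conn()->transactionEnd();
        return $i;
    }
}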

lekoala commented 1 year ago

That works for me, I can do something along these lines :)

lekoala commented 1 year ago

@GuySartorelli here it goes with a config flag https://github.com/silverstripe/silverstripe-session-manager/pull/163

GuySartorelli commented 1 year ago

Reopening 'cause I want to add a note about the new config in the 5.1 changelog.

emteknetnz commented 1 year ago

Have merged changelog PR