misskey-dev / misskey

๐ŸŒŽ A completely free and open interplanetary microblogging platform ๐Ÿš€
https://misskey-hub.net/
GNU Affero General Public License v3.0
10.09k stars 1.38k forks source link

Data race leading to Multiple Database deadlocks processing 'Delete' Activity #15005

Open eternal-flame-AD opened 2 days ago

eternal-flame-AD commented 2 days ago

๐Ÿ’ก Summary

ใƒชใƒขใƒผใƒˆใ‹ใ‚‰"Delete"ใจ"Update"ใ‚ขใ‚ฏใƒ†ใƒ“ใƒ†ใ‚ฃใŒใƒ‡ใƒผใ‚ฟ็ซถๅˆใ‚ใ‚Šใพใ™ใ€ๅŒๆ™‚ใซๅ—ใ‘ๅ–ใฃใŸใ‚‰็ซถๅˆใ‚’่ตทใ“ใ‚Šใ‚ขใ‚ซใ‚ฆใƒณใƒˆใฏๅ‰Š้™คใงใใพใ›ใ‚“ใ€‚

Proposed fix (not deployed yet) partially taken from #14752, should also fix #14728 as a side effect : https://forge.yumechi.jp/yume/yumechi-no-kuni/commit/7a581b7e80209c30cef7b47f95efa40e3f0f3797

Partial diff:

diff --git a/packages/backend/src/core/activitypub/ApInboxService.ts b/packages/backend/src/core/activitypub/ApInboxService.ts
index f3aa46292e..fccf86cb91 100644
--- a/packages/backend/src/core/activitypub/ApInboxService.ts
+++ b/packages/backend/src/core/activitypub/ApInboxService.ts
@@ -509,19 +509,12 @@ export class ApInboxService {
            return `skip: delete actor ${actor.uri} !== ${uri}`;
        }

-       const user = await this.usersRepository.findOneBy({ id: actor.id });
-       if (user == null) {
-           return 'skip: actor not found';
-       } else if (user.isDeleted) {
+       if (!(await this.usersRepository.update({ id: actor.id, isDeleted: false }, { isDeleted: true })).affected) {
            return 'skip: already deleted';
        }

        const job = await this.queueService.createDeleteAccountJob(actor);

-       await this.usersRepository.update(actor.id, {
-           isDeleted: true,
-       });
-
        this.globalEventService.publishInternalEvent('remoteUserUpdated', { id: actor.id });

        return `ok: queued ${job.name} ${job.id}`;
diff --git a/packages/backend/src/core/activitypub/models/ApPersonService.ts b/packages/backend/src/core/activitypub/models/ApPersonService.ts
index 8c4e40c561..e01b098194 100644
--- a/packages/backend/src/core/activitypub/models/ApPersonService.ts
+++ b/packages/backend/src/core/activitypub/models/ApPersonService.ts
@@ -557,7 +557,9 @@ export class ApPersonService implements OnModuleInit {
        if (moving) updates.movedAt = new Date();

        // Update user
-       await this.usersRepository.update(exist.id, updates);
+       if (!(await this.usersRepository.update({ id: exist.id, isDeleted: false }, updates)).affected) {
+           return 'skip';
+       }

        if (person.publicKey) {
            await this.userPublickeysRepository.update({ userId: exist.id }, {
@@ -662,7 +664,7 @@ export class ApPersonService implements OnModuleInit {

    @bindThis
    public async updateFeatured(userId: MiUser['id'], resolver?: Resolver): Promise<void> {
-       const user = await this.usersRepository.findOneByOrFail({ id: userId });
+       const user = await this.usersRepository.findOneByOrFail({ id: userId, isDeleted: false });
        if (!this.userEntityService.isRemoteUser(user)) return;
        if (!user.featured) return;

๐Ÿฅฐ Expected Behavior

Delete ใ‚ขใ‚ฏใƒ†ใ‚ฃใƒ“ใƒ†ใ‚ฃใฎๅ‡ฆ็†ใ‚ขใƒˆใƒŸใƒƒใ‚ฏๅŒ–ใซใ—ใฆไธŠใซUpdateใ‚ˆใ‚Šๅ„ชๅ…ˆใ™ใ‚‹ใ€‚

๐Ÿคฌ Actual Behavior

Deadlockใ‚’ไฝ•ๅ›ž็นฐใ‚Š่ฟ”ใ—ใฆใ„ใ‚‹

024-11-19 18:30:58.284 UTC [649591] ERROR:  deadlock detected
2024-11-19 18:30:58.284 UTC [649591] DETAIL:  Process 649591 waits for ShareLock on transaction 49746958; blocked by process 649589.
        Process 649589 waits for ShareLock on transaction 49746959; blocked by process 649591.
        Process 649591: DELETE FROM "drive_file" WHERE "id" IN ($1)
        Process 649589: DELETE FROM "user" WHERE "id" IN ($1)
2024-11-19 18:30:58.284 UTC [649591] HINT:  See server log for query details.
2024-11-19 18:30:58.284 UTC [649591] CONTEXT:  while updating tuple (15737,1) in relation "user"
        SQL statement "UPDATE ONLY "public"."user" SET "bannerId" = NULL WHERE $1::pg_catalog.text OPERATOR(pg_catalog.=) "bannerId"::pg_catalog.text"
2024-11-19 18:30:58.284 UTC [649591] STATEMENT:  DELETE FROM "drive_file" WHERE "id" IN ($1)
2024-11-19 18:30:58.333 UTC [649589] LOG:  duration: 1050.252 ms  execute <unnamed>: DELETE FROM "user" WHERE "id" IN ($1)
2024-11-19 18:30:58.333 UTC [649589] DETAIL:  parameters: $1 = '9zlddyw7rq6n0etg'
INFO 2  [remote ap]     Deleting the Actor: https://social.vivaldi.net/users/ismaelwahid
INFO 3  [queue delete-account]  Deleting account of 9zlddyw7rq6n0etg ...
DONE 3  [queue delete-account]  All of notes deleted
DONE 3  [queue delete-account]  All of files deleted
INFO 2  [remote ap]     Updating the Person: https://social.vivaldi.net/users/ismaelwahid
QueryFailedError: deadlock detected
    at PostgresQueryRunner.query (/misskey/node_modules/.pnpm/typeorm@0.3.20_ioredis@5.4.1_pg@8.13.1/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:219:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async DeleteQueryBuilder.execute (/misskey/node_modules/.pnpm/typeorm@0.3.20_ioredis@5.4.1_pg@8.13.1/node_modules/typeorm/query-builder/DeleteQueryBuilder.js:52:33) {
  query: 'DELETE FROM "drive_file" WHERE "id" IN ($1)',
  parameters: [ '9zlddz8grq6n0eth' ],
  driverError: error: deadlock detected
      at /misskey/node_modules/.pnpm/pg@8.13.1/node_modules/pg/lib/client.js:535:17
      at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
      at async PostgresQueryRunner.query (/misskey/node_modules/.pnpm/typeorm@0.3.20_ioredis@5.4.1_pg@8.13.1/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:184:25)
      at async DeleteQueryBuilder.execute (/misskey/node_modules/.pnpm/typeorm@0.3.20_ioredis@5.4.1_pg@8.13.1/node_modules/typeorm/query-builder/DeleteQueryBuilder.js:52:33) {
    length: 483,
    severity: 'ERROR',
    code: '40P01',
    detail: 'Process 649591 waits for ShareLock on transaction 49746958; blocked by process 649589.\n' +
      'Process 649589 waits for ShareLock on transaction 49746959; blocked by process 649591.',
    hint: 'See server log for query details.',
    position: undefined,
    internalPosition: undefined,
    internalQuery: undefined,
    where: 'while updating tuple (15737,1) in relation "user"\n' +
      'SQL statement "UPDATE ONLY "public"."user" SET "bannerId" = NULL WHERE $1::pg_catalog.text OPERATOR(pg_catalog.=) "bannerId"::pg_catalog.text"',
    schema: undefined,
    table: undefined,
    column: undefined,
    dataType: undefined,
    constraint: undefined,
    file: 'deadlock.c',
    line: '1147',
    routine: 'DeadLockReport'
  },
  length: 483,
  severity: 'ERROR',
  code: '40P01',
  detail: 'Process 649591 waits for ShareLock on transaction 49746958; blocked by process 649589.\n' +
    'Process 649589 waits for ShareLock on transaction 49746959; blocked by process 649591.',
  hint: 'See server log for query details.',
  position: undefined,
  internalPosition: undefined,
  internalQuery: undefined,
  where: 'while updating tuple (15737,1) in relation "user"\n' +
    'SQL statement "UPDATE ONLY "public"."user" SET "bannerId" = NULL WHERE $1::pg_catalog.text OPERATOR(pg_catalog.=) "bannerId"::pg_catalog.text"',
  schema: undefined,
  table: undefined,
  column: undefined,
  dataType: undefined,
  constraint: undefined,

๐Ÿ“ Steps to Reproduce

I can't reproduce it on purpose but I did contact another instance owner and they observed the same issue ...

ๅ‰ๆๆกไปถใŒ่ค‡้›‘ใŸใ‚ใซๆ„ๅ›ณ็š„ใซๅ†็พใ™ใ‚‹ใ“ใจใฏใงใใพใ›ใ‚“ใŒใ€ๅˆฅใฎใ‚คใƒณใ‚นใ‚ฟใƒณใ‚นๆ‰€ๆœ‰่€…ใซ้€ฃ็ตกใ—ใŸใจใ“ใ‚ใ€ๅŒใ˜ๅ•้กŒใŒ็™บ็”Ÿใ—ใฆใ„ใ‚‹ใใ†ใงใ™ใ€‚

๐Ÿ’ป Frontend Environment

Not frontend issue.

๐Ÿ›ฐ Backend Environment (for server admin)

* Installation Method or Hosting Service: Docker-compose.yml (https://forge.yumechi.jp/yume/yumechi-no-kuni/src/branch/develop/compose_example.yml)
* Misskey: [2024.11.0-yumechinokuni.5](https://forge.yumechi.jp/yume/yumechi-no-kuni/src/tag/2024.11.0-yumechinokuni.5) (tracking misskey-dev#develop)
* OS and Architecture: Arch Linux x86_64

Reported to be also reproducible on:

https://github.com/paricafe/misskey/tree/pari

Do you want to address this bug yourself?