laravel / pulse

Laravel Pulse is a real-time application performance monitoring tool and dashboard for your Laravel application.
https://pulse.laravel.com
MIT License
1.43k stars 164 forks source link

Serialization failure: 1213 Deadlock found when trying to get lock #268

Closed sebastiaanluca closed 8 months ago

sebastiaanluca commented 8 months ago

Pulse Version

1.0.0-beta8

Laravel Version

10.39.0

PHP Version

8.2.4

Livewire Version

3.3.5

Database Driver & Version

MySQL Ver 8.0.32 for Linux on x86_64 (MySQL Community Server - GPL)

Description

On our current setup, we regularly get these deadlock issues reported to Sentry. Happened 14 times now since we added Pulse on the 14th of December. The more visitors, the more deadlocks happen. It's not limited to one type of recorder, but happens for all. And we do have extra plugins/cards added, but it already happened once before we added any extras.

Small note: I've set up a dedicated new MySQL database just for Pulse so the app queries wouldn't interfere with those of Pulse. Check further down for the configuration.

Stack trace

An example exception:

Illuminate\Database\QueryException
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (Connection: mysql-pulse, SQL: insert into `pulse_aggregates` (`aggregate`, `bucket`, `key`, `period`, `type`, `value`) values (count, 1703171580, users:1:roles, 60, cache_hit, 1), (count, 1703171520, users:1:roles, 360, cache_hit, 1), (count, 1703171520, users:1:roles, 1440, cache_hit, 1), (count, 1703167200, users:1:roles, 10080, cache_hit, 1) on duplicate key update `value` = `value` + values(`value`))

Raw stack trace:

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
#18 /vendor/laravel/framework/src/Illuminate/Database/Connection.php(605): PDOStatement::execute
#17 /vendor/laravel/framework/src/Illuminate/Database/Connection.php(605): Illuminate\Database\Connection::Illuminate\Database\{closure}
#16 /vendor/laravel/framework/src/Illuminate/Database/Connection.php(809): Illuminate\Database\Connection::runQueryCallback
#15 /vendor/laravel/framework/src/Illuminate/Database/Connection.php(776): Illuminate\Database\Connection::run
#14 /vendor/laravel/framework/src/Illuminate/Database/Connection.php(593): Illuminate\Database\Connection::affectingStatement
#13 /vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(3506): Illuminate\Database\Query\Builder::upsert
#12 /vendor/laravel/pulse/src/Storage/DatabaseStorage.php(166): Laravel\Pulse\Storage\DatabaseStorage::upsertCount
#11 /vendor/laravel/pulse/src/Storage/DatabaseStorage.php(72): Laravel\Pulse\Storage\DatabaseStorage::Laravel\Pulse\Storage\{closure}
#10 /vendor/laravel/framework/src/Illuminate/Collections/Traits/EnumeratesValues.php(240): Illuminate\Support\Collection::each
#9 /vendor/laravel/pulse/src/Storage/DatabaseStorage.php(72): Laravel\Pulse\Storage\DatabaseStorage::Laravel\Pulse\Storage\{closure}
#8 /vendor/laravel/framework/src/Illuminate/Database/Concerns/ManagesTransactions.php(30): Illuminate\Database\Connection::transaction
#7 /vendor/laravel/pulse/src/Storage/DatabaseStorage.php(48): Laravel\Pulse\Storage\DatabaseStorage::store
#6 /vendor/laravel/pulse/src/Ingests/StorageIngest.php(29): Laravel\Pulse\Ingests\StorageIngest::ingest
#5 /vendor/laravel/pulse/src/Pulse.php(308): Laravel\Pulse\Pulse::Laravel\Pulse\{closure}
#4 /vendor/laravel/pulse/src/Pulse.php(535): Laravel\Pulse\Pulse::rescue
#3 /vendor/laravel/pulse/src/Pulse.php(307): Laravel\Pulse\Pulse::store
#2 /vendor/laravel/pulse/src/PulseServiceProvider.php(130): Laravel\Pulse\PulseServiceProvider::Laravel\Pulse\{closure}
#1 /vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(227): Illuminate\Foundation\Http\Kernel::terminate
#0 /public/index.php(55): null

Illuminate\Database\QueryException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (Connection: mysql-pulse, SQL: insert into `pulse_aggregates` (`aggregate`, `bucket`, `key`, `period`, `type`, `value`) values (count, 1703171580, users:1:roles, 60, cache_hit, 1), (count, 1703171520, users:1:roles, 360, cache_hit, 1), (count, 1703171520, users:1:roles, 1440, cache_hit, 1), (count, 1703167200, users:1:roles, 10080, cache_hit, 1) on duplicate key update `value` = `value` + values(`value`))
#16 /vendor/laravel/framework/src/Illuminate/Database/Connection.php(822): Illuminate\Database\Connection::runQueryCallback
#15 /vendor/laravel/framework/src/Illuminate/Database/Connection.php(776): Illuminate\Database\Connection::run
#14 /vendor/laravel/framework/src/Illuminate/Database/Connection.php(593): Illuminate\Database\Connection::affectingStatement
#13 /vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(3506): Illuminate\Database\Query\Builder::upsert
#12 /vendor/laravel/pulse/src/Storage/DatabaseStorage.php(166): Laravel\Pulse\Storage\DatabaseStorage::upsertCount
#11 /vendor/laravel/pulse/src/Storage/DatabaseStorage.php(72): Laravel\Pulse\Storage\DatabaseStorage::Laravel\Pulse\Storage\{closure}
#10 /vendor/laravel/framework/src/Illuminate/Collections/Traits/EnumeratesValues.php(240): Illuminate\Support\Collection::each
#9 /vendor/laravel/pulse/src/Storage/DatabaseStorage.php(72): Laravel\Pulse\Storage\DatabaseStorage::Laravel\Pulse\Storage\{closure}
#8 /vendor/laravel/framework/src/Illuminate/Database/Concerns/ManagesTransactions.php(30): Illuminate\Database\Connection::transaction
#7 /vendor/laravel/pulse/src/Storage/DatabaseStorage.php(48): Laravel\Pulse\Storage\DatabaseStorage::store
#6 /vendor/laravel/pulse/src/Ingests/StorageIngest.php(29): Laravel\Pulse\Ingests\StorageIngest::ingest
#5 /vendor/laravel/pulse/src/Pulse.php(308): Laravel\Pulse\Pulse::Laravel\Pulse\{closure}
#4 /vendor/laravel/pulse/src/Pulse.php(535): Laravel\Pulse\Pulse::rescue
#3 /vendor/laravel/pulse/src/Pulse.php(307): Laravel\Pulse\Pulse::store
#2 /vendor/laravel/pulse/src/PulseServiceProvider.php(130): Laravel\Pulse\PulseServiceProvider::Laravel\Pulse\{closure}
#1 /vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(227): Illuminate\Foundation\Http\Kernel::terminate
#0 /public/index.php(55): null

Configuration

config/database.php excerpt:

'mysql-pulse' => [
    'driver' => 'mysql',
    'url' => env('PULSE_DATABASE_URL', env('DATABASE_URL')),
    'host' => env('PULSE_DB_HOST', env('DB_HOST', '127.0.0.1')),
    'port' => env('PULSE_DB_PORT', env('DB_PORT', '3306')),
    'database' => env('PULSE_DB_DATABASE', env('DB_DATABASE', 'forge')),
    'username' => env('PULSE_DB_USERNAME', env('DB_USERNAME', 'forge')),
    'password' => env('PULSE_DB_PASSWORD', env('DB_PASSWORD', '')),
    'unix_socket' => env('PULSE_DB_SOCKET', env('DB_SOCKET', '')),
    'charset' => 'utf8mb4',
    'collation' => 'utf8mb4_unicode_520_ci',
    'timezone' => '+00:00',
    'prefix' => '',
    'prefix_indexes' => true,
    'strict' => true,
    'engine' => null,
    'options' => extension_loaded('pdo_mysql') ? array_filter([
        PDO::MYSQL_ATTR_SSL_CA => env('PULSE_MYSQL_ATTR_SSL_CA', env('MYSQL_ATTR_SSL_CA')),
    ]) : [],
    'dump' => [
        'timeout' => 60,
    ],
],
config/pulse.php ``` env('PULSE_DOMAIN'), /* |-------------------------------------------------------------------------- | Pulse Path |-------------------------------------------------------------------------- | | This is the path which the Pulse dashboard will be accessible from. Feel | free to change this path to anything you'd like. Note that this won't | affect the path of the internal API that is never exposed to users. | */ 'path' => env('PULSE_PATH', 'debug/pulse'), /* |-------------------------------------------------------------------------- | Pulse Master Switch |-------------------------------------------------------------------------- | | This configuration option may be used to completely disable all Pulse | data recorders regardless of their individual configurations. This | provides a single option to quickly disable all Pulse recording. | */ 'enabled' => env('PULSE_ENABLED', true), /* |-------------------------------------------------------------------------- | Pulse Storage Driver |-------------------------------------------------------------------------- | | This configuration option determines which storage driver will be used | while storing entries from Pulse's recorders. In addition, you also | may provide any options to configure the selected storage driver. | */ 'storage' => [ 'driver' => env('PULSE_STORAGE_DRIVER', 'database'), 'database' => [ 'connection' => env('PULSE_DB_CONNECTION', null), 'chunk' => 1000, ], ], /* |-------------------------------------------------------------------------- | Pulse Ingest Driver |-------------------------------------------------------------------------- | | This configuration options determines the ingest driver that will be used | to capture entries from Pulse's recorders. Ingest drivers are great to | free up your request workers quickly by offloading the data storage. | */ 'ingest' => [ 'driver' => env('PULSE_INGEST_DRIVER', 'storage'), 'trim_lottery' => [1, 1_000], 'redis' => [ 'connection' => env('PULSE_REDIS_CONNECTION'), 'chunk' => 1000, ], ], /* |-------------------------------------------------------------------------- | Pulse Cache Driver |-------------------------------------------------------------------------- | | This configuration option determines the cache driver that will be used | for various tasks, including caching dashboard results, establishing | locks for events that should only occur on one server and signals. | */ 'cache' => env('PULSE_CACHE_DRIVER'), /* |-------------------------------------------------------------------------- | Pulse Route Middleware |-------------------------------------------------------------------------- | | These middleware will be assigned to every Pulse route, giving you the | chance to add your own middleware to this list or change any of the | existing middleware. Of course, reasonable defaults are provided. | */ 'middleware' => [ 'web', Authorize::class, ], /* |-------------------------------------------------------------------------- | Pulse Recorders |-------------------------------------------------------------------------- | | The following array lists the "recorders" that will be registered with | Pulse, along with their configuration. Recorders gather application | event data from requests and tasks to pass to your ingest driver. | */ 'recorders' => [ CacheInteractions::class => [ 'enabled' => env('PULSE_CACHE_INTERACTIONS_ENABLED', true), 'sample_rate' => env('PULSE_CACHE_INTERACTIONS_SAMPLE_RATE', .5), 'ignore' => [ '/\b[a-f0-9]{32}\b/', // Session IDs or something '/^.+@.+\|(?:(?:\d+\.\d+\.\d+\.\d+)|[0-9a-fA-F:]+)(?::timer)?$/', // Breeze / Jetstream authentication rate limiting '/^[a-zA-Z0-9]{40}$/', // Session IDs '/^health:*/', '/^illuminate:/', // Internal Laravel keys '/^laravel-health:check-.*/', '/^laravel:pulse:/', // Internal Pulse keys '/^nova/', // Internal Nova keys '/^telescope:/', // Internal Telescope keys '/^about_application/', // Internal Telescope keys ], 'groups' => [ '/^job-exceptions:.*/' => 'job-exceptions:*', '/^myclic\.content\.participant-dashboard-content\.\d+/' => 'myclic.content.participant-dashboard-content.*', '/^myclic\.feed\.\d+/' => 'myclic.feed.*', '/^throttle-event-myclic\.questionnaire\.started-\d+/' => 'throttle-event-myclic.questionnaire.started-*', '/^users:\d+:roles/' => 'users:*:roles', ], ], Exceptions::class => [ 'enabled' => env('PULSE_EXCEPTIONS_ENABLED', true), 'sample_rate' => env('PULSE_EXCEPTIONS_SAMPLE_RATE', 1), 'location' => env('PULSE_EXCEPTIONS_LOCATION', true), 'ignore' => [ // '/^Package\\\\Exceptions\\\\/', ], ], Queues::class => [ 'enabled' => env('PULSE_QUEUES_ENABLED', true), 'sample_rate' => env('PULSE_QUEUES_SAMPLE_RATE', 1), 'ignore' => [ // '/^Package\\\\Jobs\\\\/', '/^Spatie\\\\Health\\\\Jobs\\\\/', ], ], Servers::class => [ 'server_name' => env('PULSE_SERVER_NAME', gethostname()), 'directories' => explode(':', (string) env('PULSE_SERVER_DIRECTORIES', '/')), ], SlowJobs::class => [ 'enabled' => env('PULSE_SLOW_JOBS_ENABLED', true), 'sample_rate' => env('PULSE_SLOW_JOBS_SAMPLE_RATE', 1), 'threshold' => env('PULSE_SLOW_JOBS_THRESHOLD', 1000), 'ignore' => [ // '/^Package\\\\Jobs\\\\/', ], ], SlowOutgoingRequests::class => [ 'enabled' => env('PULSE_SLOW_OUTGOING_REQUESTS_ENABLED', true), 'sample_rate' => env('PULSE_SLOW_OUTGOING_REQUESTS_SAMPLE_RATE', 1), 'threshold' => env('PULSE_SLOW_OUTGOING_REQUESTS_THRESHOLD', 1000), 'ignore' => [ // '#^http://127\.0\.0\.1:13714#', // Inertia SSR ], 'groups' => [ // '#^https://api\.github\.com/repos/.*$#' => 'api.github.com/repos/*', // '#^https?://([^/]*).*$#' => '\1', // '#/\d+#' => '/*', ], ], SlowQueries::class => [ 'enabled' => env('PULSE_SLOW_QUERIES_ENABLED', true), 'sample_rate' => env('PULSE_SLOW_QUERIES_SAMPLE_RATE', 1), 'threshold' => env('PULSE_SLOW_QUERIES_THRESHOLD', 1000), 'location' => env('PULSE_SLOW_QUERIES_LOCATION', true), 'ignore' => [ '/(["`])pulse_[\w]+?\1/', ], ], SlowRequests::class => [ 'enabled' => env('PULSE_SLOW_REQUESTS_ENABLED', true), 'sample_rate' => env('PULSE_SLOW_REQUESTS_SAMPLE_RATE', 1), 'threshold' => env('PULSE_SLOW_REQUESTS_THRESHOLD', 1000), 'ignore' => [ '#^/_debugbar#', '#^/debug/health#', '#^/debug/horizon#', '#^/debug/pulse$#', '#^/livewire/update#', ], ], UserJobs::class => [ 'enabled' => env('PULSE_USER_JOBS_ENABLED', true), 'sample_rate' => env('PULSE_USER_JOBS_SAMPLE_RATE', 1), 'ignore' => [ // '/^Package\\\\Jobs\\\\/', ], ], UserRequests::class => [ 'enabled' => env('PULSE_USER_REQUESTS_ENABLED', true), 'sample_rate' => env('PULSE_USER_REQUESTS_SAMPLE_RATE', 1), 'ignore' => [ '#^/debug/pulse$#', '#^/debug/horizon#', '#^/debug/health$#', ], ], /* * Third-party */ AppsLoadRecorder::class => [ 'enabled' => env('PULSE_APPS_LOAD_ENABLED', true), 'sample_rate' => env('PULSE_APPS_LOAD_SAMPLE_RATE', 1), 'limit' => env('PULSE_APPS_LOAD_LIMIT', 10), 'ignore' => [ '#^/pulse$#', ], ], FourXxRecorder::class => [ 'enabled' => env('PULSE_4XX_ENABLED', true), 'sample_rate' => env('PULSE_4XX_SAMPLE_RATE', 1), 'ignore' => [ // '#^/wp-admin#', ], ], DiskRecorder::class => [ 'enabled' => env('GEOW_DISK_METRICS', true), ], DatabaseRecorder::class => [ 'connections' => [ 'mysql' => [ 'values' => [ 'Connections', 'Threads_connected', 'Threads_running', 'Max_used_connections', ], 'aggregates' => [ 'avg' => [ 'Threads_connected', 'Threads_running', ], 'max' => [], 'count' => [], ], ], ], ], CommandHistoryRecorder::class => [ 'enabled' => env('GEOW_COMMAND_HISTORY', true), 'ignore' => [ '#^backup:monitor#', '#^cache:prune-stale-tags#', '#^config:cache#', '#^config:clear#', '#^event:cache#', '#^event:clear#', '#^health:check#', '#^health:queue-check-heartbeat#', '#^health:schedule-check-heartbeat#', '#^horizon:pause#', '#^horizon:purge#', '#^horizon:snapshot#', '#^horizon:status#', '#^horizon:work#', '#^icons:cache#', '#^icons:clear#', '#^migrate#', '#^modules:cache#', '#^modules:clear#', '#^operations:process#', '#^package:discover#', '#^pulse:check#', '#^pulse:restart#', '#^purifier:clear#', '#^route:cache#', '#^route:clear#', '#^scout:flush#', '#^scout:import#', '#^storage:link#', '#^view:cache#', '#^view:clear#', ], ], ], ]; ```

Steps To Reproduce

Can't really provide a reproducible repository here since it's about a database deadlock issue, but let me know how else I can provide information.

What we did:

jessarcher commented 8 months ago

Hi @sebastiaanluca,

I haven't seen this one before with Pulse. You've already mentioned that the issue existed before the custom recorders, but I've looked through their source code anyway and can't see any obvious problems.

I'm trying to nail down what circumstances would cause a deadlock. In the error you've included, the rows being upserted include a specific user ID (users:1:roles), so the chances for a deadlock on those rows should be even smaller than most other data Pulse collects, but perhaps that's a red herring.

A few questions:

It's also worth mentioning that using a Redis ingest would likely prevent this issue, but I'd still like to get to the bottom of the problem if you're able to help debug it.

sebastiaanluca commented 8 months ago

Thanks for the quick and detailed response! IIRC we also had this with Telescope when we used it in production (now only locally). And I'm not sure if we're missing data because of it, but not that big of a deal.

Are you able to provide a few other occurrences of the error? I'm curious to see more examples of the data being upserted (just be sure not to share any sensitive data).

So far we had 14 of these exceptions between December 21st and January 8th. I've checked all of them individually and these are the unique ones. So almost all of them are related to the third-party 4xx recorder, cache recorder, and slow request.

Illuminate\Database\QueryException
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (Connection: mysql-pulse, SQL: insert into `pulse_aggregates` (`aggregate`, `bucket`, `key`, `period`, `type`, `value`) values (count, 1703171580, users:1:roles, 60, cache_hit, 1), (count, 1703171520, users:1:roles, 360, cache_hit, 1), (count, 1703171520, users:1:roles, 1440, cache_hit, 1), (count, 1703167200, users:1:roles, 10080, cache_hit, 1) on duplicate key update `value` = `value` + values(`value`))

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (Connection: mysql-pulse, SQL: insert into `pulse_aggregates` (`aggregate`, `bucket`, `key`, `period`, `type`, `value`) values (count, 1704209940, ["GET","fonts\/SohneBreit-Halbfett.woff2",404], 60, 4xx_request, 1), (count, 1704209760, ["GET","fonts\/SohneBreit-Halbfett.woff2",404], 360, 4xx_request, 1), (count, 1704209760, ["GET","fonts\/SohneBreit-Halbfett.woff2",404], 1440, 4xx_request, 1), (count, 1704205440, ["GET","fonts\/SohneBreit-Halbfett.woff2",404], 10080, 4xx_request, 1) on duplicate key update `value` = `value` + values(`value`))

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (Connection: mysql-pulse, SQL: insert into `pulse_aggregates` (`aggregate`, `bucket`, `key`, `period`, `type`, `value`) values (count, 1704374460, ["GET","build\/assets\/location-1d749d6a.js",404], 60, 4xx_request, 1), (count, 1704374280, ["GET","build\/assets\/location-1d749d6a.js",404], 360, 4xx_request, 1), (count, 1704373920, ["GET","build\/assets\/location-1d749d6a.js",404], 1440, 4xx_request, 1), (count, 1704366720, ["GET","build\/assets\/location-1d749d6a.js",404], 10080, 4xx_request, 1) on duplicate key update `value` = `value` + values(`value`))

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (Connection: mysql-pulse, SQL: insert into `pulse_aggregates` (`aggregate`, `bucket`, `key`, `period`, `type`, `value`) values (count, 1704705360, ["GET","build\/assets\/Components-1xVWUxt-.js",404], 60, 4xx_request, 1), (count, 1704705120, ["GET","build\/assets\/Components-1xVWUxt-.js",404], 360, 4xx_request, 1), (count, 1704705120, ["GET","build\/assets\/Components-1xVWUxt-.js",404], 1440, 4xx_request, 1), (count, 1704699360, ["GET","build\/assets\/Components-1xVWUxt-.js",404], 10080, 4xx_request, 1) on duplicate key update `value` = `value` + values(`value`))

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction (Connection: mysql-pulse, SQL: insert into `pulse_aggregates` (`aggregate`, `bucket`, `key`, `period`, `type`, `value`) values (count, 1704643200, ["GET","\/redactedroutes\/{redactedroute}\/unsubscribe","redactednamespace"], 60, slow_request, 1), (count, 1704643200, ["GET","\/redactedroutes\/{redactedroute}\/unsubscribe","redactednamespace"], 360, slow_request, 1), (count, 1704643200, ["GET","\/redactedroutes\/{redactedroute}\/unsubscribe","redactednamespace"], 1440, slow_request, 1), (count, 1704638880, ["GET","\/redactedroutes\/{redactedroute}\/unsubscribe","redactednamespace"], 10080, slow_request, 1), (count, 1704643200, 6553, 60, slow_user_request, 1), (count, 1704643200, 6553, 360, slow_user_request, 1), (count, 1704643200, 6553, 1440, slow_user_request, 1), (count, 1704638880, 6553, 10080, slow_user_request, 1), (count, 1704643200, 6553, 60, user_request, 1), (count, 1704643200, 6553, 360, user_request, 1), (count, 1704643200, 6553, 1440, user_request, 1), (count, 1704638880, 6553, 10080, user_request, 1), (count, 1704643200, users:*:roles, 60, cache_hit, 2), (count, 1704643200, users:*:roles, 360, cache_hit, 2), (count, 1704643200, users:*:roles, 1440, cache_hit, 2), (count, 1704638880, users:*:roles, 10080, cache_hit, 2), (count, 1704643200, ["update `user_redactedroutes` set `status` = ?, `user_redactedroutes`.`updated_at` = ? where `id` = ?","redactednamespace:50"], 60, slow_query, 1), (count, 1704643200, ["update `user_redactedroutes` set `status` = ?, `user_redactedroutes`.`updated_at` = ? where `id` = ?","redactednamespace:50"], 360, slow_query, 1), (count, 1704643200, ["update `user_redactedroutes` set `status` = ?, `user_redactedroutes`.`updated_at` = ? where `id` = ?","redactednamespace:50"], 1440, slow_query, 1), (count, 1704638880, ["update `user_redactedroutes` set `status` = ?, `user_redactedroutes`.`updated_at` = ? where `id` = ?","redactednamespace:50"], 10080, slow_query, 1), (count, 1704643200, redis:default, 60, queued, 2), (count, 1704643200, redis:default, 360, queued, 2), (count, 1704643200, redis:default, 1440, queued, 2), (count, 1704638880, redis:default, 10080, queued, 2), (count, 1704643200, 6553, 60, user_job, 2), (count, 1704643200, 6553, 360, user_job, 2), (count, 1704643200, 6553, 1440, user_job, 2), (count, 1704638880, 6553, 10080, user_job, 2) on duplicate key update `value` = `value` + values(`value`))

Given the key in the example references a user ID of "1" (I assume an admin), I'm wondering whether the error only occurs while the Pulse dashboard is open and polling for data?

1 is indeed the super admin, first user. It could be it only occurs when the dashboard is open and polling. Given the exception times are usually during the day around the same time, grouped.

Screenshot 2024-01-09 at 13 16 12

I'm also curious about what's happening in your Pulse auth gate, especially given the mention of "roles" in the error message.

Auth handler:

Gate::define(Abilities::ACCESS_DEBUG_SYSTEMS, static function (User $user): bool {
    return $user->isA(Roles::SUPER_ADMIN);
});

Gate::define('viewPulse', static function (User $user): bool {
    return $user->can(Abilities::ACCESS_DEBUG_SYSTEMS);
});

The exception handler just FYI:

Pulse::handleExceptionsUsing(static function (Throwable $exception): void {
    report($exception);
});

Roughly how many requests does your application receive, and could there be a lot of concurrent requests? For example, is it an SPA that fires off multiple concurrent API requests to your backend?

Most of our app is still built with Blade, some newer parts with Inertia. We don't have a lot of visitors and there's not that much load on the server too. I'll have to get back to you with the detailed numbers, that's something for management to look into which could take a bit.

How many rows are in your pulse_aggregates table?

Only 82225 at the moment

Can you share the results of running the SQL query: SHOW CREATE TABLE pulse_aggregates - I'm curious to check if you have the latest indexes, which should reduce the scope of locked rows.

CREATE TABLE `pulse_aggregates` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `bucket` int unsigned NOT NULL,
  `period` mediumint unsigned NOT NULL,
  `type` varchar(191) COLLATE utf8mb4_unicode_520_ci NOT NULL,
  `key` text COLLATE utf8mb4_unicode_520_ci NOT NULL,
  `key_hash` binary(16) GENERATED ALWAYS AS (unhex(md5(`key`))) VIRTUAL,
  `aggregate` varchar(191) COLLATE utf8mb4_unicode_520_ci NOT NULL,
  `value` decimal(20,2) NOT NULL,
  `count` int unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `pulse_aggregates_bucket_period_type_aggregate_key_hash_unique` (`bucket`,`period`,`type`,`aggregate`,`key_hash`),
  KEY `pulse_aggregates_period_bucket_index` (`period`,`bucket`),
  KEY `pulse_aggregates_type_index` (`type`),
  KEY `pulse_aggregates_period_type_aggregate_bucket_index` (`period`,`type`,`aggregate`,`bucket`)
) ENGINE=InnoDB AUTO_INCREMENT=8489080 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci

Can you also please run SHOW ENGINE INNODB STATUS and look for the LATEST DETECTED DEADLOCK section and share that (again, making sure not to share anything sensitive).

Screenshot 2024-01-09 at 13 25 59

😬

It's also worth mentioning that using a Redis ingest would likely prevent this issue, but I'd still like to get to the bottom of the problem if you're able to help debug it.

We'll definitely look into that when we keep getting more of these, but for the moment it's not a real blocking issue if we miss a few records.

jessarcher commented 8 months ago

Thanks, @sebastiaanluca! Everything looks fine there.

@timacdonald and I have replicated this error locally by running multiple concurrent processes, all repeatedly capturing the same type+key combination (i.e. writing to the same row) in a while (true) loop. It seems only to happen at the start of a new "bucket" (i.e. when the upsert will be doing an insert rather than an update), and it takes quite a few concurrent processes to trigger it. The speed of my computer might be making it harder to replicate than on a typical web+db server.

As previously mentioned, the Redis ingest should solve this by removing concurrent upserts, but we're still looking to see if there's anything else we can do to solve or reduce the issue. As you've mentioned, the only real problem is missing records as Pulse catches the exception.

jessarcher commented 8 months ago

271 should help with this. Once it's merged, tagged, and you've updated your version - if you're still getting the error, please let us know.

sebastiaanluca commented 8 months ago

Wow, can't believe y'all figured that out so fast! We don't have that many visitors on our 4GB 2vCPU DO server (<1000 a day), so odd that it was a concurrency issue.

Eagerly awaiting a release, will report back then. Appreciated ✌️

sebastiaanluca commented 7 months ago

Haven't had the exception anymore since we upgraded to beta11. Great work and again, thanks!

jessarcher commented 7 months ago

Great to hear! Thanks for checking in!