pulibrary / bibdata

Local API for retrieving bibliographic and other useful data from Alma (Ruby 3.1.0, Rails 7.1.3.4)
BSD 2-Clause "Simplified" License

Sidekiq jobs are getting run twice, leading to database locks #2333

Closed: maxkadel closed this 3 months ago

maxkadel commented 4 months ago

For more detailed notes, see the Datadog notebook.

Ensure Sidekiq jobs are not created twice with the same job ID. This is very likely a Redis latency issue.
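One way to make a duplicate delivery harmless is an idempotency guard keyed on the job ID. The sketch below is hypothetical (`JobDeduplicator` and `perform_once` are not part of bibdata or Sidekiq) and keeps claims in memory only for illustration; a real multi-process deployment would need a shared store, for example a Redis `SET` with `NX` and an expiry. Because a Sidekiq retry reuses the same job ID, the guard releases its claim when the work raises.

```ruby
require "set"

# Hypothetical guard: remembers which job IDs (jids) have already been claimed
# so that a second delivery of the same job is skipped instead of re-run.
# In-memory only; it does not coordinate across processes or hosts.
class JobDeduplicator
  def initialize
    @seen = Set.new
    @lock = Mutex.new
  end

  # Returns true the first time a jid is claimed, false on every repeat.
  def claim(jid)
    @lock.synchronize { @seen.add?(jid) ? true : false }
  end

  # Forgets a jid so that a later retry of the same job can run.
  def release(jid)
    @lock.synchronize { @seen.delete(jid) }
  end
end

# Hypothetical job body wrapper: only the first claim of a jid runs the block.
def perform_once(dedup, jid)
  return :skipped unless dedup.claim(jid)

  yield
  :performed
rescue StandardError
  # Release the claim so a genuine retry (which reuses the jid) can run again.
  dedup.release(jid)
  raise
end
```

A second `perform_once` call with the same jid returns `:skipped` without touching the database, while a failed run leaves the jid free for Sidekiq's retry.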

History

We have a monitoring alert that is going off repeatedly for bibdata, saying that Postgres queries are taking a very long time: sometimes as long as 15 minutes, at which point Postgres is probably killing them rather than letting them finish.

The slow Postgres query is:

SELECT dump_types.* FROM dump_types WHERE dump_types.constant = ? LIMIT ?

It appears to be called from the AlmaDumpTransferJob.

One possibility is that this job is somehow being called twice with different GlobalIDs, causing a database lock. If so, the underlying cause could be Redis latency or a Sidekiq thread management issue (again, see the Datadog notebook).

Acceptance Criteria

maxkadel commented 4 months ago

A Sidekiq thread on migrating to a new Redis server suggests using Redis replication:

To migrate, you set up a new replica of the old primary, let it replicate, shut down workers, shut down primary, promote new replica to new primary, start up workers.
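The replicate-then-promote steps above can be sketched as a dry-run plan. This is a hypothetical helper, not something from the thread or from princeton_ansible: the host names, port, and the `sidekiq` systemd unit name are assumptions. It only builds `redis-cli` and `systemctl` command strings and parses `INFO replication` output; it does not run anything.

```ruby
# Hypothetical helper: returns the ordered commands for migrating Sidekiq to a
# new Redis server by replication. Host names, port, and the "sidekiq" unit
# name are assumptions; nothing is executed.
def redis_migration_plan(old_host:, new_host:, port: 6379, worker_unit: "sidekiq")
  [
    # 1. Make the new server a replica of the old primary.
    "redis-cli -h #{new_host} -p #{port} REPLICAOF #{old_host} #{port}",
    # 2. Check replication status until replication_ready? returns true.
    "redis-cli -h #{new_host} -p #{port} INFO replication",
    # 3. Stop the Sidekiq workers so nothing writes to the old primary.
    "sudo systemctl stop #{worker_unit}",
    # 4. Shut down the old primary.
    "redis-cli -h #{old_host} -p #{port} SHUTDOWN",
    # 5. Promote the replica to be the new primary.
    "redis-cli -h #{new_host} -p #{port} REPLICAOF NO ONE",
    # 6. After pointing the app's Redis URL at the new host, restart workers.
    "sudo systemctl start #{worker_unit}"
  ]
end

# Parses `INFO replication` output from the replica: ready when the link to
# the primary is up and no full sync is still in progress.
def replication_ready?(info_text)
  fields = info_text.each_line.filter_map do |line|
    key, value = line.strip.split(":", 2)
    [key, value] if value
  end.to_h
  fields["master_link_status"] == "up" && fields["master_sync_in_progress"] == "0"
end
```

Keeping the plan as data makes the ordering (workers stopped before the old primary shuts down, promotion before workers restart) easy to review before anyone runs it by hand.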

Beck-Davis commented 4 months ago

hackartisan commented 4 months ago

Please make sure to use the central Redis variable described in https://github.com/pulibrary/princeton_ansible/blob/main/roles/redis/README.md

maxkadel commented 4 months ago

Thanks @hackartisan! I had missed this.

maxkadel commented 4 months ago

Is Sidekiq connecting to Redis twice?

NFO: Sidekiq Pro 7.2.0, commercially licensed. Thanks for your support!
Apr 08 06:27:53 bibdata-alma-worker-staging1 sidekiq[865]: 2024-04-08T10:27:53.704Z pid=865 tid=49l INFO: Sidekiq 7.2.2 connecting to Redis with options {:size=>10, :pool_name=>"internal", :url=>"redis://lib-redis-staging1.princeton.ed>
Apr 08 06:27:53 bibdata-alma-worker-staging1 sidekiq[865]: 2024-04-08T10:27:53.714Z pid=865 tid=49l INFO: Sidekiq 7.2.2 connecting to Redis with options {:size=>2, :pool_name=>"default", :url=>"redis://lib-redis-staging1.princeton.edu:>

maxkadel commented 4 months ago

The production Redis server is still on Redis 6.0, which is too old for our current version of Sidekiq. See the Princeton Ansible ticket to upgrade this server.
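A quick way to confirm whether a server is new enough is to compare the `redis_version` reported by `redis-cli INFO server` against Sidekiq's minimum. This is a hypothetical helper; the 6.2 floor reflects the Sidekiq 7.0 release notes as I understand them and should be checked against the changelog for the Sidekiq version actually deployed.

```ruby
require "rubygems" # provides Gem::Version (normally preloaded)

# Assumed minimum Redis version for Sidekiq 7; verify against Sidekiq's changelog.
MIN_REDIS_FOR_SIDEKIQ_7 = Gem::Version.new("6.2")

# Extracts redis_version from `redis-cli INFO server` output.
def redis_version(info_text)
  line = info_text.each_line.find { |l| l.start_with?("redis_version:") }
  raise ArgumentError, "redis_version not found in INFO output" unless line

  Gem::Version.new(line.strip.split(":", 2).last)
end

# True when the reported server version meets the assumed Sidekiq 7 minimum.
def redis_supported_by_sidekiq_7?(info_text)
  redis_version(info_text) >= MIN_REDIS_FOR_SIDEKIQ_7
end
```

`Gem::Version` compares segment by segment, so a 6.0.x server sorts below 6.2 while any 7.x server passes.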

kevinreiss commented 4 months ago

We should talk to Ops about how to move the production environment to the same Redis version as staging, or discuss alternative plans.

maxkadel commented 4 months ago

This may be related to https://github.com/pulibrary/bibdata/issues/1959, a newer Honeybadger error. It may be addressed by a Postgres configuration change.