toptal / chewy

High-level Elasticsearch Ruby framework based on the official elasticsearch-ruby client
MIT License
1.88k stars 364 forks source link

[fix] delayed_sidekiq race condition #937

Closed skcc321 closed 4 months ago

skcc321 commented 4 months ago

In the PR I'm addressing two issues with delayed_sidekiq strategy.

  1. There is a chance for race conditions exactly in this place.

            unless redis.zrank(timechunks_key, timechunk_key) # read
              redis.zadd(timechunks_key, at, timechunk_key) # write

    I eliminated it using Lua script and making the combination of operations atomic.

  2. Also, there was a possibility to bloat Redis if there is an issue with reindexing (ES connection timeout for instance) Before submitting the PR make sure the following are checked:

konalegi commented 4 months ago

please ping me once you are ready to merge. I'll release it

konalegi commented 4 months ago

Please fix rubocop issues, and something is odd with our specs and CI in general :/

konalegi commented 4 months ago

Please take a look at your PR to see what's wrong. I created a separate PR to check CI and it seems to be working. https://github.com/toptal/chewy/pull/942

ebeagusamuel commented 4 months ago

@skcc321 Please try rebasing on toptal/chewy master branch.

skcc321 commented 4 months ago

Sorry for the delay, guys. Yep, I rebased it on upstream/master today. So, we've been testing it on prod 2 weeks. Looks like it completely resolved the race condition issue. @konalegi, @ebeagusamuel should be ready to merge now. Could you guys restart the CI to verify the PR?

konalegi commented 4 months ago

Since you've started using real redis in specs, you have to redis service, smth like and provide proper env variables for sidekiq/redis depending how you want to test it.

jobs:
  ruby-3:
    services:    
      redis:
        # Docker Hub image
        image: redis
        # Set health checks to wait until redis has started
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
skcc321 commented 4 months ago

Since you've started using real redis in specs, you have to redis service, smth like and provide proper env variables for sidekiq/redis depending how you want to test it.

jobs:
  ruby-3:
    services:    
      redis:
        # Docker Hub image
        image: redis
        # Set health checks to wait until redis has started
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

done. let's try again the CI run.

konalegi commented 4 months ago

@skcc321 still failing :/

skcc321 commented 4 months ago

@skcc321 still failing :/

OK, I'll take a look tonight.

skcc321 commented 4 months ago

@konalegi one more attempt please

skcc321 commented 4 months ago

@konalegi looks like two failed checks are not related to the changes I made. could you restart them?

skcc321 commented 4 months ago

@konalegi one of them has passed. Could you restart the last one?

konalegi commented 4 months ago

Great, merging!