leandromoreira / redlock-rb

Redlock is a redis-based distributed lock implementation in Ruby. More than 20M downloads.
BSD 2-Clause "Simplified" License
688 stars 80 forks

RedisClient::CommandError ERR value is not an integer or out of range script #139

Open smathieu opened 1 year ago

smathieu commented 1 year ago

Redlock version: 2.0.2, Ruby: 3.2.2, Redis: 7.0.11, Redis-client: 0.14.1

We're seeing this error intermittently in production at scale. We can sometimes acquire a lock, but at other times the call fails with this obscure Redis error:

RedisClient::CommandError: ERR value is not an integer or out of range script: ceb2b2062e40c51a2b3963fd078bc71f11bdc65c, on @user_script:2. (RedisClient::CommandError)

Ruby code exercising this:

module Faraday
  class RetryAfter < Faraday::Middleware
    PADDING = 1.1

    attr_reader :lock_manager

    def initialize(app, redis_urls:, key: "default", statsd: nil)
      super(app)
      @app = app
      @key = key
      @lock_manager = Redlock::Client.new(Array(redis_urls))
      @statsd = statsd
    end

    def call(env)
      lock_key = "faraday-retry-after-#{@key}"

      if @lock_manager.locked?(lock_key)
        @statsd&.increment("faraday.retry-after.locked", tags: ["key:#{@key}"])

        ms = @lock_manager.get_remaining_ttl_for_resource(lock_key) * PADDING
        sleep ms / 1000.0 if ms&.positive?
      end

      response = @app.call(env)

      if response.status == 429 && response.headers["Retry-After"].present?
        @statsd&.increment("faraday.retry-after.retry_after", tags: ["key:#{@key}"])
        seconds = response.headers["Retry-After"].to_i * PADDING
        @lock_manager.lock(lock_key, seconds * 1000)
        sleep seconds
        response = @app.call(env)
      end

      response
    end

    private

    def sleep(seconds)
      @statsd&.histogram("faraday.retry-after.kernel_sleep", seconds, tags: ["key:#{@key}"])
      Kernel.sleep(seconds)
    end
  end
end

We're also seeing Redlock::LockAcquisitionError at times when calling @lock_manager.lock(lock_key, seconds * 1000). Inspecting the errors attribute of this exception also reveals the same ERR value is not an integer or out of range script error.

smathieu commented 1 year ago

This was because seconds * 1000 is a Float rather than an Integer here: seconds is the Retry-After value multiplied by PADDING (1.1). It would be nice to get an ArgumentError if the input is the wrong type.
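
For anyone hitting the same error, here is a minimal sketch of a caller-side workaround, assuming the TTL only needs to be a whole number of milliseconds. Both helper names (padded_ttl_ms and validate_ttl_ms!) are hypothetical and are not part of redlock-rb; the second one only illustrates the kind of ArgumentError check suggested above.

```ruby
# Hypothetical helpers for illustration; neither is part of redlock-rb.

# Convert a Retry-After value (in seconds) to a padded TTL expressed as a
# whole number of milliseconds. Float#ceil returns an Integer, so the result
# is safe to pass wherever an integer TTL is expected.
def padded_ttl_ms(retry_after_seconds, padding: 1.1)
  (retry_after_seconds * padding * 1000).ceil
end

# Fail fast with a readable error instead of letting a Float reach Redis.
def validate_ttl_ms!(ttl)
  unless ttl.is_a?(Integer) && ttl.positive?
    raise ArgumentError,
          "ttl must be a positive Integer (milliseconds), got #{ttl.inspect}"
  end
  ttl
end
```

In the middleware above, this would mean computing ttl_ms = padded_ttl_ms(response.headers["Retry-After"].to_i) and passing validate_ttl_ms!(ttl_ms) as the second argument to @lock_manager.lock in place of seconds * 1000. Rounding up with ceil rather than truncating keeps the lock from expiring slightly before the padded retry window ends.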