Occasionally receiving the wrong record from Redis

phpredis / phpredis

A PHP extension for Redis

Occasionally receiving the wrong record from Redis #1818

Open FryJay opened 4 years ago

FryJay commented 4 years ago

Expected behaviour

The correct record is retrieved from Redis

Actual behaviour

Occasionally, at sufficient scale (9-10k requests per second), the wrong record is retrieved from Redis. The issue first appeared when we upgraded our phpredis driver from 5.0.2 to 5.1.1 (we have since upgraded to 5.2.2). We have not been able to reproduce it outside our production environment, so we believe it only occurs at a load that our non-production environments cannot generate.

I'm seeing this behaviour on

OS: CentOS 7
Redis: AWS ElastiCache with Clustered Redis engine 4.0.10
PHP: 7.4
phpredis: 5.2.2

Steps to reproduce, backtrace or example script

We have had no luck as of yet reproducing the issue outside of our production environment.

I've checked

[x] There is no similar issue from other users
[x] Issue isn't fixed in develop branch

FryJay commented 4 years ago

Other information that might be relevant:

We use persistent connections
We saw the issue once before, when upgrading to a 4.x version of the driver, and it went away when we updated to 5.0.2. I'm trying to track down the specific versions involved in that first upgrade, the one that went from working to broken.

miksir commented 4 years ago

Same issue here: phpredis 5.3.1, PHP 7.2.33, using pconnect.

We hit issue #1757 and added the password to the connection options to work around it. Right after that, we started seeing the same problem.

yatsukhnenko commented 4 years ago

@miksir can you try setting the auth parameter in pconnect instead of calling the separate auth method?

miksir commented 4 years ago

@yatsukhnenko Two days ago we moved the auth parameter into pconnect (['auth' => $password]) and removed the separate auth command, because of the AUTH bug.
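For reference, the change miksir describes (credentials passed to pconnect instead of a separate auth call) looks roughly like the sketch below. It assumes phpredis 5.3.0 or later, whose connection context array accepts an `auth` entry as quoted above; the `connect_with_auth` helper, timeouts and arguments are hypothetical placeholders, and this only illustrates the configuration under discussion, it is not a fix for the wrong-record problem.

```php
<?php
// Hypothetical helper: open a persistent connection and authenticate in the
// same call, instead of calling $redis->auth() afterwards.
// Assumes phpredis >= 5.3.0 (context array with an 'auth' key).
function connect_with_auth(string $host, int $port, string $password): Redis
{
    $redis = new Redis();
    // Arguments: host, port, connect timeout (s), persistent_id (null = default),
    // retry_interval (ms), read_timeout (s), context array with credentials.
    $ok = $redis->pconnect($host, $port, 2.0, null, 0, 2.0, ['auth' => $password]);
    if (!$ok) {
        throw new RuntimeException("pconnect to $host:$port failed");
    }
    return $redis;
}
```

Usage would be something like `$redis = connect_with_auth('127.0.0.1', 6379, $password);`, with the address being a placeholder.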

After this release, our developers started complaining that "Redis returns something other than what we set". We confirmed the issue: sometimes GET returns garbage (probably the values of other keys). The release was reverted and our SRE team is investigating it now.

yatsukhnenko commented 4 years ago

@miksir Have you disabled redis.pconnect.echo_check_liveness in your ini settings?

miksir commented 4 years ago

redis.pconnect.echo_check_liveness is enabled.

We will try echo_check_liveness; it may help us even with the auth command, because we have reconnection logic in PHP. Thanks.

miksir commented 4 years ago

@yatsukhnenko Just checked: we don't use pooling_enabled, so echo_check_liveness shouldn't have any effect.

yatsukhnenko commented 4 years ago

@miksir then you should use pooling_enabled. It was made to fix issues like yours :smile:

yatsukhnenko commented 4 years ago

@FryJay @miksir did you try to turn on pooling_enabled and echo_check_liveness INI options?

miksir commented 4 years ago

Sorry for the delay. We can't turn on pooling_enabled right now because our whole team is busy moving to Redis Cluster, and we don't use persistent connections there :) Does pooling_enabled affect cluster connections too?

Novynn commented 3 years ago

We had the same issue with an older version of phpredis running under PHP-FPM in production. Upgrading to 5.3.2, with redis.pconnect.echo_check_liveness and redis.pconnect.pooling_enabled left at their defaults (enabled), seems to have fixed the problem.

ebogdanov commented 3 years ago

We're facing the same issue, and it happens when the request rate to the Redis cluster increases. Under steady load (according to our Grafana charts) everything is fine, but during peaks the client can return wrong data.

PHP 7.2.33-1+ubuntu20.04.1+deb.sury.org+1

redis.pconnect.echo_check_liveness => 1 => 1
redis.pconnect.pooling_enabled => 1 => 1

The persistent option in the RedisCluster constructor is set to true.

But if I disable reads from slaves by any means (for example with RedisCluster::FAILOVER_NONE), the problem goes away and never occurs.
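A minimal sketch of that workaround, assuming a persistent RedisCluster client built from a seed list: the read-routing mode is set through `RedisCluster::OPT_SLAVE_FAILOVER`, and `FAILOVER_NONE` sends commands to masters only, whereas modes such as `FAILOVER_DISTRIBUTE` also allow reads from replicas. The `cluster_masters_only` helper and the timeout values are hypothetical.

```php
<?php
// Hypothetical helper: persistent cluster client that reads from masters only.
function cluster_masters_only(array $seeds): RedisCluster
{
    // Arguments: name (null = not an INI-defined cluster), seeds ("host:port"),
    // timeout (s), read_timeout (s), persistent.
    $cluster = new RedisCluster(null, $seeds, 1.5, 1.5, true);
    // Route every command, reads included, to master nodes.
    $cluster->setOption(RedisCluster::OPT_SLAVE_FAILOVER, RedisCluster::FAILOVER_NONE);
    return $cluster;
}
```

Usage would be something like `$cluster = cluster_masters_only(['10.0.0.1:6379', '10.0.0.2:6379']);` (placeholder addresses). As the next comment explains, the trade-off is that all load then lands on the masters.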

The problem is that our current setup has a limited budget, so we can't afford to send all traffic to the masters; instead we balance the load by reading from the slaves.

I can't find the issue just by reading the code :( If you're too busy to look into it, could you point me to how to set up a local development environment, please?

That way I can step through the code in a debugger and get an idea of how it works.

Thanks in advance.

ebogdanov commented 3 years ago

Well, I've found the cause of the issue: the "redis.clusters.cache_slots" option. When it is set to 1, liveness checks are not performed at all.
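To see which of the settings mentioned in this thread are active in a given PHP runtime (similar to the `php -i` output posted above), a small read-only diagnostic could be used. This is a sketch: the `phpredis_settings` helper is hypothetical, it only reads values with `ini_get()` (which returns `false` for settings that are not registered, e.g. when phpredis is not loaded), and it changes nothing. The claim that cache_slots disables liveness checks is ebogdanov's finding, still awaiting maintainer confirmation here.

```php
<?php
// Hypothetical diagnostic: report the phpredis INI settings discussed above.
function phpredis_settings(): array
{
    $names = [
        'redis.pconnect.pooling_enabled',
        'redis.pconnect.echo_check_liveness',
        'redis.clusters.cache_slots',
    ];
    $settings = [];
    foreach ($names as $name) {
        $value = ini_get($name); // false if the setting is not registered
        $settings[$name] = ($value === false) ? '(not registered)' : $value;
    }
    return $settings;
}

foreach (phpredis_settings() as $name => $value) {
    printf("%s => %s\n", $name, $value);
}
```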

@michael-grunder Can you confirm that this is by design, please?