valkey-io / valkey-glide

An open source Valkey client library that supports Valkey and Redis open source 6.2, 7.0 and 7.2. Valkey GLIDE is designed for reliability, optimized performance, and high-availability, for Valkey and Redis OSS based applications. GLIDE is a multi language client library, written in Rust with programming language bindings, such as Java and Python
Apache License 2.0
239 stars 53 forks source link

python: cluster-require-full-coverage no - scan fail #2436

Open raphaelauv opened 2 hours ago

raphaelauv commented 2 hours ago

Describe the bug

When I scan a redis cluster with a missing node , the scan do not work even if the redis cluster is setup with

cluster-require-full-coverage no

Expected Behavior

with cluster-require-full-coverage no a redis scan should work with a missing node , like this ->

image

Current Behavior

scan fail with a log on the missing node

  File "/usr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/raphael/REPO/xx/yy/src/redis_clean.py", line 42, in main2
    cursor, keys = await client.scan(cursor,match=match_regex, count=100)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raphael/REPO/xx/yy/venv/lib/python3.11/site-packages/glide/async_commands/cluster_commands.py", line 1139, in scan
    await self._cluster_scan(cursor, match, count, type),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raphael/REPO/xx/yy/venv/lib/python3.11/site-packages/glide/glide_client.py", line 562, in _cluster_scan
    response = await self._write_request_await_response(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raphael/REPO/xx/yy/venv/lib/python3.11/site-packages/glide/glide_client.py", line 446, in _write_request_await_response
    await response_future
glide.exceptions.ConnectionError: Received connection error `Requested connection not found - ConnectionNotFoundForRoute: 172.20.0.2:6379`. Will attempt to reconnect

Reproduction Steps

image

import asyncio
import logging
from logging import getLogger
from glide import GlideClusterClientConfiguration, NodeAddress, GlideClusterClient, ClusterScanCursor

logger = getLogger(__name__)
logging.basicConfig(level=logging.INFO)

async def main2():
    addresses = [NodeAddress("localhost", 9379)]
    config = GlideClusterClientConfiguration(addresses)
    client = await GlideClusterClient.create(config)
    cursor = ClusterScanCursor()
    all_keys = []
    while not cursor.is_finished():
        cursor, keys = await client.scan(cursor,match="{toto}*", count=100)
        all_keys.extend(keys)
    print(all_keys)

if __name__ == '__main__':
    asyncio.run(main2())

Possible Solution

No response

Additional Information/Context

No response

Client version used

1.1.0

Engine type and version

8.0.1

OS

ubuntu 22.04

Language

Python

Language Version

3.11

Cluster information

No response

Logs

No response

Other information

No response

avifenesh commented 2 hours ago

@raphaelauv Hi, We discussed it when creating the cluster scan, which is not usual scan. Can you explain your use-case? You happened to have a wrong state of the cluster and wanted to keep scanning or is it a state you aware of ahead?

Less relevant, but the improvement of scan with match can be done (internally by Valkey) per node, but not per cluster. At this point there's no built in functionality of wide scanning of a cluster.

raphaelauv commented 2 hours ago

hi, thanks @avifenesh

we want our scan to work even with missing nodes (we have big cluster(s) and a missing node is a common thing)

would be great that we can explicitly use scan with a cluster-require-full-coverage at false

avifenesh commented 2 hours ago

we want our scan to work even with missing nodes (we have big cluster(s) and a missing node is a common thing)

The challenge with missing nodes and cluster scan is that in such cases we can't provide any guarantee about the validity of the scan.

Since currently there's no cluster wide scan provided by Valkey, we need to create some complex logic to provide this kind of scan with the guarantees of scan. The way to do that is to track the covered slots. In case a cluster has missing slots, there's no way we can validate that the scan is over and everything is covered.

The solution in such a case would probably be to iterate blindly over the cluster nodes and to ignore nodes that have connection issues, without giving guarantees, and users will need to choose this scan type ahead with an optional flag or so.

In your case, you would prefer to use a scan with no guarantees over a scan with guarantees which can't scan a not covered cluster?

I'm just trying to get the picture and to understand the needs better.

If you are fine with sharing this information, i would also like to hear why a case of missing nodes (a full shard, in case you have replicas) is common. I have some experience, and missing shards are not supposed to be common. If you prefer to take the specific privately, we can chat in Valkey discord, and you can approach me there in private as well.

raphaelauv commented 1 hour ago
The solution in such a case would probably be to iterate blindly over the cluster nodes and to ignore nodes that have connection issues, without giving guarantees, and users will need to choose this scan type ahead with an optional flag or so.

yes this is the missing feature