scylladb / python-driver

ScyllaDB Python Driver, originally DataStax Python Driver for Apache Cassandra
https://python-driver.docs.scylladb.com
Apache License 2.0
70 stars 42 forks source link

pool: log error when failed to connect to shard #276

Closed muzarski closed 6 months ago

muzarski commented 9 months ago

This change is related to https://github.com/scylladb/python-driver/issues/275. Silently ignoring the connection failure can be misleading, which was the case in the issue provided above.

This PR adds an error log when driver fails to open the CQL connection to specific shard.

fruch commented 9 months ago

@muzarski let add an integration test with the relevant case (in the issue you have most of what needed)

also I'm not sure about solution here, it's gonna hide any failure from the connection method

muzarski commented 9 months ago

@fruch but the main issue is still not fixed. Is there a point to add an integration test?

fruch commented 9 months ago

@fruch but the main issue is still not fixed. Is there a point to add an integration test?

I was more concerned with the behavior change you introduced.

anyhow the log doesn't really help in directing a use to what is the issue, and what he can do to handle it.

muzarski commented 9 months ago

I was more concerned with the behavior change you introduced.

Yeah, I forgot to reraise the exception - fixed it already.

anyhow the log doesn't really help in directing a use to what is the issue, and what he can do to handle it.

I'm not sure I understand correctly.

As for me, I wish I'd seen this kind of log message when I had debugged the issue. During the SCT run, there was no information regarding the shard-aware CQL connection failure - nothing suggested that this was the real issue.

fruch commented 8 months ago

@Lorak-mmk can you review this one ?

avelanarius commented 6 months ago

@muzarski Please rebase on top of the current master.

Lorak-mmk commented 6 months ago

One thing I'd like to know about this change that I won't be able to understand from code changes: What happens with the exception? Why isn't it logged? Could you describe that in commit message?

muzarski commented 6 months ago

One thing I'd like to know about this change that I won't be able to understand from code changes: What happens with the exception? Why isn't it logged? Could you describe that in commit message?

In the original issue we encountered during SCT run, the error was silently dropped in the HostConnection::_open_connections_for_all_shards:

for shard_id in range(self.host.sharding_info.shards_count):
    if skip_shard_id is not None and skip_shard_id == shard_id:
        continue
    future = self._session.submit(self._open_connection_to_missing_shard, shard_id)
    if isinstance(future, Future):
        self._connecting.add(shard_id)
        self._shard_connections_futures.append(future)

The HostConnection::_open_connection_to_missing_shard function is submitted to the ThreadPoolExecutor which calls this function asynchronously, and sets the exception on the returned future.

I believe the reason that this exception was ignored, is the fact that we don't check what the future holds (either result or exception) after execution. We can see that the resulting future is appended to HostConnection::_shard_connections_futures, but from what I see in the code, this list is only used in the HostConnection::shutdown method to cancel all of the futures contained in the list.

I'll add a short description in the commit message.

muzarski commented 6 months ago

v2: