mongodb / mongo-php-library

The Official MongoDB PHP library
https://mongodb.com/docs/php-library/current/
Apache License 2.0

Sharded cluster connection string with invalid hosts causes impossible connection #792

Closed: H-Max closed this 3 years ago

H-Max commented 3 years ago

Bug Report

When using a list of mongos processes in the connection string, the connection fails if the first host is not available, instead of failing over to the other hosts.

Environment

PHP 7.4 in a Docker container. MongoDB 4.4.1 sharded cluster (5 nodes, each a complete PSS replica set) with 8+ mongos processes.

libbson bundled version => 1.17.0
libmongoc bundled version => 1.17.0
libmongoc SSL => enabled
libmongoc SSL library => OpenSSL
libmongoc crypto => enabled
libmongoc crypto library => libcrypto
libmongoc crypto system profile => disabled
libmongoc SASL => enabled
libmongoc ICU => disabled
libmongoc compression => enabled
libmongoc compression snappy => disabled
libmongoc compression zlib => enabled
libmongoc compression zstd => disabled
libmongocrypt bundled version => 1.0.4
libmongocrypt crypto => enabled
libmongocrypt crypto library => libcrypto
mongodb.debug => no value => no value

Composer:

name     : mongodb/mongodb
descrip. : MongoDB driver library
keywords : database, driver, mongodb, persistence
versions : * 1.7.0

Test Script

Instantiate a client like this:

$client = new \MongoDB\Client("host1:33001,host2:33001,host3:33001");

Expected and Actual Behavior

The expected behavior, as described in the documentation, is that if any of the hosts is down, the driver should try to connect to the other ones. But it does not.

Instead, the connection fails as soon as we attempt a read, with an error like the following when host2 and host3 are down or unreachable (and host1 is OK):

MongoDB\Driver\Exception\ConnectionTimeoutException: No suitable servers found (`serverSelectionTryOnce` set): [connection refused calling ismaster on 'host2:33001'] [connection refused calling ismaster on 'host3:33001']
#0 /apps/vendor/api/mongodb/mongodb/src/functions.php(431): MongoDB\Driver\Manager->selectServer(Object(MongoDB\Driver\ReadPreference))
#1 /apps/vendor/api/mongodb/mongodb/src/Collection.php(684): MongoDB\select_server(Object(MongoDB\Driver\Manager), Array)

When trying with the mongo shell with this command, it works and connects to the mongos correctly:

> mongo mongodb://host1:33001,host2:33001,host3:33001
MongoDB shell version v4.4.1
connecting to: mongodb://host1:33001,host2:33001,host3:33001/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("f55b5dad-32cd-44d6-9a0a-09ac00e9e0e4") }
MongoDB server version: 4.4.1
mongos>

I looked at the serverSelectionTryOnce connection setting, but it did not help, and the documentation says: "Multi-threaded drivers and mongos do not support serverSelectionTryOnce."

Can you please advise on this matter? For now, I must explicitly specify only one mongos, and this is a real issue.

alcaeus commented 3 years ago

Hi @H-Max! First of all, sorry for the delay in getting back to you. I played around with this a couple of times but never got around to finishing my write-up.

I tried to reproduce this locally, where I have a sharded cluster with two mongos instances listening on ports 4430 and 4431. I created a MongoDB\Client instance with an additional (unreachable) host in the first position:

$cluster = 'mongodb://localhost:4429,localhost:4430,localhost:4431';

$client = new MongoDB\Client($cluster, ['serverSelectionTimeoutMS' => 2000]);

for ($i = 0; $i < 5; $i++) {
    echo "ping...";
    $result = $client->selectDatabase('test')->command(['ping' => 1]);
    echo "pong: " . $result->getServer()->getPort() . "\n";
}

The output from this script is as follows:

ping...pong: 4430
ping...pong: 4431
ping...pong: 4430
ping...pong: 4431
ping...pong: 4430

Dumping the list of servers (in this case their host and port only) shows only two servers:

array(2) {
  [0] =>
  string(14) "localhost:4430"
  [1] =>
  string(14) "localhost:4431"
}
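For reference, a dump like the one above can be produced with the driver's Manager::getServers() method; a minimal sketch (the variable names are illustrative):

$servers = array_map(
    function (MongoDB\Driver\Server $server) {
        // Keep only host and port for readability
        return $server->getHost() . ':' . $server->getPort();
    },
    $client->getManager()->getServers()
);
var_dump($servers);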

I was also able to confirm that libmongoc (which handles connecting to servers) correctly skips the unknown server, by setting the mongodb.debug INI setting to log to stdout. Within the full trace output we can see this:

[2020-11-10T14:54:53.584899+00:00]     socket: TRACE   > ENTRY: _mongoc_socket_wait():161
[2020-11-10T14:54:53.584927+00:00]     socket: TRACE   >  EXIT: _mongoc_socket_wait():226
[2020-11-10T14:54:53.584934+00:00]     socket: TRACE   >  EXIT: mongoc_socket_connect():890
[2020-11-10T14:54:53.584939+00:00]     stream: TRACE   > ENTRY: _mongoc_stream_socket_poll():227
[2020-11-10T14:54:53.584980+00:00]     socket: TRACE   > ENTRY: mongoc_socket_poll():297
[2020-11-10T14:54:53.585008+00:00]     stream: TRACE   >  EXIT: _mongoc_stream_socket_poll():253
[2020-11-10T14:54:53.585013+00:00]     stream: TRACE   > ENTRY: mongoc_stream_failed():80
[2020-11-10T14:54:53.585017+00:00]     stream: TRACE   > ENTRY: _mongoc_stream_socket_failed():93
[2020-11-10T14:54:53.585020+00:00]     stream: TRACE   > ENTRY: _mongoc_stream_socket_destroy():72
[2020-11-10T14:54:53.585023+00:00]     socket: TRACE   > ENTRY: mongoc_socket_close():790
[2020-11-10T14:54:53.585038+00:00]     socket: TRACE   >  EXIT: mongoc_socket_close():825
[2020-11-10T14:54:53.585042+00:00]     stream: TRACE   >  EXIT: _mongoc_stream_socket_destroy():86
[2020-11-10T14:54:53.585045+00:00]     stream: TRACE   >  EXIT: _mongoc_stream_socket_failed():97
[2020-11-10T14:54:53.585048+00:00]     stream: TRACE   >  EXIT: mongoc_stream_failed():90
[2020-11-10T14:54:53.585077+00:00]     mongoc: TRACE   > TRACE: _mongoc_topology_clear_connection_pool():1601 clearing pool for server: localhost:4429
[2020-11-10T14:54:53.585130+00:00]     mongoc: TRACE   > TRACE: mongoc_topology_description_handle_ismaster():1974 ismaster_response = <NULL>
[2020-11-10T14:54:53.585172+00:00]     mongoc: TRACE   > ENTRY: mongoc_server_description_handle_ismaster():527
[2020-11-10T14:54:53.585184+00:00]     mongoc: TRACE   >  EXIT: mongoc_server_description_handle_ismaster():534
[2020-11-10T14:54:53.585256+00:00]     mongoc: TRACE   > TRACE: mongoc_topology_description_handle_ismaster():2031 Topology description Unknown ignoring server description Unknown
[2020-11-10T14:54:53.585292+00:00]     mongoc: TRACE   > ENTRY: mongoc_server_description_destroy():163
[2020-11-10T14:54:53.585296+00:00]     mongoc: TRACE   >  EXIT: mongoc_server_description_destroy():166

libmongoc tries to connect to the first server, fails, and removes it from the topology. Later, it successfully looks up the other two servers and deduces that it is connected to a sharded cluster:

[2020-11-10T14:54:53.586473+00:00]     mongoc: TRACE   > TRACE: mongoc_topology_description_handle_ismaster():1974 ismaster_response = { "ismaster" : true, "msg" : "isdbgrid", "topologyVersion" : { "processId" : { "$oid" : "5fa8f972bc83d763e94547de" }, "counter" : { "$numberLong" : "0"} }, "maxBsonObjectSize" : { "$numberInt" : "16777216" }, "maxMessageSizeBytes" : { "$numberInt" : "48000000" }, "maxWriteBatchSize" : { "$numberInt" : "100000" }, "localTime" : { "$date" : { "$numberLong" : "1605020093583" } }, "logicalSessionTimeoutMinutes" : { "$numberInt" : "30" }, "connectionId" : { "$numberInt" : "1011" }, "maxWireVersion" : { "$numberInt" : "10" }, "minWireVersion" : { "$numberInt" : "0" }, "ok" : { "$numberDouble" : "1.0" }, "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1605020088, "i" : 3 } }, "signature" : { "hash" : { "$binary" : { "base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType" : "00" } }, "keyId" : { "$numberLong" : "0"} } }, "operationTime" : { "$timestamp" : { "t" : 1605020088, "i" : 3 } } }
[2020-11-10T14:54:53.586481+00:00]     mongoc: TRACE   > ENTRY: mongoc_server_description_handle_ismaster():527
[2020-11-10T14:54:53.586803+00:00]     mongoc: TRACE   >  EXIT: mongoc_server_description_handle_ismaster():725
[2020-11-10T14:54:53.586838+00:00]     mongoc: TRACE   > TRACE: mongoc_topology_description_handle_ismaster():2026 Topology description Unknown handling server description Mongos

[...]

[2020-11-10T14:54:53.587652+00:00]     mongoc: TRACE   > ENTRY: mongoc_topology_description_select():800
[2020-11-10T14:54:53.587726+00:00]     mongoc: TRACE   > TRACE: mongoc_topology_description_select():832 Topology type [Sharded], selected [Mongos] [localhost:4430]
[2020-11-10T14:54:53.587730+00:00]     mongoc: TRACE   >  EXIT: mongoc_topology_description_select():835

For what it's worth, I also tried moving the non-existent mongos to other places in the connection string, but that did not yield any issues.

If you are able to reproduce the problem, a trace like above would certainly be helpful. You can use ini_set('mongodb.debug', <target>); to either log to stdout or a log file. Before sharing such a log, be aware that the log could contain sensitive information, such as credentials or full documents. If you can reproduce this error with a ping command as done above, that would be ideal as this excludes most other causes. If not, please share a script that reproduces the error on your end along with the full debug log. Thank you!
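For instance, a minimal sketch of enabling that log for one run (hosts taken from the OP; adjust the target as needed):

ini_set('mongodb.debug', 'stdout'); // or a directory path to collect log files

$client = new MongoDB\Client('mongodb://host1:33001,host2:33001,host3:33001');
$client->selectDatabase('test')->command(['ping' => 1]); // triggers discovery and trace output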

H-Max commented 3 years ago

Hello @alcaeus,

Thanks a lot for testing this out.

I am terribly sorry, but it's working now and I have no idea why yet. I am reviewing our configuration and the way we connect to the Mongo cluster, looking for recent changes that would explain it, because I am positive that the issue was present and really painful.

Also, one thing I see is that the more mongos hosts I put in the connection string, the slower it gets to select a valid one, as if it were connecting to each of them sequentially, waiting for an answer, and only at the end picking the fastest (no parallelism).

alcaeus commented 3 years ago

I am terribly sorry, but it's working now and I have no idea why yet. I am reviewing our configuration and the way we connect to the Mongo cluster, looking for recent changes that would explain it, because I am positive that the issue was present and really painful.

In that case, let's hope that it was a fluke and doesn't reappear. If it does, I'd appreciate it if you could get me a trace as that would allow us to check what's going on.

Also, one thing I see is that the more mongos hosts I put in the connection string, the slower it gets to select a valid one, as if it were connecting to each of them sequentially, waiting for an answer, and only at the end picking the fastest (no parallelism).

I'd have to test this. Do you have a rough number of mongos hosts where it starts getting noticeably slower?

H-Max commented 3 years ago

In that case, let's hope that it was a fluke and doesn't reappear. If it does, I'd appreciate it if you could get me a trace as that would allow us to check what's going on.

Will do. We used to connect to MongoDB using a single host, with a try/catch to retry the connection up to 5 times (using the connectTimeoutMS setting), but we changed that for efficiency and thought it would be a good time to use multiple hosts in the connection string. This is probably related.

I'd have to test this. Do you have a rough number of mongos hosts where it starts getting noticeably slower?

We are testing with 7 mongos in the list. Each of them responds correctly if we declare it as a single host.

The mongos are not running on localhost, so we can see the latency stack up (at 10 ms per host, for instance, it piles up to 60 or 70 ms).

alcaeus commented 3 years ago

We are testing with 7 mongos in the list. Each of them responds correctly if we declare it as a single host.

The mongos are not running on localhost, so we can see the latency stack up (at 10 ms per host, for instance, it piles up to 60 or 70 ms).

OK, so I should definitely start seeing changes even with a small number of servers. I'll see what I can do. One more thing I forgot before: you've shared a connection string in the OP; I just wanted to confirm that it is accurate and that you don't set any options on the client (either through the connection string or through the $options argument).

H-Max commented 3 years ago

Here are the options in the connection string:

(
    [socketTimeoutMS] => 3600000
    [connectTimeoutMS] => 1000
    [serverSelectionTimeoutMS] => 2000
    [writeConcern] => majority
    [tls] => 1
    [tlsInsecure] => 1
    [username] => XXXXXX
    [password] => XXXXXXX
    [authSource] => XXXXXXXX
    [readConcernLevel] => available
    [appName] => m1by1
)
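For illustration, these options could be passed via the Client's $uriOptions argument roughly like this (hosts and credentials are placeholders; 'w' is the URI-option key for the write concern; this is a sketch, not the actual production code):

$client = new MongoDB\Client(
    'mongodb://host1:33001,host2:33001,host3:33001',
    [
        'socketTimeoutMS' => 3600000,
        'connectTimeoutMS' => 1000,
        'serverSelectionTimeoutMS' => 2000,
        'w' => 'majority',
        'tls' => true,
        'tlsInsecure' => true,
        'username' => 'XXXXXX',
        'password' => 'XXXXXXX',
        'authSource' => 'XXXXXXXX',
        'readConcernLevel' => 'available',
        'appname' => 'm1by1',
    ]
);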

I'm running tests on my side and writing simpler code to handle the connection errors in our abstract class. Maybe it's a good idea to wait a bit on your side; I wouldn't want to take up your time for something that is caused by our own code.

H-Max commented 3 years ago

So I tried a hack: creating a wildcard (*) DNS hostname so that 20 different host names point to the same mongos.

This host is valid and the connection works fine.

The timing is as follows:

We connect with the options above, using this piece of code:

try {
    self::$client = new Client(
        self::$host,
        $options,
        $driverOptions
    );
    self::$database = self::$client->selectDatabase(self::$dbName);
    return self::$client;
} catch (Exception $e) {
    // Removed code for connection issue debug
}

alcaeus commented 3 years ago

@H-Max that's great info, thank you very much for that. Can you share the code you used to select a server and to measure timings?

H-Max commented 3 years ago

We measure timing with the shell's time command, on an isolated server (not doing anything else), against our private API with a simple ping/pong request. It tries to connect to the MongoDB instance and, if it works, it returns "pong" (well, you know the drill...).

To create the connection string, we have a simple array of hosts and a function like this (I removed unnecessary code):

// mongosXX.shard.internal are aliases to mongos.shard.internal
$config = [
    'host'=>[
        //'mongos.shard.internal:32001', // default working single host
        'mongos01.shard.internal:32001', // OK
        'mongos02.shard.internal:32002', // Wrong port
        'mongos03.shard.internal:32001', // OK
        'mongos04.shard.internal:32002', // Wrong port
        'mongos05.shard.internal:32001', // OK
        'mongos06.shard.internal:32002', // Wrong port
        'mongos07.shard.internal:32001', // OK
        'mongos08.shard.internal:32002', // Wrong port
        'mongos09.shard.internal:32001', // OK
        'mongos10.shard.internal:32002', // Wrong port
        'mongos11.shard.internal:32001', // OK
        'mongos12.shard.internal:32002', // Wrong port
    ]
];

// the function supports srv string as well
function _getHostString($cfg)
{
    if (empty($cfg['srv'])) {
        $scheme = 'mongodb://';
        if (is_array($cfg['host'])) {
            return $scheme . join(',', $cfg['host']);
        }
        return $scheme . $cfg['host'] . ((!empty($cfg['port'])) ? ":{$cfg['port']}" : '');
    }
    return 'mongodb+srv://' . $cfg['srv'];
}
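For illustration, with the $config above the function would return a string along these lines (wrapped here for readability):

echo _getHostString($config);
// mongodb://mongos01.shard.internal:32001,mongos02.shard.internal:32002,
//     mongos03.shard.internal:32001, ... ,mongos12.shard.internal:32002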

H-Max commented 3 years ago

Last but not least: reducing the connectTimeoutMS setting to 100 ms instead of 1000 ms makes the "10 valid hosts + 10 hosts with wrong ports" timing drop to 1000 ms instead of 2000 ms.

H-Max commented 3 years ago

So I have been running tests, trying to isolate the issue, and I found something interesting, and disturbing.

The code is actually instant, and the problem nonexistent, if we run the PHP code using PHP-FPM behind Nginx via an HTTP request.

It's slow and grumpy when running it using PHP-CLI.

(PHP is running in a docker container, php:7.4.10-fpm-alpine)

I have no idea why, I'm looking into it.

alcaeus commented 3 years ago

Hi @H-Max,

sorry I haven't replied back, but I haven't found the time to further debug this issue. I was initially confused by FPM not showing the behaviour, but it recently dawned on me that what you're seeing is most likely due to connection persistence: in your FPM processes, the underlying connections are re-used between requests, which means that there's no need for an additional handshake or for establishing connections. Since the CLI process terminates after one run, the next run has to re-establish connections to MongoDB.

To verify whether that's the case, you can do the following in your test script:
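For instance, a minimal sketch of such a check (hosts and the ping command here are illustrative): time a first command, which has to establish connections, against a second one in the same process:

$client = new MongoDB\Client('mongodb://host1:33001,host2:33001,host3:33001');

// First ping: pays the cost of discovering and connecting to the host list.
$start = hrtime(true);
$client->selectDatabase('test')->command(['ping' => 1]);
printf("first ping: %.1f ms\n", (hrtime(true) - $start) / 1e6);

// Second ping: connections already exist; if connection persistence explains
// the FPM/CLI difference, this should be dramatically faster.
$start = hrtime(true);
$client->selectDatabase('test')->command(['ping' => 1]);
printf("second ping: %.1f ms\n", (hrtime(true) - $start) / 1e6);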

Let me know if that provides any more insight into the behaviour. I'll try to find the time to do some tests of my own soon.

H-Max commented 3 years ago

Hello @alcaeus,

I believe you are pointing in the right direction here: the connection persistence "issue" stems from PHP CLI scripts "restarting" from scratch at every single call.

I've been trying to optimize and fix this, but we can't afford to drop our CLI scripts right now. I guess the solution is to use a single mongos in the CLI scripts, on localhost for instance, and to make sure that mongos does not die.

alcaeus commented 3 years ago

Hi @H-Max,

I was pointing this out as an explanation for why you may encounter issues in CLI but not in FPM; I wasn't suggesting that you drop your CLI scripts.

I did some more testing today. I've created a sharded cluster configuration for mongo-orchestration that starts 20 mongos processes. I then created a test script that performs server selection 100000 times on managers with different connection strings and measures the time. You can see the test script and the log output in a gist I created: https://gist.github.com/alcaeus/07eb65f186883a4095d61291a5ed5ad0.
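In spirit, the measurement looks something like this sketch (the full script is in the gist; hosts and iteration count here are illustrative):

$manager = new MongoDB\Driver\Manager('mongodb://localhost:4430,localhost:4431');
$readPreference = new MongoDB\Driver\ReadPreference('secondaryPreferred');

// Warm-up ping so that connection setup does not affect the measurement.
$manager->executeCommand('test', new MongoDB\Driver\Command(['ping' => 1]));

$start = hrtime(true);
for ($i = 0; $i < 100000; $i++) {
    $manager->selectServer($readPreference);
}
printf("Select server took %.5f seconds\n", (hrtime(true) - $start) / 1e9);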

Looking at the data, I can see that the time increases from 0.97 seconds with a single mongos to 1.75 seconds with 10, and 1.76 seconds with 20 mongos processes. Note that in all cases, a single ping was executed before running the test to make sure that the driver creates connections to the various servers without affecting the measurement. While it's true that server selection takes longer with more mongos processes, I wouldn't consider that difference out of the ordinary. However, I do concede that my tests ran against a sharded cluster that was running on localhost, which pretty much takes latency out of the equation.

I also re-ran the tests with 1 million server selections to ensure that the driver would re-scan the topology several times. The times for that test are as follows:

Testing 1 available mongos
Function call Select server with secondary_preferred read preference took 22.61807 seconds

Testing 10 available mongos
Function call Select server with secondary_preferred read preference took 25.47074 seconds

Testing 20 available mongos
Function call Select server with secondary_preferred read preference took 23.47316 seconds

Testing 10 unavailable mongos
Function call Select server with secondary_preferred read preference took 25.05046 seconds

Again, these results aren't much different from what I'd expect. Maybe adapting the script and testing it in your setup will provide some more insight, as you seem to be testing with remote mongos processes.

Alexander-L-G commented 3 years ago

Tracked in https://jira.mongodb.org/browse/PHPLIB-604

alcaeus commented 3 years ago

Closing due to inactivity. Feel free to follow up if this is still outstanding.