mongodb / mongo-php-driver

The Official MongoDB PHP driver
https://pecl.php.net/package/mongodb
Apache License 2.0
889 stars 204 forks source link

No suitable servers found (`serverselectiontryonce` set): [connection error calling ismaster on 'xxx.xxx.xxx.xxx:xxxx'] #425

Closed udoless closed 8 years ago

udoless commented 8 years ago

Description

I have a problem like issue374

Environment

php 7.0.7 mongo 3.0.12

$ php -i | grep -E 'mongodb|libmongoc|libbson'
mongodb
mongodb support => enabled
mongodb version => 1.1.8
mongodb stability => stable
libmongoc version => 1.3.5
libbson version => 1.3.5
mongodb.debug => no value => no value

Test Script

<?php
$options = [
    'connectTimeoutMS' => 2000,
    'socketTimeoutMSG' => 3000,
    'readPreference'   => MongoDB\Driver\ReadPreference::RP_SECONDARY_PREFERRED
];
$manager = new MongoDB\Driver\Manager("mongodb://10.77.48.48:27001,10.77.48.49:27001,10.77.48.26:27001,10.77.48.27:27001", $options);
$where   = [
    'user_id' => 1242369332
];
$options = [
    'projection' => [
        'user_id'        => 1,
        'fans_num'       => 1,
        'ugc_follow_num' => 1,
        'pgc_follow_num' => 1,
        '_id'            => 0
    ],
    'limit'      => 1
];
$query   = new MongoDB\Driver\Query($where, $options);
$cursor  = $manager->executeQuery("ugc_user_relation.user_follow_stat", $query);
$cursor->setTypeMap(['root' => 'array']);
var_dump($cursorArray = $cursor->toArray());

Exception(randomly)

No suitable servers found (`serverselectiontryonce` set):
[connection error calling ismaster on '10.77.48.48:27001']
[connection error calling ismaster on '10.77.48.49:27001']
[connection error calling ismaster on '10.77.48.27:27001'] 
[connection error calling ismaster on '10.77.48.47:27001'] 
[Failed connecting to '10.110.36.24:27001': Connection timed out]

Mongo Master Status

/* 1 */
{
    "host" : "show-user-relation-1.qiyi.mongo:27001",
    "version" : "3.0.12",
    "process" : "mongod",
    "pid" : NumberLong(7485),
    "uptime" : 6686745.0,
    "uptimeMillis" : NumberLong(6686744699),
    "uptimeEstimate" : 7050282.0,
    "localTime" : ISODate("2016-09-27T07:57:41.365Z"),
    "asserts" : {
        "regular" : 0,
        "warning" : 0,
        "msg" : 0,
        "user" : 282,
        "rollovers" : 0
    },
    "connections" : {
        "current" : 38,
        "available" : 65498,
        "totalCreated" : NumberLong(2451499864)
    },
    "cursors" : {
        "note" : "deprecated, use server status metrics",
        "clientCursors_size" : 1,
        "totalOpen" : 1,
        "pinned" : 0,
        "totalNoTimeout" : 0,
        "timedOut" : 14
    },
    "extra_info" : {
        "note" : "fields vary by platform",
        "heap_usage_bytes" : -901376,
        "page_faults" : 55
    },
    "globalLock" : {
        "totalTime" : NumberLong(6686744836000),
        "currentQueue" : {
            "total" : 0,
            "readers" : 0,
            "writers" : 0
        },
        "activeClients" : {
            "total" : 63,
            "readers" : 0,
            "writers" : 0
        }
    },
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(767985359),
                "w" : NumberLong(202889006),
                "R" : NumberLong(6),
                "W" : NumberLong(143)
            },
            "acquireWaitCount" : {
                "r" : NumberLong(17),
                "w" : NumberLong(24),
                "W" : NumberLong(3)
            },
            "timeAcquiringMicros" : {
                "r" : NumberLong(173407),
                "w" : NumberLong(520354),
                "W" : NumberLong(730)
            }
        },
        "Database" : {
            "acquireCount" : {
                "r" : NumberLong(282437954),
                "w" : NumberLong(202888971),
                "R" : NumberLong(110411),
                "W" : NumberLong(35)
            },
            "acquireWaitCount" : {
                "w" : NumberLong(109),
                "R" : NumberLong(245),
                "W" : NumberLong(1)
            },
            "timeAcquiringMicros" : {
                "w" : NumberLong(3997),
                "R" : NumberLong(22661),
                "W" : NumberLong(18)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "r" : NumberLong(103614803),
                "w" : NumberLong(126971786)
            }
        },
        "Metadata" : {
            "acquireCount" : {
                "W" : NumberLong(5411)
            },
            "acquireWaitCount" : {
                "W" : NumberLong(353)
            },
            "timeAcquiringMicros" : {
                "W" : NumberLong(115104)
            }
        },
        "oplog" : {
            "acquireCount" : {
                "r" : NumberLong(179380058),
                "w" : NumberLong(75917185)
            }
        }
    },
    "network" : {
        "bytesIn" : NumberLong(1018164089962),
        "bytesOut" : NumberLong(4533181194523),
        "numRequests" : NumberLong(10116615255)
    },
    "opcounters" : {
        "insert" : 0,
        "query" : 67369043,
        "update" : 57133672,
        "delete" : 690338,
        "getmore" : 24817888,
        "command" : 196162295
    },
    "opcountersRepl" : {
        "insert" : 145,
        "query" : 0,
        "update" : 312,
        "delete" : 64,
        "getmore" : 0,
        "command" : 0
    },
    "repl" : {
        "setName" : "show_user_relation_rs1",
        "setVersion" : 12,
        "ismaster" : true,
        "secondary" : false,
        "hosts" : [ 
            "10.77.48.47:27001", 
            "10.77.48.48:27001", 
            "10.77.48.49:27001", 
            "10.77.48.26:27001", 
            "10.77.48.27:27001"
        ],
        "passives" : [ 
            "10.110.36.24:27001", 
            "10.110.36.47:27001"
        ],
        "primary" : "10.77.48.47:27001",
        "tags" : {
            "dc" : "sjhl"
        },
        "me" : "10.77.48.47:27001",
        "electionId" : ObjectId("57841e629761b753db898b00"),
        "rbid" : 1333656065
    },
    "storageEngine" : {
        "name" : "wiredTiger"
    },
    "wiredTiger" : {
        "uri" : "statistics:",
        "LSM" : {
            "sleep for LSM checkpoint throttle" : 0,
            "sleep for LSM merge throttle" : 0,
            "rows merged in an LSM tree" : 0,
            "application work units currently queued" : 0,
            "merge work units currently queued" : 0,
            "tree queue hit maximum" : 0,
            "switch work units currently queued" : 0,
            "tree maintenance operations scheduled" : 0,
            "tree maintenance operations discarded" : 0,
            "tree maintenance operations executed" : 0
        },
        "async" : {
            "number of allocation state races" : 0,
            "number of operation slots viewed for allocation" : 0,
            "current work queue length" : 0,
            "number of flush calls" : 0,
            "number of times operation allocation failed" : 0,
            "maximum work queue length" : 0,
            "number of times worker found no work" : 0,
            "total allocations" : 0,
            "total compact calls" : 0,
            "total insert calls" : 0,
            "total remove calls" : 0,
            "total search calls" : 0,
            "total update calls" : 0
        },
        "block-manager" : {
            "mapped bytes read" : 0,
            "bytes read" : 15610183680.0,
            "bytes written" : 701829672960.0,
            "mapped blocks read" : 0,
            "blocks pre-loaded" : 244,
            "blocks read" : 2189950,
            "blocks written" : 69443115
        },
        "cache" : {
            "tracked dirty bytes in the cache" : 93743898,
            "tracked bytes belonging to internal pages in the cache" : 48223439,
            "bytes currently in the cache" : 12346094929.0,
            "tracked bytes belonging to leaf pages in the cache" : 64376286001.0,
            "maximum bytes configured" : 64424509440.0,
            "tracked bytes belonging to overflow pages in the cache" : 0,
            "bytes read into cache" : 23288172017.0,
            "bytes written from cache" : 994267795642.0,
            "pages evicted by application threads" : 0,
            "checkpoint blocked page eviction" : 530,
            "unmodified pages evicted" : 21467,
            "page split during eviction deepened the tree" : 2,
            "modified pages evicted" : 631419,
            "pages selected for eviction unable to be evicted" : 398417,
            "pages evicted because they exceeded the in-memory maximum" : 2170,
            "pages evicted because they had chains of deleted items" : 626651,
            "failed eviction of pages that exceeded the in-memory maximum" : 3518,
            "hazard pointer blocked page eviction" : 390424,
            "internal pages evicted" : 3437,
            "maximum page size at eviction" : 10485910,
            "eviction server candidate queue empty when topping up" : 5158,
            "eviction server candidate queue not empty when topping up" : 830,
            "eviction server evicting pages" : 1,
            "eviction server populating queue, but not evicting pages" : 5987,
            "eviction server unable to reach eviction goal" : 0,
            "pages split during eviction" : 8938,
            "pages walked for eviction" : 33409071,
            "eviction worker thread evicting pages" : 5765,
            "in-memory page splits" : 1680,
            "percentage overhead" : 8,
            "tracked dirty pages in the cache" : 205,
            "pages currently held in the cache" : 235023,
            "pages read into cache" : 873283,
            "pages written from cache" : 66809809
        },
        "connection" : {
            "pthread mutex condition wait calls" : 2190516825.0,
            "files currently open" : 30,
            "memory allocations" : 1782207931.0,
            "memory frees" : 1680777097.0,
            "memory re-allocations" : 557926,
            "total read I/Os" : 2190335,
            "pthread mutex shared lock read-lock calls" : 37581157,
            "pthread mutex shared lock write-lock calls" : 9148737,
            "total write I/Os" : 175529732
        },
        "cursor" : {
            "cursor create calls" : 32103786,
            "cursor insert calls" : 224323531,
            "cursor next calls" : 2706694680.0,
            "cursor prev calls" : 18,
            "cursor remove calls" : 53853222,
            "cursor reset calls" : 2399718764.0,
            "cursor search calls" : 1643082177.0,
            "cursor search near calls" : 771390102,
            "cursor update calls" : 35
        },
        "data-handle" : {
            "connection data handles currently active" : 27,
            "session dhandles swept" : 526870,
            "session sweep attempts" : 110315,
            "connection sweep dhandles closed" : 0,
            "connection sweep candidate became referenced" : 0,
            "connection sweep dhandles removed from hash list" : 1206444,
            "connection sweep time-of-death sets" : 1206862,
            "connection sweeps" : 668665
        },
        "log" : {
            "total log buffer size" : 33554432,
            "log bytes of payload data" : 28553249550.0,
            "log bytes written" : 34058557952.0,
            "yields waiting for previous log file close" : 0,
            "total size of compressed records" : 13862333365.0,
            "total in-memory size of compressed records" : 18895347700.0,
            "log records too small to compress" : 22728098,
            "log records not compressed" : 33433977,
            "log records compressed" : 50382307,
            "log flush operations" : 0,
            "maximum log file size" : 104857600,
            "pre-allocated log files prepared" : 326,
            "number of pre-allocated log files to create" : 1,
            "pre-allocated log files used" : 325,
            "log read operations" : 0,
            "log release advances write LSN" : 110361,
            "records processed by log scan" : 11,
            "log scan records requiring two reads" : 2,
            "log scan operations" : 5,
            "consolidated slot closures" : 105976132,
            "logging bytes consolidated" : 34126470016.0,
            "consolidated slot joins" : 106681673,
            "consolidated slot join races" : 221593,
            "slots selected for switching that were unavailable" : 1,
            "record size exceeded maximum" : 0,
            "failed to find a slot large enough for record" : 0,
            "consolidated slot join transitions" : 26730,
            "log sync operations" : 220423,
            "log sync_dir operations" : 326,
            "log server thread advances write LSN" : 105865925,
            "log write operations" : 106500219
        },
        "reconciliation" : {
            "page reconciliation calls" : 42135156,
            "page reconciliation calls for eviction" : 620657,
            "split bytes currently awaiting free" : 504,
            "split objects currently awaiting free" : 9
        },
        "session" : {
            "open cursor count" : 335,
            "open session count" : 34
        },
        "thread-yield" : {
            "page acquire busy blocked" : 2,
            "page acquire eviction blocked" : 79,
            "page acquire locked blocked" : 5008245,
            "page acquire read blocked" : 36145,
            "page acquire time sleeping (usecs)" : 44008969
        },
        "transaction" : {
            "transaction begins" : 443242921,
            "transaction checkpoints" : 110208,
            "transaction checkpoint generation" : 110208,
            "transaction checkpoint currently running" : 0,
            "transaction checkpoint max time (msecs)" : 477,
            "transaction checkpoint min time (msecs)" : 2,
            "transaction checkpoint most recent time (msecs)" : 238,
            "transaction checkpoint total time (msecs)" : 20199834,
            "transactions committed" : 106336837,
            "transaction failures due to cache overflow" : 0,
            "transaction range of IDs currently pinned by a checkpoint" : 0,
            "transaction range of IDs currently pinned" : 0,
            "transactions rolled back" : 333873829
        },
        "concurrentTransactions" : {
            "write" : {
                "out" : 0,
                "available" : 128,
                "totalTickets" : 128
            },
            "read" : {
                "out" : 1,
                "available" : 127,
                "totalTickets" : 128
            }
        }
    },
    "writeBacksQueued" : false,
    "mem" : {
        "bits" : 64,
        "resident" : 12197,
        "virtual" : 14117,
        "supported" : true,
        "mapped" : 0,
        "mappedWithJournal" : 0
    },
    "metrics" : {
        "commands" : {
            "_isSelf" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(52)
            },
            "buildInfo" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(2383995790)
            },
            "collStats" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(12)
            },
            "count" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(3079016)
            },
            "dbStats" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(16)
            },
            "delete" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(3365673)
            },
            "eval" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(124)
            },
            "getLog" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(35)
            },
            "getnonce" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(40)
            },
            "group" : {
                "failed" : NumberLong(1),
                "total" : NumberLong(1)
            },
            "hostInfo" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(2)
            },
            "isMaster" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(4815743118)
            },
            "listCollections" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(110391)
            },
            "listDatabases" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(111551)
            },
            "listIndexes" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(4)
            },
            "ping" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(2385327425)
            },
            "replSetFresh" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(5)
            },
            "replSetGetConfig" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(56)
            },
            "replSetGetStatus" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(1519855)
            },
            "replSetHeartbeat" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(18256143)
            },
            "replSetReconfig" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(9)
            },
            "replSetUpdatePosition" : {
                "failed" : NumberLong(10),
                "total" : NumberLong(251579128)
            },
            "saslContinue" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(10)
            },
            "saslStart" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(5)
            },
            "serverStatus" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(116506)
            },
            "update" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(102492913)
            },
            "usersInfo" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(3)
            },
            "whatsmyuri" : {
                "failed" : NumberLong(0),
                "total" : NumberLong(65)
            }
        },
        "cursor" : {
            "timedOut" : NumberLong(14),
            "open" : {
                "noTimeout" : NumberLong(0),
                "pinned" : NumberLong(0),
                "total" : NumberLong(1)
            }
        },
        "document" : {
            "deleted" : NumberLong(2946846),
            "inserted" : NumberLong(0),
            "returned" : NumberLong(451913429),
            "updated" : NumberLong(102484391)
        },
        "getLastError" : {
            "wtime" : {
                "num" : 0,
                "totalMillis" : 0
            },
            "wtimeouts" : NumberLong(0)
        },
        "operation" : {
            "fastmod" : NumberLong(0),
            "idhack" : NumberLong(0),
            "scanAndOrder" : NumberLong(5151788),
            "writeConflicts" : NumberLong(7645)
        },
        "queryExecutor" : {
            "scanned" : NumberLong(1407863148),
            "scannedObjects" : NumberLong(1285181131)
        },
        "record" : {
            "moves" : NumberLong(0)
        },
        "repl" : {
            "apply" : {
                "batches" : {
                    "num" : 5,
                    "totalMillis" : 58
                },
                "ops" : NumberLong(516)
            },
            "buffer" : {
                "count" : NumberLong(0),
                "maxSizeBytes" : 268435456,
                "sizeBytes" : NumberLong(0)
            },
            "network" : {
                "bytes" : NumberLong(69810),
                "getmores" : {
                    "num" : 5,
                    "totalMillis" : 5520
                },
                "ops" : NumberLong(516),
                "readersCreated" : NumberLong(1)
            },
            "preload" : {
                "docs" : {
                    "num" : 0,
                    "totalMillis" : 0
                },
                "indexes" : {
                    "num" : 0,
                    "totalMillis" : 0
                }
            }
        },
        "storage" : {
            "freelist" : {
                "search" : {
                    "bucketExhausted" : NumberLong(0),
                    "requests" : NumberLong(0),
                    "scanned" : NumberLong(0)
                }
            }
        },
        "ttl" : {
            "deletedDocuments" : NumberLong(0),
            "passes" : NumberLong(111445)
        }
    },
    "ok" : 1.0
}
udoless commented 8 years ago

I find that when occur an exception, at most time php was connecting a passive node then connection timed out.

jmikola commented 8 years ago

@udoless: Last week, we diagnosed a bug in libmongoc that resulted in connection errors being incorrectly reported for available servers. The issue is closely related to libmongoc's stream handler API, which is used in versions of the PHP driver before 1.2.0alpha2. See CDRIVER-1571 and this reproduction test for more information.

I believe this may be related to your issue, since it is really just the passive node that timed out. In PHP driver 1.1.x, "Failed connecting to 'host:port': Connection timed out" is an exception from the PHP stream handler. The other "connection error" messages in the exception are evidence of the aforementioned bug, which also causes server selection to fail prematurely.

Separate from this issue, I'll also point out that those "connection error" messages should actually read "connection timeout" due to CDRIVER-1567.


Since this bug is being fixed in libmongoc 1.5.0, we won't be able to add it to a maintenance release for the 1.1.x PHP driver. I'd appreciate if you can test the latest 1.2.0 release and confirm that the issue is resolved. While our 1.2.0 alphas still don't include the libmongoc bug fix, we aren't using custom stream handlers, which I expect bypasses the bug entirely.

jmikola commented 8 years ago

Since this bug is being fixed in libmongoc 1.5.0, we won't be able to add it to a maintenance release for the 1.1.x PHP driver.

The libmongoc team did end up back-porting their fix to 1.3.6, so we've released 1.1.9. This release should resolve the spurious timeouts ""No suitable servers found" exceptions when one or more replica sets nodes were inaccessible. The bug was specific to the PHP stream API integration, so it does not apply to version 1.2.0alpha2 and later.

I'm going to close this issue as resolved. Feel free to follow-up here or open a new issue as needed if the bug persists.