xp-forge / mongodb

MongoDB for the XP Framework
0 stars 0 forks source link

Exception: not primary when writing #42

Closed thekid closed 1 year ago

thekid commented 1 year ago

Occurred when issuing a write command after a while of the process being idle. Seems reconnect handling does not work reliably.

thekid commented 1 year ago

Initial tests show everything works as expected when switching connections. Added this debugging code:

diff --git a/src/main/php/com/mongodb/io/Connection.class.php b/src/main/php/com/mongodb/io/Connection.class.php
index e6fbaa3..8d2563c 100755
--- a/src/main/php/com/mongodb/io/Connection.class.php
+++ b/src/main/php/com/mongodb/io/Connection.class.php
@@ -203,6 +203,14 @@ class Connection {
     $r= $this->send(self::OP_MSG, "\x00\x00\x00\x00\x00", $sections);
     if (1 === (int)$r['body']['ok']) return $r;

+\util\cmd\Console::writeLine(
+  '[MONGODB] Error ',
+  $this->address() === $this->server['primary'] ? 'primary::' : 'secondary::',
+  $this->address(),
+  ' -> ',
+  $r
+);
+
     throw Error::newInstance($r['body']);
   }

diff --git a/src/main/php/com/mongodb/io/Protocol.class.php b/src/main/php/com/mongodb/io/Protocol.class.php
index 9f66569..5fe671e 100755
--- a/src/main/php/com/mongodb/io/Protocol.class.php
+++ b/src/main/php/com/mongodb/io/Protocol.class.php
@@ -194,6 +194,13 @@ class Protocol {
           foreach ($conn->server['hosts'] ?? [] as $host) {
             if ($conn->server['primary'] !== $host) $this->nodes['secondary'][]= $host;
           }
+
+\util\cmd\Console::writeLine(
+  '[MONGODB] Connected to ',
+  $candidate === $conn->server['primary'] ? 'primary::' : 'secondary::',
+  $candidate
+);
+
           shuffle($this->nodes['secondary']);
         } else if ($time - $conn->lastUsed >= $this->socketCheckInterval) {
           try {

When switching connections, the following appear correctly in the console before issuing the write command:

[MONGODB] Connected to secondary::ac-vkfrdmv-shard-00-00.abc1234.mongodb.net:27017
[MONGODB] Connected to primary::ac-vkfrdmv-shard-00-02.abc1234.mongodb.net:27017

The first is from the initial connect, the second when a write context is established.

thekid commented 1 year ago

More testing doesn't show an immediate bug in the implementation and suggests that a primary re-election on the server side may lead to this situation:

Event Client view Server side situation
Client: Before (n/a) {Primary: A, Secondary: [B, C]}
Client: Connect {Primary: A, Secondary: [B, C]} {Primary: A, Secondary: [B, C]}
Remote: Primary change {Primary: A, Secondary: [B, C]} {Primary: B, Secondary: [A, C]}
Client: Write {Primary: A, Secondary: [B, C]} {Primary: B, Secondary: [A, C]}

The client has an inconsistent view of the server side situation and uses node A for writing which it believes to be the primary, while in reality it's changed its role to a secondary.

An idea could be catch this specific error (any of the error codes categorized as "NotPrimaryError" here) and refresh the view into the cluster before retrying the write command.

thekid commented 1 year ago

Fixed in https://github.com/xp-forge/mongodb/releases/tag/v2.1.0