mycognosist / solar

A minimal Secure Scuttlebutt replication node.
Other
22 stars 2 forks source link

EBT session concludes immediately when acting as responder after timeout #89

Closed mycognosist closed 8 months ago

mycognosist commented 8 months ago

An active session between solar and Manyverse appears to timeout after 10 minutes. This must be initiated from Manyverse because there are not currently any such timeout counters in solar.

solar then takes the role of an EBT session responder for a new session (initiated by Manyverse) but for some reason the session is concluded immediately, seemingly by solar (this assumption has yet to be tested).

Here we see two log entries from an active session, followed by 10 minutes of nothing, and then we wait for a session request. This wait for session -> session concluded loop continues again and again.

[2024-01-29T13:00:45Z TRACE ebt-replication] Received message: Message { value: Object {"previous": String("%xGNP8Lzs1VRXJGLX5uvhCnhh8zpez1Ffi7jIE48G2OE=.sha256"), "sequence": Number(75), "author": String("@fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519"), "timestamp": Number(1706533245261), "hash": String("sha256"), "content": Object {"text": String("At a minimum, the bug fix doesn't appear to have broken anything..."), "type": String("post"), "root": String("%qg38tukhZsFYBX63HU9QC2LtQFHaiqW+gd5Dyfe2Zkw=.sha256"), "mentions": Array [], "branch": String("%qg38tukhZsFYBX63HU9QC2LtQFHaiqW+gd5Dyfe2Zkw=.sha256")}, "signature": String("64u0wMYgcdGJvBu/rgVTVFGIU7oqluCuBxVYjk+rO+P1hqd0ikj5xkclObTvFAJCZla9MMrnimAiYunV5zKbAA==.sig.ed25519")} }
[2024-01-29T13:00:45Z TRACE ebt-replication] Sending vector clock: {"@fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519": 150}
[2024-01-29T13:10:46Z TRACE ebt] Waiting for EBT session request
[2024-01-29T13:10:46Z TRACE ebt-session] Initiating EBT replication session with: fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519
[2024-01-29T13:10:46Z TRACE ebt-handler] Received replicate request: [EbtReplicate { version: 3, format: "classic" }]
[2024-01-29T13:10:46Z TRACE ebt-handler] Successfully validated replicate request arguments
[2024-01-29T13:10:46Z TRACE ebt-replication] Initiated EBT session with fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519 as responder
[2024-01-29T13:10:46Z TRACE ebt-session] Registered new EBT session 1 for fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519
[2024-01-29T13:10:46Z TRACE ebt-replication] Sending vector clock: {"@L/g6qZQE/2FdO2UhSJ0uyDiZb5LjJLatM/d8MN+INSM=.ed25519": 66, "@qK93G/R9R5J2fiqK+kxV72HqqPUcss+rth8rACcYr4s=.ed25519": 450, "@fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519": 146}
[2024-01-29T13:10:46Z TRACE ebt-replication] Session concluded with: fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519
[2024-01-29T13:10:48Z TRACE ebt] Waiting for EBT session request
[2024-01-29T13:10:48Z TRACE ebt-session] Initiating EBT replication session with: fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519
[2024-01-29T13:10:48Z TRACE ebt-handler] Received replicate request: [EbtReplicate { version: 3, format: "classic" }]
[2024-01-29T13:10:48Z TRACE ebt-handler] Successfully validated replicate request arguments
[2024-01-29T13:10:48Z TRACE ebt-replication] Initiated EBT session with fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519 as responder
[2024-01-29T13:10:48Z TRACE ebt-session] Registered new EBT session 1 for fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519
[2024-01-29T13:10:48Z TRACE ebt-replication] Sending vector clock: {"@L/g6qZQE/2FdO2UhSJ0uyDiZb5LjJLatM/d8MN+INSM=.ed25519": 66, "@qK93G/R9R5J2fiqK+kxV72HqqPUcss+rth8rACcYr4s=.ed25519": 450, "@fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519": 146}
[2024-01-29T13:10:48Z TRACE ebt-replication] Session concluded with: fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519

And the logs from the Manyverse side:

EBT:send json (@fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519) {
  "previous": "%xGNP8Lzs1VRXJGLX5uvhCnhh8zpez1Ffi7jIE48G2OE=.sha256",
  "sequence": 75,
  "author": "@fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519",
  "timestamp": 1706533245261,
  "hash": "sha256",
  "content": {
    "text": "At a minimum, the bug fix doesn't appear to have broken anything...",
    "type": "post",
    "root": "%qg38tukhZsFYBX63HU9QC2LtQFHaiqW+gd5Dyfe2Zkw=.sha256",
    "mentions": [],
    "branch": "%qg38tukhZsFYBX63HU9QC2LtQFHaiqW+gd5Dyfe2Zkw=.sha256"
  },
  "signature": "64u0wMYgcdGJvBu/rgVTVFGIU7oqluCuBxVYjk+rO+P1hqd0ikj5xkclObTvFAJCZla9MMrnimAiYunV5zKbAA==.sig.ed25519"
}
EBT:recv json (@fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519) {
  "@fGHFd8rUgoznX/qS/1U7HPF3vnirbSyfaaWlS8cCWR0=.ed25519": 150
}
EBT:dcon @L/g6qZQE/2FdO2UhSJ0uyDiZb5LjJLatM/d8MN+INSM=.ed25519
EBT:conn @L/g6qZQE/2FdO2UhSJ0uyDiZb5LjJLatM/d8MN+INSM=.ed25519
EBT:dcon @L/g6qZQE/2FdO2UhSJ0uyDiZb5LjJLatM/d8MN+INSM=.ed25519
rpc.ebt.replicate exception: Error: stream ended with:13 but wanted:34
    at drain (/opt/Manyverse/resources/app/index.js:53484:26)
    at /opt/Manyverse/resources/app/index.js:53504:18
    at /opt/Manyverse/resources/app/index.js:53555:13
    at drain (/opt/Manyverse/resources/app/index.js:78724:18)
    at Socket.<anonymous> (/opt/Manyverse/resources/app/index.js:78741:5)
    at Socket.emit (node:events:402:35)
    at Socket.emit (node:domain:475:12)
    at endReadableNT (node:internal/streams/readable:1343:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)
mycognosist commented 8 months ago

Upon further observation, this behaviour is happening when an active session is underway and then Manyverse is closed. When Manyverse is started again, it connects to solar and solar then acts as responder. I noticed that solar is sending a vector clock twice. I believe this is because the MUXRPC handler from the first session is still active; then a second handler is started when the second connection takes place.

I'm not sure if this is the root of the issue but it needs to be dealt with anyway. So what I need to do is first ensure that the EBT MUXRPC handler is terminated when the associated network connection drops.

mycognosist commented 8 months ago

The issue turned out to be the sign of the request number. When acting as initiator, every outbound request number must be positive (even if the outbound message is classed as a "response"). The opposite is true when acting as responder.