mycognosist / solar

A minimal Secure Scuttlebutt replication node.
Other
20 stars 2 forks source link

Replication with Patchwork fails #87

Closed black-puppydog closed 5 months ago

black-puppydog commented 5 months ago

On current main (79514e966de8a330fec318caf259e83e13fa09e1) I am getting the following error when trying to replicate with a local patchwork instance:

[2024-01-20T09:55:26Z ERROR solar::actors::replication::ebt::replicator] EBT replicate handler failed: Validation(Json(Error("missing fieldkey", line: 1, column: 117)))

In solar/src/actors/muxrpc/ebt.rs in the handler https://github.com/mycognosist/solar/blob/79514e966de8a330fec318caf259e83e13fa09e1/solar/src/actors/muxrpc/ebt.rs#L71-L74

I added a bit of println magic and this is what came up:

[2024-01-20T10:25:16Z DEBUG solar::actors::replication::ebt::manager] Received EBT event message from broker
[2024-01-20T10:25:16Z DEBUG solar::actors::replication::ebt::replicator] Received RPC response: {"name":["tunnel","isRoom"],"args":[]}
[2024-01-20T10:25:16Z DEBUG solar::actors::replication::ebt::replicator] Received RPC response: Network(2, RpcRequest(Body { name: ["blobs", "createWants"], rpc_type: Source, args: Array [] }))
[2024-01-20T10:25:16Z DEBUG solar::actors::replication::ebt::replicator] Received RPC response: {"@Ac5Zx3J7UeAYWoyVX3RZtlADWwXETAW7WT0gcDtWDcQ=.ed25519":4,"@i5rWm+biEH39dffvNHVtr1ccaj3EjxJG538cl7BKMwM=.ed25519":2}
[2024-01-20T10:25:16Z ERROR solar::actors::replication::ebt::replicator] EBT replicate handler failed: Validation(Json(Error("missing field `key`", line: 1, column: 117)))
[2024-01-20T10:25:16Z DEBUG solar::actors::replication::ebt::manager] Received EBT event message from broker
black-puppydog commented 5 months ago

Reverting to 4e3a06e6959c2c40c266a53c9bc436997a820e9e (without ebt, but having kuska as a github dependency) solves this, but of course I'd love to use EBT 😁

mycognosist commented 5 months ago

@black-puppydog

Thanks for opening an issue! This is exactly the bug I was looking at this week 😅

Outbound replication from solar to Manyverse is working (haven't checked using Patchwork) but inbound replication is failing. As you see in the debug output, the error happens because solar is trying to deserialize the inbound request as a message but it is in fact a vector clock.

I believe this commit is at least part of the problem: https://github.com/mycognosist/solar/pull/85/commits/ed447ee2c1c126186183dbb01cd279d347ce5a88

Serves me right for trying to make things more efficient when they're not yet working perfectly! I'll work on this on Wednesday and keep you updated. I'm very much committed to getting EBT working reliably.

black-puppydog commented 5 months ago

Ah great, then you already know much more. Saves me the work to bisect it 😅

mycognosist commented 5 months ago

@black-puppydog

I started looking into this today and I am making some progress. Figured I'd share with you and anyone else who may read this at some point.

First I returned the vector clock deserialization code to the recv_rpc_response() method in the MUXRPC handler for EBT (I mentioned this in a previous comment in this issue). This did not fix anything.

Then I remembered that I could run Manyverse with EBT logging enabled. I then saw an error when the vector clock was received from solar: "no stream for incoming msg" (https://github.com/ssbc/packet-stream/blob/bb0493dd0eb6454ac675df78dfed20e994a4881f/index.js#L188). That allowed me to realise that the RpcType for the ebt_clock_res_send() method in kuska was incorrect (https://github.com/Kuska-ssb/ssb/blob/master/src/api/helper.rs#L220); should be Duplex and not Source. I made the change locally and on the first attempt Manyverse received the vector clock correctly and sent messages in response. Success! However, solar seems not to have received the messages.

Now Manyverse is not receiving the vector clock from solar, even though I haven't changed anything further 😅 So yeah, I'm making progress but I'm not quite there yet.

mycognosist commented 5 months ago

Haha ignore my previous comment...I was on the wrong track.

But I'm happy to report that I've got bidirectional replication working for the first time 😺

The short version: I was using the positive version of the request number when responding to Manyverse (ie. sending a message or a vector clock). The request numbers of MUXRPC responses need to be negative.

mycognosist commented 5 months ago

Fixed in: https://github.com/mycognosist/solar/pull/88

Thanks again for opening the issue, @black-puppydog . You gave me some additional motivation to solve this.