radicle-dev / radicle-link

The second iteration of the Radicle code collaboration protocol.
Other
423 stars 39 forks source link

Gossip packets failing parsing #648

Closed viraptor closed 3 years ago

viraptor commented 3 years ago

I'm seeing this error after updating to 0.2. Happened only 2 times so far: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(114, "expected array")))

longer log chunk:

Apr 30 09:50:36 radicle-seed radicle-seed-node[10377]: Apr 30 09:50:36.002  INFO accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hyd6ky9ndq65yqtwzpkb1xmgz4jt4ikezj1ezw69qkmnaccert7hjw remote_addr=69.162.253.6:47818}: librad::net::protocol::io::streams: new ingress stream
Apr 30 09:50:36 radicle-seed radicle-seed-node[10377]: Apr 30 09:50:36.002  INFO accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hyd6ky9ndq65yqtwzpkb1xmgz4jt4ikezj1ezw69qkmnaccert7hjw remote_addr=69.162.253.6:47818}: librad::net::protocol::io::streams: new ingress stream
Apr 30 09:50:36 radicle-seed radicle-seed-node[10377]: Apr 30 09:50:36.002  WARN accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hyd6ky9ndq65yqtwzpkb1xmgz4jt4ikezj1ezw69qkmnaccert7hjw remote_addr=69.162.253.6:47818}: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(114, "expected array")))
Apr 30 09:50:36 radicle-seed radicle-seed-node[10377]: Apr 30 09:50:36.002 DEBUG accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hyd6ky9ndq65yqtwzpkb1xmgz4jt4ikezj1ezw69qkmnaccert7hjw remote_addr=69.162.253.6:47818}:connection_lost{remote_peer=hyd6ky9ndq65yqtwzpkb1xmgz4jt4ikezj1ezw69qkmnaccert7hjw}: librad::net::protocol::membership::hpv: connection lost

I don't see any obvious way to reproduce this. Can I provide any more detailed logs to help narrow this down?

kim commented 3 years ago

Is this a known remote peer by any chance?

a9810a6b696e44b00ac332dc53f25bf68f86720d is the only occasion which could've introduced some wire encoding incompatibility. So this peer must be running a build from before that?

viraptor commented 3 years ago

So looking through logs again, I noticed this happening from sprout. We should be both running the latest version of -bins, I believe:

May 25 12:32:46 radicle-seed.viraptor.info radicle-seed-node[130268]: May 25 12:32:46.668  WARN accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(17, "expected array")))
May 25 12:33:08 radicle-seed.viraptor.info radicle-seed-node[130268]: May 25 12:33:08.985  WARN accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(17, "expected array")))
May 25 12:33:08 radicle-seed.viraptor.info radicle-seed-node[130268]: May 25 12:33:08.985  WARN accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(17, "expected array")))
May 25 12:33:08 radicle-seed.viraptor.info radicle-seed-node[130268]: May 25 12:33:08.985  WARN accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(114, "expected array")))
May 25 12:33:08 radicle-seed.viraptor.info radicle-seed-node[130268]: May 25 12:33:08.985  WARN accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(114, "expected array")))
May 25 12:33:17 radicle-seed.viraptor.info radicle-seed-node[130268]: May 25 12:33:17.192  WARN accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(114, "expected array")))
May 25 12:33:17 radicle-seed.viraptor.info radicle-seed-node[130268]: May 25 12:33:17.192  WARN accept{peer_id=hyy4ebh8amk3ypyawgpjnfpbgkxqwdo5n1iif5s17yghwmewrf4d8a}:incoming:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}:incoming{remote_id=hynkyndc6w3p8urucakobzna7sxwgcqny7xxtw88dtx3pkf7m3nrzc remote_addr=34.91.134.19:12345}: librad::net::protocol::io::recv::gossip: gossip recv error err=Cbor(Decode(TypeMismatch(114, "expected array")))
kim commented 3 years ago

I suspect that the decoder somehow receives an EOF before the whole CBOR message is transmitted. Those type mismatches are basically random.

viraptor commented 3 years ago

:( Would it be worth adding a message length ahead of it, to tell a difference between "incomplete read" and "invalid message"?

kim commented 3 years ago

Ya, although that would be a breaking change. RFC 8949 explicitly says that back-to-back encoding should be preferred — it’s more of a limitation of Rust idioms which propagates the wrong error message here.

More interesting would be to find out how such a partial transmission can happen in the first place. It must be that the sender is actually shutting down, and discards outstanding packets. I’m aware that we should actually call Endpoint::wait_idle, but couldn’t find a good place to call it (it would need to be in main)

viraptor commented 3 years ago

It must be that the sender is actually shutting down

With the timing (few times in a second, then 9s break), is that likely a shutdown? Actually you can probably verify the restart in sprout logs.