libp2p / js-libp2p

The JavaScript Implementation of libp2p networking stack.
https://libp2p.io

Browser-to-server libp2p reliability #2529

Closed raykyri closed 1 month ago

raykyri commented 2 months ago

We've been running a browser-to-server libp2p mesh for chat applications at https://play.skystrife.xyz. It uses gossipsub to distribute messages and our own service, based on GossipLog and a Prolly tree, to sync past messages. We monitor logs and Prometheus metrics, and run separate instances that spin up libp2p nodes and connect to our mesh to perform health checks.

Since last week, there have been tens of players online at the same time (occasionally even 100+). We've noticed reliability issues even at smaller scales: libp2p server nodes will randomly stop accepting messages, or stop listening on the port after a few hours. The cause isn't an OOM or anything else readily apparent from libp2p:*:error logging.


What's the state of reliability for browser-to-server libp2p right now? We're considering using a separate websocket service and keeping libp2p for server-to-server sync exclusively, since it's unclear how others have deployed this stack in a browser environment.

We are currently on:

    "@libp2p/bootstrap": "^10.0.7",
    "@libp2p/fetch": "^1.0.5",
    "@libp2p/identify": "^1.0.6",
    "@libp2p/interface": "^1.0.2",
    "@libp2p/logger": "^4.0.2",
    "@libp2p/mplex": "^10.0.7",
    "@libp2p/peer-id": "^4.0.2",
    "@libp2p/peer-id-factory": "^4.0.1",
    "@libp2p/ping": "^1.0.6",
    "@libp2p/prometheus-metrics": "^3.0.7",
    "@libp2p/utils": "^5.0.3",
    "@libp2p/websockets": "^8.0.7",

abuvanth commented 2 months ago

Check this https://github.com/silkroadnomad/libp2p-relay/issues/3

raykyri commented 1 month ago

Manually patching the autodialer retry threshold solved most of our problems. Someone else caught this last week and a fix is already on main: https://github.com/libp2p/js-libp2p/commit/767b23e710b1a9b545421365f2f9603c37cbec78. (We're mostly running with gossipsub penalties off, so no issues there. Tuning how many peers are grafted helped only marginally.)
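
For readers unfamiliar with what a retry threshold does in an autodialer, here is a minimal sketch. It is not the js-libp2p implementation and not the patch from the commit above; `DialRecord`, `canAutoDial`, and `selectPeersToDial` are hypothetical names used only for illustration. The assumption is that each known peer carries the timestamp of its last failed dial, and that a peer is skipped until the threshold has elapsed.

```typescript
// Hypothetical sketch: none of these names are js-libp2p APIs.
interface DialRecord {
  // Timestamp (ms since epoch) of the last failed dial, if there was one.
  lastDialFailure?: number
}

// Returns true when a peer may be auto-dialed again: either it has never
// failed, or at least retryThresholdMs has passed since the last failure.
function canAutoDial (record: DialRecord, now: number, retryThresholdMs: number): boolean {
  if (record.lastDialFailure === undefined) {
    return true
  }
  return now - record.lastDialFailure >= retryThresholdMs
}

// Filters the known peers down to those outside their retry window.
function selectPeersToDial (
  peers: Map<string, DialRecord>,
  now: number,
  retryThresholdMs: number
): string[] {
  const eligible: string[] = []
  peers.forEach((record, peerId) => {
    if (canAutoDial(record, now, retryThresholdMs)) {
      eligible.push(peerId)
    }
  })
  return eligible
}
```

If the failure timestamp is never recorded, or is read back under a different key, every peer looks eligible on every pass and the dialer keeps retrying unreachable peers. That is one way a mistake around this threshold could show up as load on a server node.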

We're still encountering occasional SIGILL crashes, which were propagating up our stack and causing issues with our container host. That may not be a js-libp2p issue but something lower level (filecoin-lotus users are seeing it too??), so I'll close this issue now. If anyone else reads this while testing their mesh: invest in headless browser network tests using something like docker-compose. It's not as hard as it sounds and worth it!!

SgtPooki commented 1 month ago

@raykyri Some of the browser interop work is being covered with a demo app at https://github.com/libp2p/universal-connectivity. Browser reliability should increase when WebRTC is released in go-libp2p. See https://github.com/libp2p/go-libp2p/issues/2778

2color commented 1 month ago

@raykyri Are you still encountering SIGILL crashes?

Are you using js-libp2p for the server host on Node.js?

raykyri commented 1 month ago

We're using js-libp2p for the server host, yep.

We haven't seen any SIGILL issues; we eventually traced that to somewhere else.