raykyri closed this issue 1 month ago.
Manually patching the autodialer retry threshold solved most of our problems. Someone else caught this last week and a fix is already on main: https://github.com/libp2p/js-libp2p/commit/767b23e710b1a9b545421365f2f9603c37cbec78. (We're mostly running with gossipsub penalties off, so no issues there. Tuning how many peers are grafted helped only marginally.)
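For anyone unfamiliar with what a "retry threshold" buys you: after a failed dial, the autodialer should leave that peer alone for a cooldown period instead of hammering it. Below is a minimal sketch of that idea only. `DialRetryTracker` and its methods are hypothetical names invented for illustration. This is not js-libp2p's autodialer code and not the change in the linked commit.

```typescript
// Hypothetical sketch of a per-peer dial retry threshold.
// NOT js-libp2p's autodialer; all names here are illustrative only.
class DialRetryTracker {
  // Minimum time (ms) to wait after a failed dial before retrying that peer
  private readonly retryThresholdMs: number
  // peer id -> timestamp (ms) of that peer's most recent failed dial
  private readonly lastFailure = new Map<string, number>()

  constructor (retryThresholdMs: number) {
    this.retryThresholdMs = retryThresholdMs
  }

  recordFailure (peerId: string, now: number = Date.now()): void {
    this.lastFailure.set(peerId, now)
  }

  // A successful connection clears the peer's cooldown
  recordSuccess (peerId: string): void {
    this.lastFailure.delete(peerId)
  }

  // True if the peer never failed, or its last failure is at least
  // retryThresholdMs in the past
  shouldDial (peerId: string, now: number = Date.now()): boolean {
    const failedAt = this.lastFailure.get(peerId)
    return failedAt === undefined || now - failedAt >= this.retryThresholdMs
  }
}

// Usage: a peer that failed at t=0 is skipped until t=60s
const tracker = new DialRetryTracker(60_000)
tracker.recordFailure('peerA', 0)
console.log(tracker.shouldDial('peerA', 30_000)) // false
console.log(tracker.shouldDial('peerA', 60_000)) // true
console.log(tracker.shouldDial('peerB', 30_000)) // true
```

If the threshold is too low (or effectively zero), every discovery event re-triggers dials to the same unreachable peers, which is the kind of churn a patched threshold would suppress.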
We're still encountering occasional SIGILL crashes, which propagated up our stack and caused problems for our container host. They may come from something lower-level rather than js-libp2p (filecoin-lotus users seem to be seeing them too), so I'll close this issue now. If anyone else reads this while testing their mesh: invest in headless browser network tests using something like docker-compose. It's not as hard as it sounds, and it's worth it!
@raykyri Some of the browser interop work is being covered by a demo app at https://github.com/libp2p/universal-connectivity. Browser reliability should improve once WebRTC is released in go-libp2p; see https://github.com/libp2p/go-libp2p/issues/2778.
@raykyri Are you still encountering SIGILL crashes?
Are you using js-libp2p for the server host on Node.js?
We're using js-libp2p for the server host, yep.
We haven't seen any more SIGILL issues; we eventually traced the cause to something else.
We've been running a browser-to-server libp2p mesh for chat applications at https://play.skystrife.xyz. It uses gossipsub to distribute messages, plus our own service (based on GossipLog and a Prolly tree) to sync past messages. We're monitoring logs and Prometheus metrics, and we have separate instances that spin up libp2p nodes and connect to our mesh to perform health checks.
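One detail that matters for health checks like these: a node that silently stops accepting messages may never return an error, so each probe needs a hard timeout. Here is a minimal sketch, assuming the actual libp2p work (dial a mesh node, publish over gossipsub, wait for the message to come back) is wrapped in a hypothetical `probe` function. `runHealthCheck` and `withTimeout` are illustrative names, not part of libp2p.

```typescript
// Hypothetical health-check wrapper. `probe` stands in for the real libp2p
// round-trip (an assumption); this code only enforces a timeout and reports.
type Probe = () => Promise<void>

interface HealthResult {
  healthy: boolean
  latencyMs: number
  error?: string
}

// Reject if `promise` has not settled within `ms`; always clear the timer
function withTimeout<T> (promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

async function runHealthCheck (probe: Probe, timeoutMs: number): Promise<HealthResult> {
  const start = Date.now()
  try {
    await withTimeout(probe(), timeoutMs)
    return { healthy: true, latencyMs: Date.now() - start }
  } catch (err) {
    // Both probe failures and timeouts count as unhealthy
    return {
      healthy: false,
      latencyMs: Date.now() - start,
      error: err instanceof Error ? err.message : String(err)
    }
  }
}

// Usage: a probe that never settles is reported as unhealthy
runHealthCheck(() => new Promise<void>(() => {}), 50)
  .then(result => console.log(result.healthy, result.error))
// prints: false timed out after 50 ms
```

Exporting `healthy` and `latencyMs` as metrics alongside the existing Prometheus data would make a node that is still up but no longer relaying messages show up as a failed probe rather than as silence.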
Since last week, there have been tens of players online at the same time (occasionally even 100+). We've noticed reliability issues even at smaller scales: libp2p server nodes randomly stop accepting messages, or stop listening on their port after a few hours. The cause isn't an OOM or anything else readily apparent from `libp2p:*:error` logging.

What's the state of reliability for browser-to-server libp2p right now? We're considering using a separate WebSocket service and using libp2p exclusively for server-to-server sync, since it's unclear how others have deployed this stack in a browser environment.
We are currently on: