ipfs / js-ipfs

IPFS implementation in JavaScript
https://js.ipfs.tech

Stable transports in the browser #1088

Closed Beanow closed 1 year ago

Beanow commented 6 years ago

Type: Bug / Enhancement

Severity: High

Description:

There are a good number of pitfalls with the js-ipfs transports in the browser today. Bundled with 0.26.0 are:

libp2p-webrtc-star

As in https://github.com/ipfs/js-ipfs/issues/950, today libp2p-webrtc-star is crashing the browser.


Possible solutions:

daviddias commented 6 years ago

Thank you for opening this issue, @Beanow :)

I believe we can have an interim solution with libp2p-websocket-star https://github.com/libp2p/js-libp2p/pull/122#discussion_r151846099

One other note I want to add to this thread is that PubSub itself does relay, so in practice you can have a hosted node that subscribes to the same PubSub channel and that node will relay messages for you.
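
To make the relay idea concrete, here is a minimal sketch of a hosted node that does nothing except stay subscribed to the application's topic. It assumes the promise-based `ipfs.pubsub.subscribe(topic, handler)` and `ipfs.pubsub.unsubscribe(topic, handler)` calls of later js-ipfs releases (0.26 used callbacks); the helper name `startRelaySubscriber` and the `log` parameter are hypothetical and not part of js-ipfs.

```javascript
// Hypothetical helper: keep an always-on node subscribed to a topic so that it
// takes part in relaying that topic's messages between browser peers.
// `ipfs` is assumed to be an already-started js-ipfs node with pubsub enabled.
async function startRelaySubscriber (ipfs, topic, log = () => {}) {
  const handler = (msg) => log(`saw message from ${msg.from} on ${topic}`)
  await ipfs.pubsub.subscribe(topic, handler)
  // Return a function that removes the subscription again.
  return () => ipfs.pubsub.unsubscribe(topic, handler)
}
```

The hosted node would call `startRelaySubscriber(node, 'my-app-topic')` once after startup, using the same topic name as the browser peers.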

alvestrand commented 6 years ago

1) How many PeerConnections did you create? 2) Did you get anything useful done with them, or did you just create and park them? I'm currently debugging an issue that seems to be number-of-threads related.

mitra42 commented 6 years ago

@alvestrand, I'm not sure who the question was addressed to, but we see this without explicitly creating any connections (just creating and starting IPFS). All the decisions on how many connections to open, and how many threads to use, are made in lower layers.

@diasdavid - could you post HERE, what is needed for the workaround as https://github.com/libp2p/js-libp2p/pull/122#discussion_r151846099 seems to have just configurations people are reporting problems with. In particular - need to know what interim configuration to use, anything in package.json (e.g. non-default versions). I'll be happy to test our apps against it.

nils-ohlmeier commented 6 years ago

@alvestrand when opening the demo URL from issue #950 I see Firefox opening lots and lots of PeerConnections with DataChannels in them. It also appears to close PCs, but the number of open PCs keeps going up. I'm tempted to put in an upper bound on PeerConnections per domain to prevent pages from crashing Firefox by exhausting memory and/or threads.

@mitra42 what I don't quite understand is that this is the IPFS project, right? But in here and in issue #950 you state that you don't open connections. Which JS library in your stack opens the PeerConnections? Whoever maintains that library needs to understand that PeerConnections are not as cheap to create as, for example, TCP connections. So whoever manages the PeerConnections needs to keep a balance.

mitra42 commented 6 years ago

Hi Nils, there are lots of links in #950, so I'm not sure which "demo URL" you refer to. I don't know about Orbit, if that is what you mean, but I'm referring to our demos (on https://dweb.me/examples).

I said we don't EXPLICITLY open connections, we, and I believe the many others reporting bugs like this, are loading IPFS, using the recommended config for using pubsub which is required by Yjs which is required to implement append-only-logs. ipfs.start() and wait ~5-15 mins and it crashes.

The management of connections is done by the underlying IPFS libraries. I understand that @diasdavid thinks he knows how to fix it, and that is happening (I'm currently not sure if that is a fix to webrtc-star, or the implementation of websocket-star).

    {
      repo: ...,
      config: {
        Addresses: {
          Swarm: ['/dns4/star-signal.cloud.ipfs.team/wss/p2p-webrtc-star']
        }
      },
      EXPERIMENTAL: { pubsub: true }
    }

nils-ohlmeier commented 6 years ago

So I just opened https://dweb.me/examples/example_list.html in Firefox 59 and let it sit for several minutes without doing anything. After maybe 15min or so Firefox had opened 407 PeerConnections, of which 109 got closed.

I noticed that quite a few of the PeerConnections that were not closed never got connected to anything. From Firefox's perspective these connections never received an SDP answer for the offers Firefox created. I think that is something which should get optimized in the IPFS stack, so that it more aggressively closes PeerConnections which failed to connect to anything.

I also noticed that each of these PeerConnections starts two threads in Firefox, which seems unnecessary. I'll file a bug on the Firefox side for that. Maybe we can optimize this a little bit more.

One other observation from my side: the tab in Firefox never crashed for me. That said, I did this test on a MacBook Pro with 16GB of RAM. It is quite possible that on a less powerful machine the 500+ threads Firefox was running at the time would result in actual problems.

Beanow commented 6 years ago

@nils-ohlmeier wow nice findings! Could you share a quick primer of how you analysed this? To reproduce and test for improvements.

Same as @mitra42, no connections were declared by me. These connections grow as part of the peer discovery process and pubsub stack.

From what I know there is a dialer machine in the swarm component that tries to connect with every address that comes up in the peer discovery.

Beanow commented 6 years ago

As for crashes, I ran on a 32GB RAM machine and don't think any OOM was triggered by the OS. Perhaps there is some sandboxing hard limit? In both FF Quantum and Chromium it's only crashing the tab.

mitra42 commented 6 years ago

Great - I see different behavior on Firefox and on Chrome

nils-ohlmeier commented 6 years ago

@Beanow well I work on WebRTC in Firefox for a living, so I should know how to do this ;-)

The easiest thing is to open "about:webrtc" in another tab. That will show you all the WebRTC PeerConnections Firefox currently has open, plus old closed ones. I then saved that page and ran the usual Unix command-line tools over it to count the connections.

As for the threads I created a bug in the Firefox bug tracker with a patch which should improve the thread problem: https://bugzilla.mozilla.org/show_bug.cgi?id=1421819

But the general problem remains that lots of PeerConnections get created, which eventually get the browser into trouble.

daviddias commented 6 years ago

@nils-ohlmeier thank you so much for joining this thread and providing such valuable analysis! This is rad :D

We are working on a ConnManager that gauges the usage and number of connections open and tries to preemptively close some before the browser (or Node.js) starts panicking. Is there any API that we can use that can give us a better understanding of the current load rather than using simple heuristics?

nils-ohlmeier commented 6 years ago

@diasdavid you are not the only one asking for load information. But the normal use case so far has been pages/services which do video calling, so we are contemplating exposing information about the speed of video encoding. That would not help in your case, as you only use data channels with no video. So unfortunately the answer AFAIK is no.

But as a first step, instead of reacting to load, I would recommend looking into closing the PeerConnections which never connected anywhere more quickly. Something appears to close PeerConnections already. Maybe you "just" need to change a timeout value in one of the libs providing you WebRTC?
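
As an illustration of that suggestion, the sketch below closes a PeerConnection that has not connected within a deadline. It relies only on the standard `iceConnectionState` property and `close()` method of `RTCPeerConnection`; the helper name `closeIfNeverConnected` and the 15 second default are assumptions chosen for illustration, not values taken from any libp2p module.

```javascript
// Hypothetical helper: close a PeerConnection that has not reached a connected
// ICE state within `timeoutMs`. `pc` is an RTCPeerConnection (or any object
// exposing `iceConnectionState` and `close()`).
function closeIfNeverConnected (pc, timeoutMs = 15000) {
  const timer = setTimeout(() => {
    const state = pc.iceConnectionState
    if (state !== 'connected' && state !== 'completed') {
      pc.close() // frees the resources held by the stalled connection
    }
  }, timeoutMs)
  // Cancel function, for callers that tear the connection down themselves.
  return () => clearTimeout(timer)
}
```

A transport would call this right after creating each connection, so offers that never receive an SDP answer do not accumulate.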

Beanow commented 6 years ago

As a short update. libp2p-websocket-star tested in 0.27.5 using the FAQ entry's config holds up well stability wise. My stress tests were not able to crash the browser and files are coming through.

That isn't to say performance is where it needs to be, but it will at least stay running. Some observations that stood out to me:

daviddias commented 6 years ago

After seeing the magic @ya7ya did with WebRTC on paratii, I now know that there is a way to pipe tons of data through WebRTC without having it consume tons of memory both in Firefox and Chrome.

@ya7ya could you outline the patches/changes you did or help lead the way by submitting the PRs needed? This is super high priority and of high importance :)

ya7ya commented 6 years ago

Hey @diasdavid , Sorry for the late reply 😄

I'm not so sure my fixes to the paratii fork solved the underlying problem. It's important to mention that paratii would crash if we ran 5 or 6 embedded players in one page. But it has web3 + ipfs + clappr bundled, and it's mainly web3's fault if I remember correctly.

But the main change is limiting MAX_MESSAGE_SIZE in js-ipfs-bitswap to around 32kb instead of 512kb. The 32kb value isn't an exact science: some browsers (Firefox) won't work with higher values like 64kb, while Chrome does fine.
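
For readers unfamiliar with what a message size cap implies, the sketch below splits a payload into pieces of at most 32 KiB. It is a generic illustration, not the js-ipfs-bitswap code: the function name `splitIntoChunks` is hypothetical, and the constant only mirrors the approximate value quoted above.

```javascript
// Illustrative cap, mirroring the ~32kb value mentioned above (an assumption,
// not a constant read from js-ipfs-bitswap).
const MAX_MESSAGE_SIZE = 32 * 1024

// Split a Uint8Array into pieces of at most `maxSize` bytes.
// `subarray` creates views on the same memory, so no bytes are copied.
function splitIntoChunks (data, maxSize = MAX_MESSAGE_SIZE) {
  const chunks = []
  for (let offset = 0; offset < data.length; offset += maxSize) {
    chunks.push(data.subarray(offset, offset + maxSize))
  }
  return chunks
}
```

The receiving side has to reassemble the pieces in order, which is why the layer at which the limit is applied matters.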

This, however, was the wrong place to make the edit according to this comment. I subsequently tried to apply the block-stream limit in js-libp2p-webrtc-star, but it broke, and when I fixed it, it still didn't fix the main issue.

As far as I can tell, limiting the max message size resulted in a more stable connection, meaning the dialer isn't going crazy attempting to connect to the peers that keep disconnecting.

What i would recommend is the following:

mkg20001 commented 6 years ago

we probably should drop socket.io for something thinner and lighter. maybe npmjs.com/package/uws

Or just drop the extra websocket server entirely: https://github.com/libp2p/js-libp2p-webrtc-star/pull/148

nils-ohlmeier commented 6 years ago

FYI https://lgrahl.de/articles/demystifying-webrtc-dc-size-limit.html explains the old limits on the maximum size of messages which could be sent over data channels. But newer versions of Firefox now support much bigger messages: https://blog.mozilla.org/webrtc/large-data-channel-messages/

That doesn't mean that sending large amounts of data over data channels can't cause stability problems. I just thought I'd point this out in case you are not aware of this information already.

lgrahl commented 6 years ago

Do you guys make use of the (quirky) flow control there is for data channels (namely bufferedAmountLowThreshold, onbufferedamountlow and bufferedAmount)? If not, it might be that you're buffering too much data at once. I've written an example a while ago how this can be used.
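
A minimal sketch of that flow control, using only the standard `RTCDataChannel` members named above (`bufferedAmount`, `bufferedAmountLowThreshold`, `onbufferedamountlow`) plus `send()`. The function name `sendWithBackpressure` and the 1 MiB high-water mark are assumptions for illustration, and error handling (for example a channel that closes mid-transfer) is left out.

```javascript
// Send `chunks` over an RTCDataChannel-like object, pausing whenever more than
// `highWaterMark` bytes are queued and resuming on "bufferedamountlow".
function sendWithBackpressure (channel, chunks, highWaterMark = 1024 * 1024) {
  // The browser fires "bufferedamountlow" once the queue drains to this level.
  channel.bufferedAmountLowThreshold = Math.floor(highWaterMark / 2)
  return new Promise((resolve) => {
    let index = 0
    const pump = () => {
      while (index < chunks.length) {
        if (channel.bufferedAmount > highWaterMark) {
          // Too much queued: wait for the buffer to drain before continuing.
          channel.onbufferedamountlow = pump
          return
        }
        channel.send(chunks[index++])
      }
      channel.onbufferedamountlow = null
      resolve()
    }
    pump()
  })
}
```

The returned promise resolves once every chunk has been handed to `send()`, which keeps the amount of queued data bounded by roughly the high-water mark plus one chunk.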

I haven't looked at your code yet but feel free to ping me for general data channel questions in Mozilla's IRC #media or on freenode #webrtc.

daviddias commented 6 years ago

No more "already piped" with 0.31 - https://github.com/ipfs/js-ipfs/issues/1458

interfect commented 6 years ago

I just checked on libp2p-webrtc-star today, with js-ipfs 6d960f322ce30b3092bc539368f1257a5213b6ef.

Putting '/dns4/wrtc-star.discovery.libp2p.io/tcp/443/wss/p2p-webrtc-star' in as a Swarm (alongside '/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star') led to quick file transfers between computers in my LAN, but also led to Firefox (after a few minutes) no longer being able to connect to web sites. I would open a new tab and have it sit at "Connecting" forever, and Gmail would complain that it was disconnected from chat and be unable to save messages being edited. Either a tab crash (which would happen after trying to open a few pages on Linux) or killing and re-launching the browser (on Mac) seemed to bring things back to normal.

Also, on Linux, running a node with webrtc would also sometimes eventually make the browser's window stop drawing its contents, including the tab bar.

The webrtc transport is still not stable, and moreover somehow seems able to achieve by accident the sort of browser-ruining behavior that malicious hackers everywhere struggle to produce.

lidel commented 6 years ago

@interfect did you try running it along with libp2p connection manager? Limiting the number of connections may improve stability of webrtc.

Sample config:

```json
{
  "config": {
    "Addresses": {
      "Swarm": [
        "/dns4/star-signal.cloud.ipfs.team/tcp/443/wss/p2p-webrtc-star",
        "/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star"
      ],
      "Bootstrap": []
    }
  },
  "connectionManager": {
    "maxPeers": 20
  }
}
```

interfect commented 6 years ago

@lidel I've given connectionManager.maxPeers a shot (although I didn't include "Bootstrap": [] in my config), and the browser break/crash issues persist. It doesn't really seem to have much of an effect; I think I may get my node reporting fewer peers, but I'm still seeing loads and loads of attempted WebRTC connections in about:webrtc, and loads and loads of UDP connections at the router:

image

(Where it spikes up is where I open the tab with js-ipfs in it.)

A lot of these connections are apparently to the same endpoints:

image

I'm not sure what's at pf-in-f127.1e100.net or that EC2 address; maybe IPFS bootstrap nodes or some kind of WebRTC relay thing? (EDIT: The Google one looks like it might be Google's STUN server.)

I think the connection manager might only be counting and limiting fully-open connections, as @Stebalien mentions go-ipfs does in https://github.com/ipfs/go-ipfs/issues/5248#issuecomment-405923696. It seems perfectly happy to try to open 300 connections in 30 seconds, even if it is maintaining no more than 20 properly connected peers.
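
If that reading is right, one possible mitigation is to cap dial attempts that are still in flight, separately from the cap on connected peers. The sketch below is a generic concurrency limiter and not the libp2p connection manager: `createDialLimiter`, `dialFn` and `maxPending` are hypothetical names, and `dialFn` stands in for whatever function actually opens a connection.

```javascript
// Hypothetical dial limiter: at most `maxPending` dial attempts are in flight
// at once; further attempts wait in a FIFO queue.
// `dialFn(addr)` is assumed to return a promise for the opened connection.
function createDialLimiter (dialFn, maxPending = 5) {
  let pending = 0
  const queue = []

  const next = () => {
    if (pending >= maxPending || queue.length === 0) return
    const { addr, resolve, reject } = queue.shift()
    pending++
    Promise.resolve()
      .then(() => dialFn(addr))
      .then(resolve, reject) // pass the outcome on to the caller
      .finally(() => { pending--; next() }) // free the slot, start the next dial
  }

  return function dial (addr) {
    return new Promise((resolve, reject) => {
      queue.push({ addr, resolve, reject })
      next()
    })
  }
}
```

With `const dial = createDialLimiter(rawDial, 5)`, discovering 300 addresses in 30 seconds would still queue 300 dials, but no more than five would be attempting a connection at any moment.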

kevinsimper commented 6 years ago

As per @lidel's suggestion on IRC, I am posting about a demo app I made with PubSub and websocket-star.

I made this PubSub game, and I use websocket-star with IPFS. Yesterday I tried a demo at a meetup, but the demo failed when more than 20 people tried to use it. How can I scale up my application? The problem was that the websocket connections were dropping: I was only connected to 12 peers at most, and some people could reach 21-22, but we were 70 in total.

A demo is here: https://p2p-ipfs-presentation.surge.sh/game/ you can view-source and it is also on github here https://github.com/kevinsimper/p2p-ipfs-presentation/blob/master/game/index.html

I tried webrtc a couple of months ago, but it would often crash my browser after 5 minutes.

SgtPooki commented 1 year ago

js-ipfs is being deprecated in favor of Helia. You can learn more at https://github.com/ipfs/js-ipfs/issues/4336 and read the migration guide.

Please feel free to reopen with any comments by 2023-06-02. We will do a final pass on reopened issues afterward (see https://github.com/ipfs/js-ipfs/issues/4336).

This issue is most likely resolved in Helia (and the latest libp2p), please try it out!

Followers/subscribers on this issue should investigate libp2p team's (and many other contributors') universal-connectivity app (see https://github.com/libp2p/universal-connectivity), where a lot of these problems are shown to have been ironed out.

@kevinsimper it would be really cool to migrate your game to Helia and add to ipfs-examples/helia-examples see https://github.com/ipfs/helia/issues/43 for some of the examples we're already planning on porting.