ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Video files buffer for extended periods of time, even while IPFS node shows outbound traffic #5740

Open MidnightLightning opened 5 years ago

MidnightLightning commented 5 years ago

Type:

Bug

Description:

Issue #4085 brought up the idea of handling streaming/partial reads of content fetched through the HTTP interface more effectively, and was marked as resolved when the ipfs cat command gained --offset (pull request #4538) and --length flags.

Currently the HTTP interface uses Go's built-in http.ServeContent() function, which parses Range HTTP headers and returns only the content in the range the client requested. However, when serving an HTTP request for the contents of a very large video file that the node has not already cached, the time spent loading the video from another node seems much higher than the outbound traffic of the hosting node would suggest.
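For reference, the standard-library mechanism looks roughly like this (a minimal sketch, not kubo's actual gateway code; openContent here is a made-up stand-in for resolving a path to an io.ReadSeeker):

```go
// Minimal sketch: http.ServeContent parses the Range header, Seeks the
// content, and writes a 206 Partial Content response, so in principle only
// the requested bytes should ever be Read from the source.
package main

import (
	"bytes"
	"net/http"
	"time"
)

// openContent is a hypothetical stand-in; a real gateway would return a
// UnixFS-backed io.ReadSeeker for the resolved CID instead of an in-memory one.
func openContent(path string) *bytes.Reader {
	return bytes.NewReader(make([]byte, 10<<20)) // pretend this is a 10 MiB file
}

func serveFile(w http.ResponseWriter, r *http.Request) {
	content := openContent(r.URL.Path)
	// ServeContent honours "Range: bytes=..." and sets Content-Range itself.
	http.ServeContent(w, r, "video.mp4", time.Time{}, content)
}

func main() {
	http.HandleFunc("/", serveFile)
	http.ListenAndServe(":8080", nil)
}
```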

Tested using my own local IPFS node with a video file that I created and that does not get a lot of traffic (typically my node has 100 KB or less of outbound traffic). When accessing that video file through the gateway.ipfs.io node or the cloudflare-ipfs.com node in a current Chrome browser, Chrome detects that it's a video file and includes a Range header in the request for the file data. On my local IPFS node's dashboard, I can see data start to flow out:

load1

But even after 30+ seconds of high data output, the video hasn't started playing yet. It seems something is either requesting the data out of order (the data flowing out of my local IPFS node is blocks from later in the movie, not the immediately-needed blocks), or something else is causing delays in assembling/chunking the video data stream. My local IPFS node is on wired ethernet behind a 300 Mbps fiber connection and shows it's connected to over 200 nodes in the IPFS network, so the network connection on this end shouldn't be the bottleneck.

Stebalien commented 5 years ago

I assume you're talking about gateway requests? This should already work. We use http.ServeContent but we pass in a special ReadSeeker implementation that should only fetch the blocks it needs.
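Roughly, that ReadSeeker behaves like this sketch (illustrative only; the fixed block size and fetch callback are stand-ins, the real reader walks the UnixFS DAG):

```go
// Sketch of a ReadSeeker that only fetches the blocks a Read actually touches.
// Seek is pure bookkeeping; any network traffic happens inside Read.
package lazyread

import (
	"fmt"
	"io"
)

const blockSize = 256 * 1024 // assumed fixed chunk size for this sketch

type lazyReader struct {
	size   int64
	offset int64
	fetch  func(index int64) ([]byte, error) // fetches one block (e.g. via bitswap)
}

func (l *lazyReader) Seek(off int64, whence int) (int64, error) {
	switch whence {
	case io.SeekStart:
		l.offset = off
	case io.SeekCurrent:
		l.offset += off
	case io.SeekEnd:
		l.offset = l.size + off
	}
	if l.offset < 0 || l.offset > l.size {
		return 0, fmt.Errorf("seek out of range")
	}
	return l.offset, nil // note: no blocks are fetched here
}

func (l *lazyReader) Read(p []byte) (int, error) {
	if l.offset >= l.size {
		return 0, io.EOF
	}
	// Fetch only the block under the read cursor (last-block bounds checks
	// omitted for brevity).
	block, err := l.fetch(l.offset / blockSize)
	if err != nil {
		return 0, err
	}
	n := copy(p, block[l.offset%blockSize:])
	l.offset += int64(n)
	return n, nil
}
```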

Stebalien commented 5 years ago

So, I did a quick test and I can't reproduce this. If I remove a block from a file, I can still fetch the rest of the file from a gateway using range requests.

MidnightLightning commented 5 years ago

Thank you for looking into this @Stebalien! The test that you described I think is the inverse of what this request is aiming for: you tested with a file where the local IPFS node had the majority of the file already fetched and cached locally, and the HTTP client requested a part that was already fetched.

The situation I'm describing is the inverse: for example, assume the file in question is a 500 MB video file, that the local IPFS node has never seen before (so has no data for that video yet). If a range request comes in looking for 1 MB of data offset 100MB into the file, does only that range of data get fetched from the IPFS network and returned to the user? If so, I likely just have my request headers formatted improperly and this is working as intended.

MidnightLightning commented 5 years ago

Ah, part of this may be how the Range headers are formatted. I'm testing in Chrome, which loads the video stream with a Range: bytes=0- header (an open-ended range request). That results in a chunked response (pieces arrive as they become available), but the HTTP stream still has to be delivered sequentially, and it seems the back end fetches the data blocks from IPFS peers in a random order within that byte range. I think it would be helpful to prioritize blocks at the beginning of the byte range, to allow the HTTP stream to un-block.
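A rough way to check whether only the requested range gets fetched (the gateway URL and CID here are placeholders, not a rigorous test) would be something like:

```go
// Request 1 MiB starting 100 MiB into a file the gateway has never cached, and
// time how long the first bytes take to arrive. If only that range is fetched
// from the network, the wait shouldn't scale with everything before the range.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	url := "https://ipfs.io/ipfs/<large-video-cid>" // placeholder CID
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Range", "bytes=104857600-105906175") // 1 MiB at a 100 MiB offset

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	buf := make([]byte, 64*1024)
	n, _ := resp.Body.Read(buf) // just the first chunk
	fmt.Printf("status=%s, first %d bytes after %s\n", resp.Status, n, time.Since(start))

	rest, _ := io.Copy(io.Discard, resp.Body)
	fmt.Printf("rest of range (%d bytes) done after %s\n", rest, time.Since(start))
}
```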

Stebalien commented 5 years ago

The test that you described I think is the inverse of what this request is aiming for: you tested with a file where the local IPFS node had the majority of the file already fetched and cached locally, and the HTTP client requested a part that was already fetched.

I was just testing to make sure that range requests work.

but it seems the back-end fetches the data blocks from IPFS peers randomly within that byte range

It should fetch them sequentially within the range (mostly). Really, we start prefetching about 10 blocks ahead but it should still be pretty much sequential.
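The intent is roughly this (an illustrative sketch, not the actual DAG-reader/bitswap code): keep a small, bounded window of requests in flight, always for the lowest-indexed blocks not yet requested.

```go
// Sequential fetch with a bounded prefetch window: blocks are requested in
// order and consumed in order, with at most `window` requests outstanding
// ahead of the block currently being read.
package prefetch

func fetchSequential(nBlocks, window int, request func(i int) <-chan []byte, deliver func(i int, b []byte)) {
	pending := make(map[int]<-chan []byte)
	next := 0 // next block index to request
	for read := 0; read < nBlocks; read++ {
		// Top up the window; early blocks always get requested first.
		for next < nBlocks && next < read+window {
			pending[next] = request(next)
			next++
		}
		deliver(read, <-pending[read]) // blocks until block `read` arrives
		delete(pending, read)
	}
}
```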

Do you have a test case I can try out?

MidnightLightning commented 5 years ago

The way I'm testing that data is being requested: I have one browser tab open to my local IPFS node's main screen (watching the incoming/outgoing bandwidth), and another tab opening a video file that I have pinned, but accessed through the ipfs.io gateway. When I do that, the "output" amount jumps up (~50-200 Kb/s), which seems to imply the data started getting fetched from my node. But even after a minute, the browser tab trying to play the video is still on a "loading" screen, even though my node has been actively sending data that whole time. That jumped out at me as odd, and suggests the node could probably do a better job of prioritizing certain file components.

Stebalien commented 5 years ago

Ah... So, 50-200 Kb/s is actually 5-25KiB/s (unless you meant KiB). We now pack up to 512KiB into each message so you may need to download 512KiB to start reading anything. That will take at least 20s on your connection.

Additionally, we prefetch ~10 nodes in parallel. Now, we do prioritize early nodes but, looking at the code, we need to get better about this. The sending side of bitswap has 8 "send" workers so your local node is likely trying to send all eight blocks at once (while it should be queuing them and sending them in-order).

So, really, it's likely that second part. Your local node is probably sending 2-4MiB all at once.
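Conceptually, the send-side fix looks something like this sketch (illustrative only, not the actual bitswap code): queue a peer's outgoing blocks and drain them with a single worker in request order, rather than letting several workers push blocks for the same peer at once.

```go
// Sketch of an in-order send queue per peer. The requester's priorities
// (earlier-wanted blocks get lower numbers) decide what hits the wire first.
package sendqueue

import (
	"sort"
	"sync"
)

type outgoing struct {
	priority int // lower = wanted earlier by the requester
	data     []byte
}

type peerQueue struct {
	mu    sync.Mutex
	queue []outgoing
}

func (p *peerQueue) enqueue(o outgoing) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.queue = append(p.queue, o)
	sort.Slice(p.queue, func(i, j int) bool { return p.queue[i].priority < p.queue[j].priority })
}

// drain is run by a single worker per peer, so blocks leave in order.
func (p *peerQueue) drain(send func([]byte) error) error {
	for {
		p.mu.Lock()
		if len(p.queue) == 0 {
			p.mu.Unlock()
			return nil
		}
		next := p.queue[0]
		p.queue = p.queue[1:]
		p.mu.Unlock()
		if err := send(next.data); err != nil {
			return err
		}
	}
}
```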

I've filed an issue here: https://github.com/ipfs/go-libipfs/issues/101


However, at the end of the day, I think trying to stream a video on an effective 25KiB connection isn't going to work well regardless.

MidnightLightning commented 5 years ago

Thanks for your analysis! I'm actually on a fiber connection (300 Mbps), so in theory it's not a bottleneck on my IPFS node's end. The next hop in the chain might have a slower connection, but my node should have the bandwidth to send to many nodes in parallel if they're all requesting.

Stebalien commented 5 years ago

It may just be our gateways being overloaded. Can you try against https://cloudflare-ipfs.com/ipfs/... to check that?

MidnightLightning commented 5 years ago

Okay, I gave it a try using the Cloudflare node as well. As a sample video, I used a video I uploaded from a local meetup's presentation on Bitcoin Cash: QmP5H8aXWZejmipPFZeP4Ys11WWohgEg14r6u7BCniAZzA.

When I load that file directly from my IPFS node in Chrome, it snaps open to the first frames of the video immediately and can start playing uninterrupted (which makes sense; I've got the video pinned, so it should already be fully present in the node's datastore).

Loading https://cloudflare-ipfs.com/ipfs/QmP5H8aXWZejmipPFZeP4Ys11WWohgEg14r6u7BCniAZzA took about 60 seconds before it seemed to find my node and started pulling data from it. I saw transfer speeds around 20 MB/s for almost a minute, while the tab with the video loading was still stuck on a "loading" animation (not even getting the first frames of the video to snap out to the right size):

load1

I let it go for a while and eventually it did start playing and loaded a fair bit of the video, but I noticed the network traffic from my node tapered off while, in the other tab, the video was still playing; eventually the video stopped playing and went back into a "buffering" animation, with no network traffic showing on the node:

load2

I skipped about 5 minutes further into the video by sliding the playhead forward, which caused the network traffic to spike again on my node, but it again showed high data throughput for a minute with the video stuck in a "buffering" animation:

load3

That to me says that the IPFS nodes are communicating and moving the data blocks around, but not in the order that the browser needs to stream the video, leading to a worse user experience on the receiving end.

Stebalien commented 5 years ago

So, the tapering off and restarting may just be the connection being cut (and then recreated when you restart the request by seeking). I really don't know what the delay is; buffering while caching?

IPFS will download the file (mostly) sequentially so I do know that's not the case.

MidnightLightning commented 5 years ago

🤔 Okay, so you're saying the issue in the initial title and request isn't the root cause (I'll modify the title to reflect that), but this sort of experience with videos is something I get very frequently as an end user when browsing around D.Tube videos (both other people's and my own).

Are any other users seeing similar things when loading videos via IPFS? Is there more diagnosis I can do from my end to get more debugging information (some way to see which blocks/byte ranges are among those 50+ megabits of data flowing out into the network)?

Stebalien commented 5 years ago

Okay, so you're saying the issue in the initial title and request isn't the root cause (I'll modify the title to reflect that), but this sort of experience with videos is something I get very frequently as an end user when browsing around D.Tube videos (both other people's and my own).

Yeah, I have seen this behavior although at the time I just thought IPFS was taking a while to find the content (not that it was somehow buffering it somewhere). Something fishy is definitely going on here, thanks for bringing this up and bearing with me as we debug it.

Out of curiosity, does this buffering issue happen when you view files uploaded by others through your gateway? That is, a file gets uploaded to node A and you view it on your node B (via your gateway at node B). Do you see your wantlist changing (not just re-sorting) while the video refuses to play?

That would give us a clean test unaffected by (a) our slow gateways and (b) whatever Cloudflare might stick in front of their gateways.

MidnightLightning commented 5 years ago

@Stebalien Okay, I've tried a few times, on a few different days, to view files from others on my node. To find those files, I'm looking at the D.Tube lists of "Hot" and "Popular" videos (presuming that if they are "hot", they're likely also well seeded), and generally I see these symptoms:

load1

No matter how many times I refresh the browser trying to load the video file, it doesn't load further. I tried it with another video with similar results:

load3

I tried again today with a smaller video file, and it loaded about 2/3 of the video before stopping and just sitting there loading forever:

load10

Stopping the page load, browsing to other files, and then coming back to it after a few minutes did allow me to skip to the last bits of that video:

load12

So, I'm not sure what to make of that. The user experience there makes me feel like the IPFS node loads part of something, then freezes/pauses/chokes for some reason, and coming back to it later lets it continue if I reload. Reloading sooner than that doesn't do anything.

One symptom I was able to consistently nail down is that the web daemon seems to be able to handle only one incoming request at a time; or, if it can handle multiple incoming requests, a request for a large file consumes all the available connections in the pool, starving out the others.

To explain that, here's what I did. I took a "small file" as an example (the cat image at QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg) and a "large file" (a three hour video that's currently "Hot" on D.Tube: QmXsoF7YauWwyZBvETx6j9A5D5P7u5Vj3AXJ2SuQB9BszX).

So, loading the "small file" a second time seems to get held up behind the node trying to do something with the "large file". This is in Chrome, so it might be related to this issue, where, if the socket connections are not being closed, Chrome hangs when loading more content until a few minutes have passed.
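One way I could test that outside the browser (placeholder CIDs, and assuming the local gateway at its default 127.0.0.1:8080 address) is to start the big download and then time the small fetch while it's in flight:

```go
// Rough check for whether a large transfer starves a small one (placeholder
// CIDs; not a rigorous benchmark). Start the big download in the background,
// then time the small fetch while the big one is still running.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func timedGet(url string) time.Duration {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return time.Since(start)
}

func main() {
	gw := "http://127.0.0.1:8080/ipfs/"
	big := gw + "<large-video-cid>"
	small := gw + "<small-image-cid>"

	go func() {
		resp, err := http.Get(big) // long-running transfer in the background
		if err == nil {
			defer resp.Body.Close()
			io.Copy(io.Discard, resp.Body)
		}
	}()
	time.Sleep(2 * time.Second) // let the big transfer get going

	fmt.Println("small file while big transfer is running:", timedGet(small))
}
```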

rfielding commented 5 years ago

Ah, part of this may be how the Range headers are formatted. I'm testing in Chrome, which loads the video stream with a Range: bytes=0- header (an open-ended range request). That results in a chunked response (pieces arrive as they become available), but the HTTP stream still has to be delivered sequentially, and it seems the back end fetches the data blocks from IPFS peers in a random order within that byte range. I think it would be helpful to prioritize blocks at the beginning of the byte range, to allow the HTTP stream to un-block.

It is not uncommon to make a range request from some start byte to an end byte that we never intend to read. The server fulfills its end of the contract by returning a 200 or a 206 and sending bytes as they are io.Read from the stream. It is extremely common with video, for instance, to get ranges like "0-" or "512-43243243" with no intention of actually reading the entire thing. In the ideal case, the ONLY buffering is just enough to cover the timing delay/jitter in io.Read out of the stream.

It is possible for a 4 GB video file to get a request that seeks 100 MB in and asks for 5 MB, but where the client only ever reads 1 MB. This is because video needs to be consumed at a real-time frame rate, and pulling it any faster either increases latency by buffering in the server, wastes bandwidth that could be shared (i.e. don't pull at the bandwidth of the pipe, but at the frame rate demanded by the viewer; you end up spending only 25% of the time pulling bytes when the pipe can handle 4x the demanded frame rate), or wastes memory in the viewer by buffering up frames that have not yet been played out.

In the ideal case, you can put a small upper bound on memory consumption per goroutine servicing HTTP requests, and it's just the size of the io.Read buffer used when reading bytes from the source to send back to the user. You only have to buffer more in practice because sometimes you get a stall when trying to send what you have out to the client, and packets back up briefly.

In my case, I usually use the range request to seek to the first byte and just don't set an end on how much to get back, because on the server side I expect that the client will close its io.ReadCloser when it has had enough. I even send the stream to the end when the client had only asked for a small range (this is OK for video, but might not be OK for other situations where the client isn't smart about this: some clients might be dumb and buffer up an incoming 3 GB when they only asked for 1 MB, rather than closing the stream after reading the amount they asked for, or just not buffering and keeping it open). Since clients (usually) seek on a ciphertext boundary, it significantly simplifies dealing with situations where I need to seek into a cipher stream.
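In Go terms, that client-side pattern is roughly this (placeholder URL; just a sketch):

```go
// Ask for an open-ended range, read only as much as is actually needed right
// now, then close the body; the server sees the connection drop and stops
// producing bytes.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "https://example.com/video.mp4", nil) // placeholder URL
	if err != nil {
		panic(err)
	}
	req.Header.Set("Range", "bytes=104857600-") // open-ended: seek 100 MiB in

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	// Read only the ~1 MiB the player actually wants right now...
	buf := make([]byte, 1<<20)
	n, _ := io.ReadFull(resp.Body, buf)
	fmt.Println("read", n, "bytes, closing")
	// ...then close; we never intended to consume the rest of the range.
	resp.Body.Close()
}
```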

samsends commented 3 years ago

Hey, just pinging to see if there is any news on this issue. I'm running into a similar experience with the same use case of streaming video / seeking to different sections of the video.

MidnightLightning commented 3 years ago

Circling back to this issue myself: there haven't been any technical strides toward the low-level IPFS/bitswap protocol handling large files differently (which audio and video files would benefit from) since this issue was opened. However, in the intervening years, streaming video has become more and more popular, and different ways of dealing with it natively have arisen.

One technology that I think can help (at least as a stop-gap) in the IPFS world is HLS (HTTP Live Streaming) video. While some video codecs are internally streaming/seeking-friendly, the HLS protocol defines videos that are physically split into separate, smaller files (most examples use segments about 6 seconds long), with a master playlist file that points to where all the pieces are. Re-encoding a video into that structure and hosting it on IPFS means there's no longer one huge file whose blocks the underlying bitswap protocol needs to figure out how to prioritize, but a directory of files that HLS-aware players can fetch file by file. Similar to the steps I laid out as enhancement ideas for IPFS, an HLS video player first fetches the master playlist (a few bytes in size), parses it, determines from that what other playlists exist for that video, and starts fetching them in order.
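To illustrate (the paths, bandwidth values, and segment names here are made up for the example), the master playlist is just a small text file pointing at per-resolution playlists:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720
720p/index.m3u8
```

and each per-resolution playlist lists the short segment files in play order:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
segment000.ts
#EXTINF:6.0,
segment001.ts
#EXT-X-ENDLIST
```

Since every segment is its own file in the IPFS directory, the player (rather than bitswap) ends up deciding the fetch order.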

To test this, I followed the tutorial over here and sliced up the Sintel trailer into an HLS collection with two different resolutions. You can find that at ipfs://QmU3PxATnzAgSNaiBc5VsAHJKBNCHxDPRWPPV253rnh9Yq.

You can see it appears to IPFS as a folder of files, but if you just navigate to the master.m3u8 file it does not work (it's not served with a proper MIME type, and your browser would need to support HLS video natively). So, I included a player.html file, a simple page that embeds the video (plus an HLS-compatibility library). On a modern/evergreen browser, navigating to that player.html page and pressing "play" seems to stream the video pretty smoothly! And (at least in the browsers I've tested), it properly adapts to network conditions and scales the video up to a higher resolution when it detects it can fetch the pieces at a good speed from the server!

Also, I tried VLC as a standalone player: opening a "Network Stream" and pointing it at "https://gateway.ipfs.io/ipfs/QmU3PxATnzAgSNaiBc5VsAHJKBNCHxDPRWPPV253rnh9Yq/master.m3u8" played the video perfectly as well!

Can others give that a try and see if that works more smoothly for your setups too?