ipfs / ipfs-webui

A frontend for IPFS Kubo and IPFS Desktop
https://webui.ipfs.io
MIT License

Create custom HTML5 video player for IPFS files #920

Open MidnightLightning opened 5 years ago

MidnightLightning commented 5 years ago

In the web UI, if you navigate to a file's contents, the UI currently relies on the user's browser to render them. That works well for images and PDFs, and most browsers now have some sort of built-in player for audio and video files. But especially for large video files, the user experience can be lacking: the browser attempts to fetch the video pieces in playback order and expects them to be streamed from the IPFS node in that order. If the IPFS node doesn't have the file locally and needs to request it from its peers, different peers may respond with different chunks of the video in a different order. Relying on the browser's default player may block playback or saturate socket connections (e.g. https://github.com/ipfs/go-ipfs/issues/5740), making playback slower.

To make the best user experience for playing videos hosted on IPFS, a better player could be implemented using modern web APIs (Media Source Extensions (MSE)). It could be used in the default IPFS web UI and made available for other sites to embed IPFS video content, improving the user experience for this sort of playback.
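
Roughly, the MSE side of such a player could look like this. This is a sketch only; `ipfsSegments()` is a hypothetical helper standing in for whatever bitswap-aware layer ends up providing the segments, and the codec string is an assumption:

```typescript
// Sketch only. ipfsSegments() is hypothetical: it stands in for whatever
// bitswap-aware layer yields fragmented-MP4 segments as they arrive, with
// the initialization segment yielded first.
declare function ipfsSegments(cid: string): AsyncIterable<Uint8Array>;

const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  // The codec string is an assumption; it must match the actual file.
  const buf = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f, mp4a.40.2"');
  buf.mode = 'segments'; // fMP4 fragments carry their own timestamps

  for await (const segment of ipfsSegments('Qm...')) {
    buf.appendBuffer(segment);
    // appendBuffer is asynchronous; wait before appending the next segment.
    await new Promise((resolve) =>
      buf.addEventListener('updateend', resolve, { once: true })
    );
  }
  mediaSource.endOfStream();
});
```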

Proposal

A large file hosted on IPFS is loaded in a similar manner to a torrent: it's split into chunks/blocks, and different peers may respond with different chunks from different locations in the final file. Most torrent software gives some sort of UI element that visualizes which pieces of the raw data have been transferred:

Torrent UI

And some offer the option to prioritize blocks at the beginning of the file (for a video, so playback can start while the rest is still downloading). I think that sort of indicator is key to helping users understand what's happening with their video playback.

The UI elements of a video player have evolved to include some common elements that users now expect. For example, in the YouTube player:

sintel_1

The current playback head is at 5:41 (visualized by a solid color bar), the amount of video that's currently loaded is shown as a faded out bar (to the right of the playhead), and in this screenshot, the user is mousing over a point in the future of the video (8:05; visualized by an even more faded out bar). Because the point the user is hovering over is past the "already loaded" point of the video, the user knows that if they click there, the video will need to buffer for a bit to queue up the next bit of the video to play.

Taking the idea of a torrent file progress bar, I propose a UI like this:

sintel_2

The situation is the same: the playback head is at 5:41, and the user is mousing over 8:05. In this UI, the "what parts of the video are loaded" indicator is moved to a secondary track above the playhead, with loaded-and-ready sections colored differently from the playback track. With this visualization, the user can see that if they click at 8:05, the player will need to buffer for a while (since that section is not loaded yet), but if they skip a little further forward, there's already-loaded content they could jump to.
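
The player can already read the browser's own record of buffered time ranges to draw that secondary track; a rough sketch (element ids and the CSS class are placeholders):

```typescript
// Sketch: draw the ranges the browser has buffered as segments on a
// secondary track above the seek bar. Element ids and the CSS class
// are placeholders.
const video = document.getElementById('player') as HTMLVideoElement;
const track = document.getElementById('loaded-track') as HTMLDivElement;

function renderLoadedTrack(): void {
  const { buffered, duration } = video;
  if (!duration) return; // metadata not loaded yet
  track.innerHTML = '';
  for (let i = 0; i < buffered.length; i++) {
    const seg = document.createElement('div');
    seg.className = 'loaded-segment';
    seg.style.left = `${(buffered.start(i) / duration) * 100}%`;
    seg.style.width = `${((buffered.end(i) - buffered.start(i)) / duration) * 100}%`;
    track.appendChild(seg);
  }
}

video.addEventListener('progress', renderLoadedTrack);
video.addEventListener('timeupdate', renderLoadedTrack);
```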

Having a richer front-end widget for video playback would then allow additional (optional) enhancements, where the front end could make calls to the IPFS node to adjust block priorities based on user interaction ("content at the 8:05 mark and forward from there just got bumped up in priority, guys!"), rather than just passively waiting for the IPFS node to finish loading the content.
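
For example (the priority endpoint here is entirely hypothetical, just to show the kind of call the widget could make):

```typescript
// Entirely hypothetical endpoint, for illustration only; no such priority
// API exists in go-ipfs today.
async function bumpPriority(apiBase: string, cid: string, byteOffset: number): Promise<void> {
  await fetch(`${apiBase}/api/v0/priority?arg=${cid}&offset=${byteOffset}`, {
    method: 'POST',
  });
}

// Example: when the user seeks, ask the node to favour blocks from that
// point onward (the time-to-byte mapping here is a crude approximation).
function onSeek(video: HTMLVideoElement, cid: string, fileSize: number): void {
  const byteOffset = Math.floor((video.currentTime / video.duration) * fileSize);
  void bumpPriority('http://127.0.0.1:5001', cid, byteOffset);
}
```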

Rationale

Sites like BitTube and D.tube have started to spring up, aiming to provide a rich video browsing experience, and both have created custom player implementations to try to load IPFS videos in the most effective way for users. Setting up a standard UI and toolset for application developers to utilize could streamline that process and increase adoption of IPFS in general. It could also help with debugging, as it would bring more information about what is happening to the video to the forefront, instead of leaving a user stuck at a loading animation (and possibly navigating away out of frustration).

olizilla commented 5 years ago

Thank you @MidnightLightning for taking the time to write this up so thoroughly and offering up a proposal for how it could work. This also touches on the more general need for a really good loading UI for IPFS that is aware of how bitswap works, and of the graph nature of the blocks and the links between them.

I'm in favour of adding a video player that pushes things forward, but I won't get time to work on it for a few months. Would you be interested in creating a proof-of-concept for this idea?

patrykadas commented 5 years ago

That's a cool idea and good reasoning @MidnightLightning. I also agree with @olizilla that it might be expanded a bit further.

I feel like loading could provide important feedback for the user about data provenance. Just by looking at the page, the user could tell whether a certain piece of data is actually stored on IPFS or comes from a centralized source.

loading

I've recreated the common 'skeleton loading' pattern. Movement from left to right (e.g. a wave or shimmer-like animation, much like Facebook or Google uses) is perceived as shorter in duration than skeletons that pulse (opacity fading in and out).

This proposal is by no means an engineered solution; rather, it's an exploration of using loading time as something that works in IPFS's favor and educates the user.

MidnightLightning commented 5 years ago

@patrykadas While that's a pretty animation, current image formats wouldn't be able to load like that (vertical bands) if fetched piecemeal, in random order, via bitswap/IPFS communications. Established image formats like JPEG evolved to have a "show a lower-resolution version while loading" option, but that relies on the binary data of the JPEG loading in order. Legacy image and video formats are designed to be saved as one binary file, with the structure of the data in that file being order-dependent.

However, more modern trends have created file formats where an individual "file" is really a folder of files with different contents (possibly zipped to make them more space-efficient): Mac application bundles, Java JAR files, modern Office file formats (e.g. XLSX, DOCX). And online media services have evolved to create multiple versions of an image/video to serve to different users (video sites like YouTube save the video in multiple resolutions, so the player can switch between them if network congestion occurs), but you could conceptually think of all the resolutions of the image/video as one "unit" representation of the media. Since IPFS does have a concept of folders, new "media format" standards could emerge for how to lay out files in an IPFS folder, such that a player/viewer can reassemble them.

For example, for viewing huge (gigapixel) images, there are viewers that take the image, slice it up into tiles at different zoom levels, apply a naming convention so the viewer knows how to find them, and then provide a pan/zoom interface to view it. So, to make your "vertical bands" type load for a specific type of image, you could slice the image up into a couple dozen vertical bands (saving each one off as a separate PNG/JPG? Or investigate the newer WebP format, since it can target a specific file size during conversion, forcing each band to be exactly the size of bitswap base-layer blocks?) and put them in a folder with some sort of metadata file (a JSON/YAML file that defines the image size and how many tiles there are). The IPFS bitswap protocol could then prioritize fetching the metadata file first and hand it off to a player/viewer, which takes that plus browser (dimensions, scroll position) and user (mouse interaction) input to prioritize which tiles to fetch. As they stream in from the bitswap peers (in whatever order they arrive), they could get filled into place.
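
To make that concrete, the metadata file could look something like the following; the field names, file paths, and tiling convention are just an illustrative sketch, not an existing standard:

```typescript
// Hypothetical manifest layout for a tiled image stored as an IPFS directory.
// Field names, file paths, and the tiling convention are illustrative only.
interface TileManifest {
  width: number;       // full image dimensions in pixels
  height: number;
  tileWidth: number;   // every tile except the last column/row is this size
  tileHeight: number;
  tiles: string[];     // paths inside the directory, e.g. "tiles/3_2.webp"
}

// Fetch the manifest first, then request only the tiles that intersect the
// viewport, drawing each into place as it arrives (in whatever order).
async function loadVisibleTiles(
  dirUrl: string,           // e.g. a gateway URL for the directory CID
  viewport: DOMRect,
  ctx: CanvasRenderingContext2D
): Promise<void> {
  const manifest: TileManifest = await (await fetch(`${dirUrl}/manifest.json`)).json();
  const cols = Math.ceil(manifest.width / manifest.tileWidth);

  manifest.tiles.forEach((path, i) => {
    const x = (i % cols) * manifest.tileWidth;
    const y = Math.floor(i / cols) * manifest.tileHeight;
    const visible =
      x < viewport.right && x + manifest.tileWidth > viewport.left &&
      y < viewport.bottom && y + manifest.tileHeight > viewport.top;
    if (!visible) return;

    const img = new Image();
    img.onload = () => ctx.drawImage(img, x, y);
    img.src = `${dirUrl}/${path}`;
  });
}
```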

@olizilla I'd be happy to help out on this where I can; I have experience as a front-end developer, so I could create the front-end part of it, but I haven't spent a lot of time figuring out the low-level bitswap infrastructure of IPFS, so I wouldn't be as helpful in creating the needed API endpoints (querying "what data is already loaded in these ranges?", and "prioritize this byte range, and let me know when any part of it comes back in"). It actually seems like it might be a good use case for websockets rather than standard request/response calls, since the client could just sit listening for events from the node the moment any bit of data streams in?

patrykadas commented 5 years ago

@MidnightLightning That's a great explanation, thank you! My intention was never anything more than a pretty animation that could lead to a broader discussion about visual clues and new metaphors for, in this case, 'skeleton loading for IPFS'.

olizilla commented 5 years ago

@patrykadas thank you for taking the time to show your idea so clearly! I like to demo the chunking that goes on behind the scenes in IPFS by showing people just the first chunk of a large JPEG from the Apollo space missions loading over IPFS.

https://ipfs.io/ipfs/QmT2otBVhMXDGx7CysgoqKKZH8nbmeDPu88gGNVBZ1kEAj is a link to the first chunk of a spaceship... while https://ipfs.io/ipfs/QmZHMYyZRMHV4ZWXucLinHVSREHbC8tUruEoR9rLwQD1Bp is the whole thing. Your visualisation is much clearer.

I'd really like to explore what a "graph aware" loading animation would look like. With IPFS you typically fetch a single root block first, which contains links to the next level of blocks, each of which in turn may link to deeper layers, fanning out level by level until you discover and fetch all the "leaf" blocks that contain the actual chunks of data that make up the image or file. I'd love to talk about it some more if you have time.

@MidnightLightning ...and thank you for explaining things so clearly! I will find some useful links and write up notes on how we might tackle it here when I get a moment. (that may be a few days! the post-holidays todo pile doesn't seem to be getting any smaller)

patrykadas commented 5 years ago

@olizilla That sounds exciting! I think loading can effectively show data provenance not only for videos and other media, but also through skeleton patterns and loaders. In this sense, a simple progress bar could fill in a different manner (not just left to right) or be represented differently, since the progression might not be linear.

I'd also like to explore whether this could convey information about the health of the file and explain an extended wait time if needed. It might also be interesting to show that the container 'accumulates' the data, like in this small example: loading12-2

I have time and would love to talk more about it!

MidnightLightning commented 5 years ago

Thinking more about this idea of "showing the status of in-progress loading from the p2p network", which seems to be a prerequisite for "better viewer for images/video", I think this is the current state of the issue:

Given: large images and videos are likely to exceed the chunk size within IPFS, so they will be split into a DAG, represented by one base CID that links out to many other child objects. The question is then how clients should fetch them effectively:

Add endpoint for "is this block available locally?"

Similar to a HEAD request in HTTP, it would be useful to have a means to not just GET the contents of a given block/object in IPFS, but to ask "if I make a request for this block, would you be able to serve it immediately?" without actually kicking off a "go find this block's contents from the P2P network" call. This could be a simple yes/no boolean response, or it could be worked into a more complex "stats" response (giving information such as how many peers I know who have that block ("seeders"), how many other peers I know are looking for it ("leechers"), and other metadata).
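
Calling such an endpoint from the web UI could look something like this (the /api/v0/block/has route and the response shape are part of the proposal, not an existing Kubo API):

```typescript
// Hypothetical endpoint as proposed above; it does NOT exist in Kubo today.
// A HEAD-like query: "could you serve this block immediately?"
interface BlockStatus {
  local: boolean;       // already in the local blockstore?
  providers?: number;   // optional metadata, e.g. known peers holding it
}

async function blockAvailableLocally(apiBase: string, cid: string): Promise<BlockStatus> {
  const res = await fetch(`${apiBase}/api/v0/block/has?arg=${cid}`, { method: 'POST' });
  return res.json();
}
```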

The base CID has to happen first

Without it, the node cannot know how big the resulting file is, nor the CIDs of any of the child objects. From the perspective of the node that needs the file, no real prioritization is possible yet, since it doesn't know any other CIDs related to the object. From the seeding node's perspective, among all the "want" requests it receives, requests for these base CIDs could be prioritized higher than others (both because these blocks will likely be smaller, since their payload is a list of links, and because the seeder can know that the block is blocking the requesting node from doing any further processing). Having a way to differentiate "base CID" blocks from "data CID" blocks in the bitswap protocol would help nodes prioritize them. This step in the process is a lot like the "magnet link" step in a torrent transfer: the receiving end cannot do anything until some other node gives it a copy of that root data definition. This block should also be the most easily obtained, since every node that has even part of the file must have this base block too.
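
For reference, enumerating the child CIDs once the base block is local is already possible from the UI side; a sketch using ipfs-http-client (the API shape is approximate, so check it against the client version in use):

```typescript
import { create } from 'ipfs-http-client';
import { CID } from 'multiformats/cid';

// Sketch: fetching only the base block tells us the child CIDs and the
// cumulative size of each subtree, before any data blocks have arrived.
// API shape is approximate; check against the ipfs-http-client version in use.
const ipfs = create({ url: 'http://127.0.0.1:5001' });

async function listChildBlocks(rootCid: string) {
  const { value: rootNode } = await ipfs.dag.get(CID.parse(rootCid));
  // For a UnixFS file encoded as dag-pb, Links enumerates the child blocks.
  return (rootNode.Links ?? []).map((link: any) => ({
    cid: link.Hash.toString(),
    size: link.Tsize,
  }));
}
```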

Get "Magic Bytes" next

With most file formats, the beginning of the binary data has some sort of header or "magic bytes" flag that indicates the format of the file. So once the list of blocks for a given binary file has been enumerated by the receiving node, getting the first block of data (the first 256k of the file) is probably the next priority. Having a way for receiving nodes to set a priority on block "wants" when communicating with peers (plus logic to mark the first block as high-priority automatically) would allow this sort of prioritization.
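
Once that first block arrives, sniffing the format from its magic bytes is simple; a small sketch with a few illustrative (not exhaustive) signatures:

```typescript
// Sketch: detect a container format from the first bytes of the first block.
// Signatures listed are illustrative, not exhaustive.
function sniffFormat(firstBlock: Uint8Array): string {
  const startsWith = (sig: number[], offset = 0) =>
    sig.every((b, i) => firstBlock[offset + i] === b);

  if (startsWith([0xff, 0xd8, 0xff])) return 'image/jpeg';
  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return 'image/png';
  // MP4/ISO-BMFF files have "ftyp" at byte offset 4.
  if (startsWith([0x66, 0x74, 0x79, 0x70], 4)) return 'video/mp4';
  // WebM/Matroska files start with the EBML magic number.
  if (startsWith([0x1a, 0x45, 0xdf, 0xa3])) return 'video/webm';
  return 'application/octet-stream';
}
```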

Pause there, for a "light" load

With the base CID and the first child block fetched, the receiving node can now figure out most of what the file is (format from the magic bytes, file size from the base CID, all the child CIDs associated with the file), which is enough for a file-browser display or similar. For some use cases this may be all the client needs for the moment, so the /object/get API endpoint should have an additional argument to fetch just this much of the object and not the entire binary file data.
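
Something like this, where the extra flag is the hypothetical part of the proposal (/api/v0/object/get itself exists, but the flag does not):

```typescript
// The "light" argument is hypothetical, as proposed above; /api/v0/object/get
// exists in Kubo, but this extra flag does not.
async function lightStat(apiBase: string, cid: string): Promise<unknown> {
  const res = await fetch(`${apiBase}/api/v0/object/get?arg=${cid}&light=true`, {
    method: 'POST',
  });
  // Expected (hypothetical) response: format, total size, and child CIDs,
  // without the node fetching the remaining data blocks.
  return res.json();
}
```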

Add endpoint for "fetch, but don't block"

Once an app knows that it does want the binary contents of the file, it needs a way to signal to the IPFS server "start downloading this file". The server would respond with a positive acknowledgement (without the contents of the file in the response body) once it has kicked off downloading the rest of the file in the background. This contrasts with a straight GET of an /ipfs/Qm... URL, where the response blocks until the server has the whole file (or at least the beginning of it, in a chunked/streamed context) and responds with the binary data in the body.
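
From the UI side that might look like the following (the /api/v0/fetch route and its background parameter are hypothetical, part of this proposal):

```typescript
// Hypothetical "start fetching, but don't stream the body back" endpoint,
// as proposed above; it does not exist in Kubo today. The response only
// acknowledges that a background fetch has been queued.
async function prefetch(apiBase: string, cid: string): Promise<void> {
  const res = await fetch(`${apiBase}/api/v0/fetch?arg=${cid}&background=true`, {
    method: 'POST',
  });
  if (!res.ok) throw new Error(`prefetch failed: ${res.status}`);
  // No body expected; the node keeps downloading after this call returns.
}
```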

Add a websocket connection to listen for block updates

Once a "background download" has kicked off, the UI needs the means to check on the process of the job and get the results. A websocket connection is ideal for this sort of server-initiated event system. The UI client can connect to the websocket and listen for updates. The updates sent down the websocket connection would be very small, just giving the CID and an action flag (e.g. "loaded", or "cancelled", or "failed" or whatnot). When the UI client saw an update it cared about (probably a "loaded" notice), it could then fire off a request to /object/get for that CID and know that the node should respond quickly, since that data should be fully-loaded now.


How does that look for a starting point? What other things would need to be defined in this plan to make it most useful?

MidnightLightning commented 4 years ago

Circling back on this idea: not a whole lot of other ideas have been added to the concept I laid out, and it has picked up a few "thumbs up". If we like this plan, what are the next steps to actually make it happen? I laid out six concepts that could each be developed as a separate feature; should I create separate issues for each of them, so we can start chipping away at this?