nicknsy / jellyscrub

Smooth mouse-over video scrubbing previews for Jellyfin.
MIT License
668 stars 27 forks source link

Possible upstreaming and format alternative #83

Closed iwalton3 closed 3 months ago

iwalton3 commented 1 year ago

As you know I recently implemented support for this into both of my desktop clients as an experimental feature. I've been pretty happy with it myself and brought up the possibility of making preview image generation a Jellyfin server feature internally. This would be particularly nice since all clients could eventually adopt it.

One point of concern was that Roku BIF is not really a vendor neutral format and Roku isn't even recommending it anymore as the primary trickplay format. Upon reviewing alternatives, both HLS and DASH use tiled images within a series of JPEG files. Roku and Shaka player supports this natively and YouTube also appears to be using some variant of this.

I made a prototype fork of this plugin that follows this scheme here and generates HLS playlists that some clients might be able to support "for free": https://github.com/iwalton3/jellyscrub/releases/tag/v2.0.0.0

What are your thoughts on this?

Note: You can read more about Roku's recommendations concerning BIF and HLS/DASH standard thumbnails here: https://developer.roku.com/docs/developer-program/media-playback/trick-mode/hls-and-dash.md

nicknsy commented 1 year ago

This is awesome! I'd love for this to be an internal feature but was just waiting for discussion on the meta post here: https://github.com/jellyfin/jellyfin-meta/discussions/33. It didn't receive any replies for a couple of months so I just left it at that, and at this point I'm still not sure how to coordinate with everyone on upstream implementation.

Format isn't a big deal for me personally--as reflected by nielsvanvelzen's recent response on the meta post, I think regardless of the format most clients will require custom implementation. AFAIK the EXIF-TILES is also a format created by Roku so like you said only Roku (& Shaka--does Jellyfin use Shaka? I thought it used Hls.js) natively supports it ATM, which is also the only client that natively supports BIF.

Still, I understand the appeal of using a newer technology that is a real drafted standard instead of a nebulous format that only really exists on Roku's website. Plus, as more players get support for hls tiles it could mean easier native implementation, like with Shaka.

That is all to say that I really love this, but as far as integrating this into Jellyfin you're going to have much more insight than I do. I think your implementation with EXT-X-TILES is definitely the way to go, and I would be happy to help rework it to be internal server code, but really that's the easy part--it's not that big of a plugin--I'm more unsure of things like how a new feature like this would fit into the 10.9 release cycle, etc.

iwalton3 commented 1 year ago

Jellyfin does not use Shaka. It's more I was looking at the ecosystem and anywhere I saw the format used was effectively a "vote" in favor of the format. Jellyfin uses HLS.js.

For web, JMP, and MPV Shim I would still just use the manifest and process the thumbnails myself. The forked plugin I linked doesn't actually use the HLS playlist to function in the web app, just the tiles.

What is nice though is that both the HLS and DASH standards use tiles, so only one physical storage medium would be needed regardless of future interest in either metadata format. (Even if many clients just use the manifest directly to fetch the tiles.)

nicknsy commented 1 year ago

It looks like jellyfin-web might get Shaka player support in the near future: https://github.com/jellyfin/jellyfin-web/pull/4041

nicknsy commented 1 year ago

Looking into migrating the code to jellyfin internal and was doing some testing with your new version. Everything works great, except it looks like when multiple target width resolutions are set in the config each generation overwrites the previous, leaving only a single resolution in the manifest.

Since I'm migrating & changing the structure of the project anyways, you don't need to make a fix, but I'll post any related issues on this thread to keep track of anything that needs changing.

iwalton3 commented 1 year ago

Here is a replacement JMP plugin that would replace jellyscrubPlugin.js that uses the new format. It is probably closer to what would eventually end up in jellyfin-web as well. https://gist.github.com/iwalton3/75d788d79b8b66a16a42281c65204e71

I should also note I found that the line in the original trickplay.js I updated:

customThumbImg.src = imageSrc;

works better if replaced with:

if (customThumbImg.src != imageSrc) customThumbImg.src = imageSrc;

I saw that JMP was reloading the image every time the frame updated. (Although this could have been because I had the cache disabled in devtools...)

iwalton3 commented 1 year ago

I added test support for the tiled images to MPV Shim as well. It looks like grabbing and decoding the tiles is somehow faster too despite a larger file size. (I would figure the larger file size was because of what I set the jpeg quality setting to.)

2023-02-21 00:33:40,836 [    INFO] trickplay: Collecting trickplay images...
2023-02-21 00:33:40,844 [   DEBUG] urllib3: http://host "GET /Trickplay/id/GetManifest HTTP/1.1" 200 None
2023-02-21 00:33:40,849 [   DEBUG] urllib3: http://host "GET /Trickplay/id/320/1.jpg HTTP/1.1" 200 1567807
2023-02-21 00:33:40,981 [   DEBUG] urllib3: http://host "GET /Trickplay/id/320/2.jpg HTTP/1.1" 200 1635772
2023-02-21 00:33:41,060 [   DEBUG] urllib3: http://host "GET /Trickplay/id/320/3.jpg HTTP/1.1" 200 1588771
2023-02-21 00:33:41,131 [   DEBUG] urllib3: http://host "GET /Trickplay/id/320/4.jpg HTTP/1.1" 200 1332816
2023-02-21 00:33:41,200 [   DEBUG] urllib3: http://host "GET /Trickplay/id/320/5.jpg HTTP/1.1" 200 1320427
2023-02-21 00:33:41,269 [   DEBUG] urllib3: http://host "GET /Trickplay/id/320/6.jpg HTTP/1.1" 200 1366811
2023-02-21 00:33:41,339 [   DEBUG] urllib3: http://host "GET /Trickplay/id/320/7.jpg HTTP/1.1" 200 330059
2023-02-21 00:33:41,383 [    INFO] trickplay: Collected 639 bif preview images
2023-02-21 00:33:41,383 [    INFO] mpv: trickplay: Received BIF data. 
2023-02-21 00:35:02,295 [    INFO] trickplay: Collecting trickplay images...
2023-02-21 00:35:02,306 [   DEBUG] urllib3: https://host "GET /Trickplay/id/GetManifest HTTP/1.1" 200 46
2023-02-21 00:35:02,316 [   DEBUG] urllib3: https://host "GET /Trickplay/id/320/GetBIF HTTP/1.1" 200 6706194
2023-02-21 00:35:03,011 [    INFO] trickplay: Collected 640 bif preview images
2023-02-21 00:35:03,011 [    INFO] mpv: trickplay: Received BIF data. 
iwalton3 commented 1 year ago

For adding the HLS playlists to the Jellyfin transcoder/remixing response, it would look like this:

#EXT-X-IMAGE-STREAM-INF:BANDWIDTH=32920,RESOLUTION=640x360,CODECS="
jpeg",URI="playlist.m3u8"

A notable caveat with both the line above and the HLS playlists is that they both currently require authentication via the API to fetch. Solutions include embedding the API key, URL signing, or just not authenticating the request. (I believe Jellyfin still does the last option for most images.)

Bandwidth values used by the Roku examples:

16460 for 320x180
32920 for 640x360

Extrapolated formula:
sqrt(length * height) * 4115 / 6 / interval
nicknsy commented 1 year ago

Awesome that you managed to extrapolate the formula for bitrate. Earlier I looked at https://github.com/rokudev/samples/blob/master/media/TrickPlayThumbnailsHLS/scripts/gen_playlist.sh and realized that they ended up just hardcoding it in their scripts.

What I'm planning on doing for now is replicating how chapter images are handled so as to not implement too many trickplay-specific methods/endpoints. What that's hopefully going to look like is this:

For instance: Trickplay: {"730b063b5b394d8db199a4124a8f40d1": ["320x180", "640x360"], "6d8e2877565444ff9bad722f1a406f34": ["320x180"]}

iwalton3 commented 1 year ago

If the user changes the interval, do you think we should regenerate the images?

I think it could be helpful to store the parameters into the database that I put into the TileManifest. Some reasons I thought for this:

Of course at the same time the m3u8 playlists are not that complicated. But they are more annoying to deal with than the generation parameters in my mind.

nicknsy commented 1 year ago

That's a good point. With the same information given in the Items info response no request to the playlist is necessary. Hls is pretty simple but DASH uses XML which seems like much more of a pain--and then there would theoretically need to be an Hls and DASH parser for each client.

As for regenerating on interval change, I view each set of tiles as finished metadata, so if you were to change the interval it wouldn't retroactively change finished metadata, but would require a Refresh / Replace All Images.

thornbill commented 1 year ago

Jellyfin does not use Shaka.

There is an open PR to add it for certain formats though because iirc it has better support for hevc content. 😉

nicknsy commented 1 year ago

https://github.com/nicknsy/jellyfin/tree/trickplay So far I've done the generation and storage for trickplay, but there's still quite a bit more to do like localization, proper config options, and the trickplay image endpoint/controller which I've decided to do instead of using the built-in images endpoint and having to do hacky index storing in the database.

The trickplay response header now looks like this for clients that need to manually request images:

"Trickplay": 
{
    "f37b3cd3-3269-dff0-34ee-51d4cda2c0e4": {
        "320": {
            "Width": 320,
            "Height": 180,
            "TileWidth": 10,
            "TileHeight": 10,
            "TileCount": 1,
            "Interval": 10000,
            "Bandwidth": 4003
        }
    },
    "244e7ede-3885-a724-9de6-308fd4d67816": {
        "320": {
            "Width": 320,
            "Height": 180,
            "TileWidth": 10,
            "TileHeight": 10,
            "TileCount": 1,
            "Interval": 10000,
            "Bandwidth": 4003
        }
    }
}
nicknsy commented 1 year ago

@iwalton3 Just about finished implementing this and noticed a strange issue with the trickplay images. On many of the images, there seem to be 2 or so pixels of an adjacent image or images that get shown along with it.

image

Here you can see it shows a bit of both the top and bottom images. I wondered if maybe the jpeg quality could affect the borders when the entire grid of tiles gets encoded so I set it to 100 with a qscale of 1 which did not resolve it.

image image

Looking at a grid of tile images it appears not all corners/edges align perfectly, though I'm just eyeballing it when I say that and could be wrong. I'm not sure exactly how skiasharp or image encoding could affect this, or if it's just an html styling issue, but do you have any ideas?

Thanks.

iwalton3 commented 1 year ago

I didn't notice this with the MPV Shim implementation. Maybe send me the file that is having issues so I can investigate it?

nicknsy commented 1 year ago

I didn't notice this with the MPV Shim implementation. Maybe send me the file that is having issues so I can investigate it?

I've sent an email to the address listed on your github.

iwalton3 commented 1 year ago

It looks like using yuv444 and 100 percent quality might curtail it a bit, but it is only really 1px of bleed through on the edges. image

diff --git a/Nick.Plugin.Jellyscrub/Drawing/VideoProcessor.cs b/Nick.Plugin.Jellyscrub/Drawing/VideoProcessor.cs
index a140db9..f70a091 100644
--- a/Nick.Plugin.Jellyscrub/Drawing/VideoProcessor.cs
+++ b/Nick.Plugin.Jellyscrub/Drawing/VideoProcessor.cs
@@ -339,7 +339,12 @@ public async Task CreateTiles(string directoryPath, List<FileSystemMetadata> ima
             var tileSetPath = Path.Combine(directoryPath, $"{imgNo}.jpg");
             using (var stream = File.OpenWrite(tileSetPath))
             {
-                tileSet.Encode(stream, SKEncodedImageFormat.Jpeg, _config.JpegQuality);
+                var jpegOptions = new SKJpegEncoderOptions {
+                    Quality = _config.JpegQuality,
+                    Downsample = SKJpegEncoderDownsample.Downsample444
+                };
+                var tileSetPixmap = new SKPixmap(tileSet.Info, tileSet.GetPixels());
+                tileSetPixmap.Encode(stream, jpegOptions);
             }

             var tileSetDuration = Math.Ceiling((decimal)config.Interval*tileCount / 1000);
nicknsy commented 1 year ago

Yeah, it's not a huge deal and could be fixed by keeping the wrapper div as a bit smaller than the source, but ideally using the width and height of the source images should map perfectly to a single tile. Looking at how youtube does it, many videos have this black padding around the top and bottom, even on seemingly perfect 16:9 videos:

image

Despite the jpg ending on their links, they are actually webp images. However they seem to face the exact same issues with edges not lining up after encoding, and sometimes the black top/bottom padding ends up in the preview, which isn't much better.

image

Plus, any padding would need to be applied to the image offsets so it wouldn't make implementation any simpler. So I guess I'll keep looking into it but it's likely the wrapper just has to be rendered a few pixels smaller.