meeb / tubesync

Syncs YouTube channels and playlists to a locally hosted media server
GNU Affero General Public License v3.0
1.88k stars 118 forks source link

[Enhancement] More control over fallback selection #88

Open micahmo opened 3 years ago

micahmo commented 3 years ago

Hi, first of all I am loving tubesync. I've had in mind for a while that I wanted to automatically download playlists for viewing in Plex, and I was so glad to discover that someone has already done the hard work to develop it. :-) (BTW, I just stumbled upon this project when I was browsing the New Apps section of the Community Apps in Unraid.)

It would be nice if there were some additional options related to the selection of the fallback quality and/or codec.

Here is my use case. I set up a playlist source where the source resolution is set to 1080p (Full HD), the source video codec is set to AVC1 (H.264) or VP9 (I experimented with both), and the fallback is set to Get next best resolution but at least HD.

I am downloading a playlist where almost all of the videos are 1080p with both AVC1 and VP9 codec options available, which is great. However, there's a single video that's only 720p. Since it doesn't have a 1080p option, the fallback logic kicks in. I'm not sure what the logic is, but my guess is that it does something like bestvideo+bestaudio in youtube-dl. Unfortunately, the bestvideo for this video uses AV1 codec, which Plex does not yet support. There are plenty of other codecs available at 720p which would work perfectly fine, but the fallback options aren't detailed enough for me to pick, so I'm stuck with the AV1 unless I download the video manually.

If you're curious, here is the video ID: GSOo84fAdsE. Running youtube-dl -F GSOo84fAdsE | grep 720 gives these formats. I would be happy with any of them except 398, which unfortunately happens to be the best in terms of bitrate. :-)

136          mp4        1280x712   720p  447k , mp4_dash container, avc1.4d401f@ 447k, 30fps, video only, 112.42MiB
247          webm       1280x712   720p  551k , webm_dash container, vp9@ 551k, 30fps, video only, 138.35MiB
398          mp4        1280x712   720p  647k , mp4_dash container, av01.0.05M.08@ 647k, 30fps, video only, 162.58MiB
22           mp4        1280x712   720p  576k , avc1.64001F, 30fps, mp4a.40.2 (44100Hz) (best)

I'll leave it up to you to decide if/how you want to handle this case. Here are a couple of ideas I had.

You could add the ability to blacklist a particular format. For me, I know that I will never want AV1 since it's not supported by Plex.

Or maybe you could try to use the preferred source video codec when performing fallback. In this case, if had picked ACV1 (H.264) as my preferred source video codec, the fallback could find format 136, which would be perfect.

Anyway, thanks again for the amazing tool!

meeb commented 3 years ago

Thanks for the detailed input. I can see the use case. A "codec blacklist" approach, if this were to be implemented, would probably be the cleanest to go with the internals of TubeSync. TubeSync does something similar to youtube-dl's bestvideo+bestaudio but because I wanted much more control over the resolutions and codecs this had to be completely reimplemented so modifying this behaviour isn't as much work as it might seem. There are potentially fancier options like using ffmpeg to transcode into other codecs automatically, but generally an initial goal for TubeSync was to be lossless (from the source) and re-encoding would change that. I'll put a codec blacklist onto the 1.1 roadmap for now, although if Plex adds support for ACV1 before I get 1.1 finished I'll bump it. Cheers.

micahmo commented 3 years ago

Thanks for the insanely fast response! :-) Codec blacklist in 1.1 sounds awesome, and I'm glad it fits with your codec/resolution picking logic.

I do think ffmpeg transcoding (if the preferred format is not found) would be a cool long-term goal. I can totally see a use case where someone wants to guarantee that every video is in the same format, regardless of what YT serves.

BTW, I'm not holding my breath for Plex to add AV1 soon, given their track record with implementing community feature requests and that its a relatively new codec. That said, it seems like a good open standard that will likely be picking up steam before long.

Cheers!

micahmo commented 3 years ago

Hi @meeb,

I've been thinking about this some more. While a codec blacklist would be nice, and would solve the problem of a media player that can't handle a particular codec, I think it may also be nice to support some kind of codec matching. I'm thinking not just in terms of my use case, but for anyone. Say I set up my source to download 4k or 8k. Chances are that few videos will match that criteria, so then it's a mystery (at least before looking at the code) how tubesync will pick the fallback.

I decided to dive into get_best_video_format, and I see you already handle a lot of cases. The order of precedence (at least under prefer_60fps and not prefer_hdr) seems to be...

  1. resolution + codec + fps + hdr
  2. resolution + fps + hdr
  3. codec + fps + hdr
  4. codec + fps
  5. resolution + codec + hdr
  6. resolution + codec
  7. resolution
  8. next best resolution

In the case of the video that prompted me to enter this bug, it does not have a match for preferred resolution (1080p) nor FPS (60) (it maxes out at 720p30), which causes it to fall through all way to the last case, next best resolution, which can be any codec. My proposal would be to add a case that only matches codec, like this.

if not best_match:
    for fmt in video_formats:
        if source_vcodec == fmt['vcodec']:
            exact_match, best_match = False, fmt
            break

I understand this case could eliminate a higher-resolution match in favor of codec, but you already have non-resolution-based matches (like codec + fps) above some resolution-based matches (like resolution + codec). I don't think it would be strange to add a codec-only match, maybe around number 5 in the list above (or at least before the last).

Of course, it's totally up to you if/how you want to proceed with this. A codec blacklist would still solve my case, but it might be worth considering whether a codec-only match should be added.

Thanks again for the awesome tool, and I'm excited to see it grow!

meeb commented 3 years ago

Hi @micahmo, thanks for all your contributions lately. You are pretty much entirely correct, I purposefully chose resolution matching over codec matching as that's what I deemed to be more important after testing a bunch of videos.

As you've seen poking about in my overly verbose format matching code this is generally "the" main difference between TubeSync and just using bestvideo+bestaudio so it automatically, repeatedly and reliably get the quality you want (in theory), plus a couple of things like properly handling 4k and 8k.

If there were a codec only match at position 5 you may select 1080p, have a channel with 1080p videos but only in VP9 or something, and just download 720p/AVC1 which could be quite confusing if you're not too up to speed on codecs and the like. I get why you would what what this, however as a general policy I've been erring away from "accidentally shooting yourself in the foot" options which this could turn into. I think this is a valid feature, but I probably might still want to prioritise resolution so this would be some sort of override extra feature if not a blacklist.

Maybe even a "what's more important to you? CODEC or resolution?" master option that presets to resolution might be an good way to try and implement this.

Also just on another note I've had some unexpected stuff crop up over the last couple of days so I'm very happy to get community contributions and input but I might be a bit slow to respond, review tasks or do any open source coding for a week or so.

micahmo commented 3 years ago

If there were a codec only match at position 5 you may select 1080p, have a channel with 1080p videos but only in VP9 or something, and just download 720p/AVC1 which could be quite confusing if you're not too up to speed on codecs and the like.

Totally agreed and acknowledged! However, if I'm understanding the code correctly, something similar could already happen. Say a source is configured for 1080p VP9 60fps, and say there are video matches for 720p VP9 60fps and 1080p VP9 30fps. You'll prioritize the first (number 4 in the list above) over the second (number 5 or 6 in the list above), even though the second is higher resolution. My point is mainly that the source options don't have much control over (or clearly indicate the behavior of) the fallback selection. Then again, I totally understand that you're going for simplicity and easy defaults that prevent shooting oneself in the foot.

Maybe even a "what's more important to you? CODEC or resolution?" master option that presets to resolution might be an good way to try and implement this.

Yes, I think this is something worth considering. I wouldn't want it to get too complicated, but perhaps something where you can choose the order of importance for various options, and also pick which ones are required for a match. So instead of the fallback options where...

...you could specify that 1080p is required, but the codec is not, or vice versa. It would give the user much more control about how the final format is selected, and it would greatly simplify the fallback logic in the code, instead of being (somewhat) arbitrary.

Again, you're the expert, so I'll leave the design and UX up to you. Just hoping to contribute some positive ideas! :-)

Also just on another note I've had some unexpected stuff crop up over the last couple of days so I'm very happy to get community contributions and input but I might be a bit slow to respond, review tasks or do any open source coding for a week or so.

No worries at all. Life comes first! Wishing you the best!

I might comment some more as I think about this, but please don't feel obligated to respond if you're otherwise occupied.

meeb commented 3 years ago

I think we generally are on the same page, I agree more control would be better if we can work out how to make the interface simple enough. I'll leave this on the 1.1 milestone. Thanks again.

micahmo commented 3 years ago

Hi again @meeb. I hope things are going well for you. Sorry to keep posting on this thread as I know we mostly agreed on the solution, and it's not targeted till 1.1 anyway. But I had some more thoughts I wanted to share.

I was thinking that perhaps a single additional option in the fallback settings that prioritizes codec might be all that's needed. (Also potentially renaming and reworking the logic of the existing options might be benefitial.)

Here's an idea for the list of fallback options and how they could work:

I worked up a sample of how I thought this could work with greatly reduced complexity (less branching, less iterating) compared to the current version. If you're interested, you can take a look at it here.

https://gist.github.com/micahmo/0ee2c4e83a3fdb1c98cabdcd03544fa7

It seemed to look good until I ran the tests. :-) That's when I discovered that the way that FPS and HDR preferences are applied differs depending upon the combination of the flags. For example, if the user prefers 60 but not HDR, the exact match can only be achieved if the video is 60fps and not HDR (here). Whereas if the user prefers HDR but not 60, an exact match must only be HDR, regardless of FPS (here).

I was originally thinking that if the 60fps or HDR options are not selected, then a video of any framerate or HDR-ness is considered an exact match (like the user is saying they don't care). Or perhaps the opposite, if those options aren't selected, an exact match must not have either one. But it seems like the current matching is a mix. My solution currently does the former and could be reworked to do the latter, based on what's added into the fps_matches and codec_matches buckets. But to do a mix (and satisfy the tests) would require more branching like the original version, which I didn't want to do. :-)

Anyway, I've spent enough time diving around in the code for now, so I'll leave you alone! As usual, please don't feel obligated to respond in a hurry, use any of my suggestions, or use any of the code snippets. I'm just glad to have a better understanding of how things work and to contribute to the conversation. :-)

P.S. I believe I found a small bug in the tests. Some inputs are labeled 4320P with a capital P, such as here. In the get_best_video_format method, it tries to get the height (here) which tries to index into a dictionary of heights here. But it doesn't match the constant here, which has the lowercase p, so a height of 0 is returned.

meeb commented 3 years ago

Yeah the handling for HDR and high-FPS media is difficult. Selecting HDR content for media when the user has not selected HDR as a preference could result in weird behavior, particularly with playback in Plex or other media players, with significant issues with colors and other common issues when HDR media is played on non-HDR capable displays.

A PVR should have proper controls for the selection of HDR and high FPS media if possible. This can get pretty complicated if more choices are added in because as you've noticed each new preference added doubles the number of format combinations the format selection has to handle. To fully support this is why I wrote the somewhat over the top test suite and media matching in the first place and not relying on bestvideo etc.

The format matching code currently has a bunch of branching and isn't massively efficient on purpose in an attempt to keep it more understandable, performance isn't too important as it's not called too frequently. I'm not opposed to making it nicer of course, just mentioning why it currently is like it is.

You're more than welcome to submit PRs for media format matching adjustments or the lower-case P bugs etc. I did smash out the media format matching stuff on an evening and tweaked it to match the outcome I wanted with the test suite and a load of test videos so it is likely it needs or will get tweaked over time anyway.

If I understand your gist correctly, it doesn't handle combined formats too well, which do occur on YouTube (e.g. AVC1 with video and audio streams)? Rest looks probably fine but I haven't run tests on it etc to confirm.

micahmo commented 3 years ago

Selecting HDR content for media when the user has not selected HDR as a preference could result in weird behavior, particularly with playback in Plex or other media players, with significant issues with colors and other common issues when HDR media is played on non-HDR capable displays.

Ahh that makes total sense, thanks for explaining! Now if we're thinking way down the line (like v1.2), it would be cool if the FPS and HDR options had controls like "must have", "must not have", "any". Then the expectations would be very clear. But I completely understand why it works the way it does now, and it doesn't really need any enhancement (at least for now). :-)

The format matching code currently has a bunch of branching and isn't massively efficient on purpose in an attempt to keep it more understandable, performance isn't too important as it's not called too frequently. I'm not opposed to making it nicer of course, just mentioning why it currently is like it is.

Understood, and (with the possible exception of a codec fallback) it works great! No intention to criticize. It just took a little while to understand. :-)

You're more than welcome to submit PRs for media format matching adjustments or the lower-case P bugs etc.

Cool, I might submit a PR for the test bug. It's not a big deal. I guess the tests pass now because the match is still found without the height. Only when changing the matching logic (and causing the tests to fail) did I discover the discrepancy. But it wouldn't hurt to have it fixed. I'll probably leave the matching stuff alone for now. You're welcome to take ideas from the gist, but you know the matching code way better and I don't necessarily think mine is any better haha.

If I understand your gist correctly, it doesn't handle combined formats too well, which do occur on YouTube (e.g. AVC1 with video and audio streams)?

Hmm, I tried to base it on get_best_video_format, which seems to only look at the video codecs. get_best_combined_format would still exist as-is, looking at both.

Rest looks probably fine but I haven't run tests on it etc to confirm.

As mentioned, the tests will fail, and while I spent some time trying to make them pass, I realized it might require a different approach than what I was doing with sets and unions, which is why I say mine is probably not the best.

meeb commented 3 years ago

Hmm, I tried to base it on get_best_video_format, which seems to only look at the video codecs. get_best_combined_format would still exist as-is, looking at both.

You are correct, I'm basing my comments from memory writing on my phone rather than checking code, ignore me here :)

ExodusC commented 3 years ago

Is the intent of the "Get next best resolution or codec instead" fallback option intended to have TubeSync grab -some- copy of the video in question, disregarding both codec and resolution options? If that is the case, I've found several instances where TubeSync will indicate that there are no matches to the current codec selections, and fail to download any video. In many cases, it's with significantly older content that only exists in 240p, but this doesn't appear to be exclusive to 240p content.

From a content archival perspective, having a fallback option that will attempt to grab -some- copy of the media, rather than failing due to codec type restrictions. It seems like many people use TubeSync for media player integration so I completely understand restricting by codec types, but it's proven to be great for content archival, so I think an option like this would be a useful enhancement... Or maybe this is simply a bug and the "next best resolution or codec" option is supposed to be doing this already!

meeb commented 3 years ago

@ExodusC that's actually working as designed at the moment. There's a hard cut-off where 360p is the lowest resolution that will be downloaded at the moment, however it is an arbitrary setting:

https://github.com/meeb/tubesync/blob/main/tubesync/tubesync/settings.py#L152

The original use case of TubeSync was to be a PVR for YouTube and stuff media into Plex or similar server hence a "sanity" cut-off for reasonable sized content. To answer your exact question the "Get next best resolution or codec instead" option does attempt to get any video providing it's at least the size of that defined cut-off. It should only affect media < 360 pixels in height.

I can see a use case where this limit should be lowered to 240p. I didn't consider archiving very old content a popular use for TubeSync but it seems that's what quite a few people are using it for. I'll give the matching code a once-over and see if it's sane to lower it.

neilsonwong commented 2 years ago

Not to hijack this thread, but if the "Get Next best resolution or codec instead" should guarantee some form of file retrieval I'm seeing cases where no files are downloaded despite these being within the available formats.

ID: 94 , (854x480), fps:30.0, video:avc1.4d401f @1282.517k , audio:mp4a.40.2 @k / Hz ID: 95 , (1280x720), fps:30.0, video:avc1.4d401f @2447.97k , audio:mp4a.40.2 @k / Hz ID: 96 , (1920x1080), fps:30.0, video:avc1.4d4028 @4561.186k , audio:mp4a.40.2 @k / Hz

Using that fallback should it not always match at least one format (given a high enough res)? For reference, I'm on 0.10.0. I've noticed this happens mostly with certain youtube streams that become posted as videos later.

meeb commented 2 years ago

Yes the fallback should get something providing there's a video at least 360p in resolution. Can you share an example video where this is occurring and your full source settings and I'll have a look at why it's not matching.

neilsonwong commented 2 years ago

Here are my full source settings:

Type YouTube channel Name Miyadi ASMR Media items 196 Key MiyadiASMR Directory miyadiasmr Media format {yyyymmdd}{source}{title}{key}_{format}.{ext} Example filename 20210923_miyadi-asmr_some-media-title-name_SoMeUnIqUiD_720p-vp9-opus.mkv Index schedule Every hour Download media? YES Created 2020-12-18 22:16:22 Last crawl 2021-09-21 23:47:27 Source resolution 720p (HD) Source video codec VP9 Source audio codec OPUS Prefer 60FPS? NO Prefer HDR? NO Output extension mkv Fallback Get next best resolution or codec instead Copy thumbnails? NO Write NFO? NO Delete old media No, keep forever UUID e75da87c-b297-4586-917b-a659fddb42dc

Here is one of the culprit videos: https://www.youtube.com/watch?v=thGc1h4eFos

Here is the video matching: Screen Shot 2021-09-23 at 8 10 11 PM

meeb commented 2 years ago

Cheers, that could well be a bug after the switch to yt-dlp with some formats if the metadata has significantly changed structure. I'll look into it.