meeb / tubesync

Syncs YouTube channels and playlists to a locally hosted media server
GNU Affero General Public License v3.0

Thumbnails downloading for videos older than download cap #482

Closed Nairou closed 2 months ago

Nairou commented 7 months ago

I have YouTube sources added which contain thousands of videos. However, I only care about new videos, as I'm transitioning away from direct YouTube viewing, so I have the Download Cap set to 1 week for each source. I only care about videos from the last week, and new ones as they arrive.

However, TubeSync is still downloading thumbnails for EVERY video in each source. Looking at the queued tasks, there are thousands of tasks waiting to download very old thumbnails. It can take several hours for it to work through those and get to the actual videos. This also causes the database file to get very large.

Is this intended behavior, or can it be changed to entirely ignore downloads of content older than the download cap?

meeb commented 7 months ago

The thumbnail issue is half by design and half that I've not written a solution for it yet. The database itself will get large regardless of the thumbnails, though, as you still need to get metadata for each item, even old items.

Nairou commented 7 months ago

Understandable about the database size. Is there a reason thumbnail downloads can't follow the same rules as video downloads?

meeb commented 7 months ago

Not especially, other than it's quite annoying to account for all eventualities. For example you modify the "keep" date on a source and there's currently no way to detect existing thumbnails, so you'd need to scan the disk to check for missing thumbnails if you extended the cap from 1 to 2 weeks and queue tasks for the missing ones. And the same in reverse if you shorten the cap.
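A minimal sketch of the reconciliation step described above, not TubeSync's actual code. It assumes one thumbnail directory per source, `.jpg` files named after a hypothetical per-item `key`, and a cap expressed as a `timedelta`; `MediaItem` and `reconcile_thumbnails` are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path


@dataclass(frozen=True)
class MediaItem:
    key: str             # hypothetical unique ID, also used as the thumbnail file stem
    published: datetime  # upload date taken from the item's metadata


def reconcile_thumbnails(items, thumb_dir, cap, now):
    """Return (to_queue, to_delete) thumbnail keys for the given cap."""
    cutoff = now - cap
    # Thumbnails we want: every item published inside the cap window.
    wanted = {m.key for m in items if m.published >= cutoff}
    # The disk scan: every .jpg currently present, identified by file stem.
    present = {p.stem for p in Path(thumb_dir).glob("*.jpg")}
    to_queue = sorted(wanted - present)   # cap extended: thumbnails now missing
    to_delete = sorted(present - wanted)  # cap shortened: thumbnails now stale
    return to_queue, to_delete
```

Calling this once after a source's cap changes would cover both directions: the first list is what to queue, the second is what to clean up.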

It's perfectly possible, it just wasn't that big of an issue for most people to wait a bit when you add a big channel. I'll get around to it at some point.

The other option is I just hack in a basic "if" and then handle people creating issues for why some thumbnails are missing.

Nairou commented 7 months ago

> For example you modify the "keep" date on a source and there's currently no way to detect existing thumbnails, so you'd need to scan the disk to check for missing thumbnails if you extended the cap from 1 to 2 weeks and queue tasks for the missing ones. And the same in reverse if you shorten the cap.

Are these steps already being done for videos? i.e. modifying the keep date, and either deleting or re-downloading videos to match. If thumbnails were deleted when their videos get deleted, and the thumbnail download task was always queued at the same time a video download task was queued, I would think no extra checks would be needed.

But... I can also appreciate that things might currently work differently, and this isn't a priority. :smile: Thanks for the explanations!

meeb commented 7 months ago

Yes, a record is kept of downloaded media and that is recalculated if you change a cap. Metadata like subtitles, NFO files and thumbnails are not currently explicitly tracked. They probably should have been from the start, though. They would be redownloaded if a media item was considered missing and redownloaded.

chminsc commented 3 months ago

Same question here. Even if we have to download all the thumbnails, I think we could download the ones within the cap date first, and then start downloading videos. After that finishes, we could download the remaining thumbnails.

meeb commented 3 months ago

It doesn't technically need to download all the thumbnails, but you do need to download all the metadata. It's trivial to add a "don't download thumbnails older than the date cap" option, but not so trivial to re-detect missing thumbnails if other states or caps change.
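The "trivial" half could look something like the sketch below. It is only an illustration under assumptions: `thumbnail_wanted` is a hypothetical helper, the cap is modelled as an optional `timedelta`, and nothing here handles the harder case of re-detecting thumbnails after a cap changes.

```python
from datetime import datetime, timedelta
from typing import Optional


def thumbnail_wanted(published: datetime,
                     cap: Optional[timedelta],
                     now: datetime) -> bool:
    """Decide whether a thumbnail download should be queued for one item."""
    if cap is None:
        # No download cap configured: keep the current behaviour.
        return True
    # Skip thumbnails for anything published before the cap window starts.
    return published >= now - cap
```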

timwhite commented 2 months ago

@meeb What currently detects that we need to download a previously skipped video if the source changes? As far as I can see, media_post_save is what triggers the download (and detects if the skip has changed). It also triggers the thumbnail download. I assume media_post_save is triggered when we save a Media item, but how is it triggered when the source changes? I assume the index schedule task has something to do with it. In that case, thumbnails can easily be excluded for skipped items, and included when the item is no longer skipped. I might test it in a branch.

Edit: Testing the master branch atm, we seem to skip thumbnails if the main video is being skipped, but only if we get it marked to be skipped early enough. I'll play with the system and see what I can do.

meeb commented 2 months ago

The signals and "working out what to download and when" logic is pretty messy. It was originally just some hacky code I personally used, which has been iterated on over the years to add features while trying not to break its behaviour.

Currently, when a source is saved it just calls save() on every media item linked to that source, which in turn triggers that media item's save signal, which recalculates things like can_download. There isn't really a cleaner way to do this. If a source is adjusted to have a different download cap, for example, or its requirements change, the only way to evaluate which media items can or cannot be downloaded is to check each one individually, as this will likely involve examining each media item's metadata.

This is obviously slow, but it's generally a rare-ish operation.

The trigger is here:

https://github.com/meeb/tubesync/blob/main/tubesync/sync/signals.py#L69
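The fan-out described above can be sketched in plain Python, with no Django involved. This is a simplified stand-in, not the code at that link: the `media_post_save` function below only borrows the name mentioned earlier in the thread, and the date comparison is an assumed, reduced version of whatever the real can_download calculation does.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Source:
    cap: timedelta
    media: list = field(default_factory=list)

    def save(self, now):
        # Fan out: re-save every linked item so each one is re-evaluated.
        for item in self.media:
            item.save(now)


@dataclass
class Media:
    source: Source = field(repr=False, compare=False)
    published: datetime
    can_download: bool = False

    def save(self, now):
        # Stand-in for a post-save signal firing after the item is saved.
        media_post_save(self, now)


def media_post_save(media, now):
    # Each item is checked individually against its source's current cap.
    media.can_download = media.published >= now - media.source.cap
```

Changing `cap` on the source and calling `save()` flips `can_download` on every affected item, at the cost of touching each item once, which is why the operation is slow on large sources.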

meeb commented 2 months ago

And yes, adding a download filter for thumbnails which don't have a can_download flag should be relatively straightforward.
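Roughly, such a filter might look like the sketch below. It assumes each media item exposes the can_download flag plus a hypothetical `has_thumbnail` marker; `thumbnails_to_queue` is an illustrative name and not an existing TubeSync function.

```python
from dataclasses import dataclass


@dataclass
class Media:
    key: str                     # hypothetical unique ID for the item
    can_download: bool           # recalculated whenever the item is saved
    has_thumbnail: bool = False  # hypothetical "already fetched" marker


def thumbnails_to_queue(items):
    """Return keys of items that should get a thumbnail download task."""
    return [
        m.key
        for m in items
        # Skipped items get no thumbnail task; existing thumbnails are not refetched.
        if m.can_download and not m.has_thumbnail
    ]
```

Because the check runs against the current flag, an item that stops being skipped would be picked up the next time the filter is evaluated.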

timwhite commented 2 months ago

> And yes, adding a download filter for thumbnails which don't have a can_download flag should be relatively straightforward.

I'll add this to my branch. It makes testing quicker too, as it's not scheduling lots of thumbnail downloads for media it'll just skip.