meeb / tubesync

Syncs YouTube channels and playlists to a locally hosted media server
GNU Affero General Public License v3.0

Index Only Latest Content #536

Closed Piras314 closed 4 weeks ago

Piras314 commented 4 weeks ago

Introduction

Hello, I have been trying to set up tubesync to keep a rolling database of the latest 7 days of news from a news network's YouTube channel. However, the channel has, I believe, around 16 thousand videos, which is completely infeasible to index in its entirety. After 24 hours, my server had apparently indexed only about 2.5k videos (which I could only view after restarting tubesync, because otherwise the UI would constantly return "500 - Internal Server Error").

Proposal

It would be very nice to be able to index only the latest content to avoid this issue, especially for news channels which often have tens of thousands of videos and where you almost always only want the most recent ones.

Technical Details

Looking through my logs, it seems like everything is indexed from newest to oldest. This could probably be implemented relatively easily by stopping indexing once a video's release date is older than a set cutoff (e.g. 7 days ago), or by keeping a count of videos indexed and stopping at a certain number.
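A minimal sketch of that early-stop idea, not tubesync's actual indexing code. The function name `take_recent` and the `published` key are hypothetical, and the approach only works if entries really do arrive newest to oldest:

```python
from datetime import date, timedelta


def take_recent(entries, cutoff, max_items=None):
    """Yield entries until one is older than `cutoff` or `max_items` is reached.

    Assumes `entries` is ordered newest to oldest (an assumption, see above).
    Each entry is a dict with a hypothetical `published` key holding a date.
    """
    count = 0
    for entry in entries:
        # Stop once the optional item limit has been reached.
        if max_items is not None and count >= max_items:
            break
        # Stop at the first entry older than the cutoff date.
        if entry["published"] < cutoff:
            break
        yield entry
        count += 1


# Example data: one video per day, newest first.
today = date(2024, 1, 10)
entries = [{"id": i, "published": today - timedelta(days=i)} for i in range(30)]

last_week = list(take_recent(entries, cutoff=today - timedelta(days=7)))
first_three = list(take_recent(entries, cutoff=today - timedelta(days=7), max_items=3))
```

Because `take_recent` is a generator, it stops consuming `entries` as soon as either limit is hit, so older videos are never fetched if `entries` is itself lazy.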

My Temporary "Solution"

Luckily, there's a playlist on YouTube that keeps the last 100 or so videos from said news network, which I can probably manage to index, but that doesn't exist for every channel. Deleting the source also gave a "504 Gateway Time-out" error on the web viewer, followed by "[CRITICAL] WORKER TIMEOUT" and "[ERROR] Worker (pid:353) was sent SIGKILL! Perhaps out of memory?", even though the system never got close to running out of memory: I tried multiple times while watching htop, and total memory usage never even reached 700 MB. To make things worse, despite the logs saying the data was deleted, the number of media items shown in the UI didn't go down, so I've deleted my database to try again with the playlist.

Piras314 commented 4 weeks ago

I think this was my fault. I had the database on an external USB SD card reader, as that's where I'm storing my videos. Everything seems to be working extremely fast now.

meeb commented 4 weeks ago

Heh, yes, using extremely slow storage would cause a significant performance bottleneck. Typically, media items are returned newest to oldest; however, it's not guaranteed. The only way to filter out what media should be downloaded is to index each item's metadata. This is generally fine; however, as you've noticed, for absolutely massive channels this can create an initial backlog of tasks that does take quite some time to process.
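To illustrate why ordering matters here, a rough sketch (not tubesync's code; `select_recent` and the `published` key are hypothetical) of filtering after every item's metadata has been indexed. Unlike stopping at the first old item, this still finds recent media that appears after an older one:

```python
from datetime import date, timedelta


def select_recent(indexed, cutoff):
    """Return every indexed item published on or after `cutoff`.

    Checks all items, so it does not depend on the order they were returned in.
    """
    return [item for item in indexed if item["published"] >= cutoff]


today = date(2024, 1, 10)
cutoff = today - timedelta(days=7)

# Hypothetical index where an old item appears before a recent one.
indexed = [
    {"id": "a", "published": today},
    {"id": "b", "published": today - timedelta(days=30)},
    {"id": "c", "published": today - timedelta(days=2)},
]

recent_ids = [item["id"] for item in select_recent(indexed, cutoff)]
```

The cost is the one described above: every item has to be indexed before the filter can run.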