meeb / tubesync

Syncs YouTube channels and playlists to a locally hosted media server
GNU Affero General Public License v3.0

reset-tasks is killed when run in command line window #212

Open csjoh opened 2 years ago

csjoh commented 2 years ago

I'm having an issue with the sync tasks: they fail, are retried, and fail again, and from then on nothing happens; the failed tasks just stay failed. They appear to be retried, but the logs don't show anything to that effect.

When running ./manage.py reset-tasks manually in the CLI, the task is killed after a while and after using all available memory (4GB, in my case):

root@bf1184d3a90c:/app# python3 ./manage.py reset-tasks
2022-02-02 13:37:26,957 [tubesync/INFO] Using database connection: django.db.backends.mysql://tubesync:[hidden]@192.168.2.8:3306/tubesync
2022-02-02 13:37:27,242 [tubesync/INFO] Resettings all tasks...
2022-02-02 13:37:27,272 [tubesync/INFO] Resetting tasks for source: Uncast
2022-02-02 13:37:27,880 [tubesync/INFO] Resetting tasks for source: Kurzgesagt - In a Nutshell
Killed
root@bf1184d3a90c:/app# 

As mentioned, memory usage goes through the roof and hits the hard limit I've set for the container, CPU usage rises to around 25% and both drop back to normal levels once the reset-tasks command is killed.

TubeSync version 0.10.0 with yt-dlp version 2022.01.21 and FFmpeg version 4.3.3-0+deb11u1 running on unRAID 6.9.2. 2x8C/16T CPUs, 144GB RAM total, container is limited to 4GB to keep it in check. MariaDB backend.

I'm thinking my hard memory limit is the problem, but I've had runaway memory issues with this container before and am reluctant to let it loose again.

meeb commented 2 years ago

Can you try it with an 8gb limit and see if reset-tasks completes then? If it does we've at least narrowed memory usage down to being the problem.

csjoh commented 2 years ago

> Can you try it with an 8gb limit and see if reset-tasks completes then? If it does we've at least narrowed memory usage down to being the problem.

Increasing the limit to 8GB made it run and complete in a proper manner. I'll leave it at 8GB to see if it solves the issue of the sync tasks inexplicably stopping and report back in a few days.

meeb commented 2 years ago

OK, thanks for testing. I'll add "reset-tasks can use an ungodly amount of RAM for no obvious reason" to the list of things to kick in the tasks system.

MatthK commented 2 years ago

I am struggling with this as well at the moment. I'm running TubeSync in a Docker container with a MySQL DB. The machine with the Docker container has 16GB of RAM and runs a few other containers and two virtual machines. I tried to add the YouTube channel TED, but as there are so many videos, it completely blows up my system.

I added the channel and it started to scan it, adding some 4,000+ tasks. I don't remember exactly why or when, but I eventually ran python3 ./manage.py reset-tasks inside the container. That kinda locked up my whole server repeatedly (I had to hard reset it a few times). I assume it was the DB that completely struggled with the task, so I moved the database container to another machine that is running idle.

While the reset-tasks job then ran somewhat faster, it still ended after a long time with the message "Killed". The TubeSync container ended up using some 11GB of RAM and again nearly brought my system to a halt before I stopped the container.

I'm not sure what the solution could be. For channels with moderate content it works quite fine, but for high-volume channels the current task solution really makes the system struggle. One idea I have is to simply be able to set a "start date" so that TubeSync ignores any video with a date before that.

meeb commented 2 years ago

The current tasks system indeed has a bunch of problems. There is a long-term plan to replace it entirely; it's just complex to do without breaking all existing installs. The tasks should run one at a time and independently of each other, so in theory their total number shouldn't affect anything. In reality, though, this doesn't always seem to be the case. I largely chose the current task system at random after playing with it a bit and it seemed fine; if I had known how popular this would get and how large the channels people would try to sync with it, I wouldn't have used it. Hindsight and all that.

I would add a feature to only index media within the specified source time frame, but the upload date information is only available in the metadata when crawling a media item individually, hence the one task per media item to request its metadata before knowing whether it can be skipped or not. I've experimented with a couple of workarounds, such as injecting search parameters into the URLs or using RSS feeds, but none of them have been suitable as a drop-in replacement for the current method.
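
For illustration only (this is not how TubeSync indexes channels): yt-dlp's Python API does accept a date-range filter, but as described above the upload date is usually missing from flat channel/playlist listings, so each entry still has to be fetched individually before the filter can reject it. A minimal sketch, with an example channel URL:

```python
# Sketch only: filter by upload date with yt-dlp's Python API. The catch is
# that flat channel/playlist extraction usually omits upload_date, so yt-dlp
# still has to fetch each entry's metadata before the filter can apply.
from yt_dlp import YoutubeDL
from yt_dlp.utils import DateRange

opts = {
    'daterange': DateRange('20220101', None),  # only videos uploaded on/after 2022-01-01
    'ignoreerrors': True,                      # keep going when entries are rejected
}
with YoutubeDL(opts) as ydl:
    # Example URL; metadata only, nothing is downloaded.
    info = ydl.extract_info('https://www.youtube.com/c/TED/videos', download=False)
```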

I will look into why reset-tasks on the command line is using a load of RAM. It really shouldn't; it's just a loop calling save() on every media item, which triggers a signal to reschedule any missing tasks. I suspect Django isn't de-referencing something, so every media item is being kept in RAM rather than garbage collected.
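
A plausible shape for that kind of leak, sketched with the Django ORM (the app and model names are assumptions, not TubeSync's actual code):

```python
# Sketch of the suspected pattern, not TubeSync's actual implementation.
# Assumes tubesync's Django app is "sync" with Source and Media models.
from sync.models import Media, Source

source = Source.objects.first()

# A plain queryset caches every Media instance (including its large metadata
# blob) on the queryset object, so nothing is freed until the loop finishes:
for media in Media.objects.filter(source=source):
    media.save()  # the post-save signal reschedules any missing tasks

# iterator() avoids caching the instances, so each one can be garbage
# collected as the loop advances (the DB driver may still buffer rows):
for media in Media.objects.filter(source=source).iterator():
    media.save()
```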

MatthK commented 2 years ago

One thing I noted is that you still save the whole metadata blob of some 270KB-380KB for skipped items (I only checked a few samples). That makes the database quite big. During the reset-tasks job it queries the database for all media of a source, so that result set can become pretty big.

Would it be possible to discard all (or most) of the metadata content once it is known to be skipped?

I checked my table. I have 7,338 skipped media items with a total of 23,722MB in the metadata field (SUM(LENGTH(metadata))), while I only have 1,876 downloaded media items with 2,257MB of data in the metadata field. So my database could be 10% of its current size if that data could be discarded.
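
The same measurement can be reproduced through the Django ORM from python3 ./manage.py shell inside the container (a sketch; the skip, downloaded and metadata field names are assumptions about the Media model):

```python
# Rough ORM equivalent of SUM(LENGTH(metadata)) per state, run from
# `python3 ./manage.py shell`. Field names are assumptions about Media.
from django.db.models import Count, Sum
from django.db.models.functions import Length
from sync.models import Media

for label, qs in (('skipped', Media.objects.filter(skip=True)),
                  ('downloaded', Media.objects.filter(downloaded=True))):
    stats = qs.aggregate(items=Count('pk'), total_bytes=Sum(Length('metadata')))
    print(label, stats)
```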

meeb commented 2 years ago

Yes that sounds quite possible. Just to confirm, that's 23 point 7 megabytes right?

MatthK commented 2 years ago

No, those are thousands of MBs. The database is almost 26GB in total. Quite big indeed.

meeb commented 2 years ago

That's pretty massive! 3.2MB of metadata per video?! I'll look into deleting metadata once a video is marked to be skipped. Just out of curiosity, could you give me a YouTube link to a random skipped video in your database? Mostly just to see the massive metadata.
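
A minimal sketch of what that cleanup could look like (a hypothetical one-off, not the actual fix; it assumes the metadata field is nullable and that nothing else still needs the data):

```python
# Hypothetical one-off cleanup, not TubeSync's implemented fix: blank the
# metadata blob on media that will never be downloaded. Back up the database
# before running anything like this.
from sync.models import Media

cleared = (Media.objects
           .filter(skip=True)
           .exclude(metadata=None)
           .update(metadata=None))
print(f'Cleared metadata on {cleared} skipped media items')
```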

MatthK commented 2 years ago

Here is one. The metadata has a length of 383,060.

https://www.youtube.com/watch?v=oBLQmE-nG60

Some others are around 350K, like this one: https://www.youtube.com/watch?v=ybjl5nXSR-U

meeb commented 2 years ago

Excellent, thanks for the details.

MatthK commented 2 years ago

I just checked it again and it somehow seems that the TED channel breaks my TubeSync. I have now moved the whole container to that new, more powerful server. Although I ran reset-tasks again and it seemed to finish, on the dashboard I was left with some 300+ tasks, and since yesterday they haven't changed at all.

So I just tried to run it again (from the command line in the Docker shell) and it again seemed to finish, but the 381 tasks are not budging. So I checked dmesg on the server and the last few entries are the following:

[180719.989709] [ 346116]     0 346116     1775      110    57344        0             0 bash
[180719.989711] [ 346261]     0 346261  6642120  6623805 53313536        0             0 python3
[180719.989713] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=871dd185e8211fba2de870124a00bfcd6267a24cbc7387bdbeedd0693bc2b740,mems_allowed=0,global_oom,task_memcg=/docker/4ad1777fb01f4ded8f57860ceccf156df35efbf57acceb345a5ccafe8ecf29cd,task=python3,pid=139663,uid=1000
[180719.989730] Out of memory: Killed process 139663 (python3) total-vm:34100136kB, anon-rss:31812744kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:64256kB oom_score_adj:0
[180720.561110] oom_reaper: reaped process 139663 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:8kB

The python3 process got killed, and possibly that's why the tasks no longer work.

meeb commented 2 years ago

Yep, being OOM-killed would indeed stop it from working. There's no obvious reason why it's keeping so much data loaded in RAM, but the evidence here shows it's clearly happening.

MatthK commented 2 years ago

I just tried multiple times to delete the channel, but that one also fails.

Can I just delete those records from the sync_media table directly in the database and then try to delete the source in the web-interface again? Would that create too many troubles in the DB?

meeb commented 2 years ago

Yeah you can just delete the entries in the sync_media table and then the source itself in sync_source. The script version is only slow because it cancels any outstanding tasks for each media item, cleans up files on disk and does other housekeeping. You might have to do some cleanup manually afterwards, but you can certainly just delete the database rows.
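
For reference, the same direct deletion expressed through the Django ORM rather than raw SQL (a sketch; the name='TED' lookup is just an example and the app/model names are assumed):

```python
# Sketch: delete a source's media rows and then the source itself, bypassing
# the per-item housekeeping (task cancellation, on-disk file cleanup) that the
# normal deletion path performs. Leftover files must be tidied up by hand.
from sync.models import Media, Source

source = Source.objects.get(name='TED')        # example lookup, adjust as needed
Media.objects.filter(source=source).delete()   # rows in the sync_media table
source.delete()                                # row in the sync_source table
```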