perry-mitchell opened this issue 1 year ago
Seemed stuck again, so I reset the tasks again. Memory quickly climbed to my new limit of 6G and it died again:
[2023-10-24 18:43:19 +0300] [319] [CRITICAL] WORKER TIMEOUT (pid:364)
2023/10/24 18:43:19 [error] 346#346: *24 upstream prematurely closed connection while reading response header from upstream, client: 192.168.0.50, server: _, request: "POST /tasks-reset HTTP/1.1", upstream: "http://127.0.0.1:8080/tasks-reset", host: "192.168.0.100:4848", referrer: "http://192.168.0.100:4848/tasks-reset"
[2023-10-24 18:43:19 +0300] [364] [INFO] Worker exiting (pid: 364)
192.168.0.50 - - [24/Oct/2023:18:43:19 +0300] "POST /tasks-reset HTTP/1.1" 502 150 "http://192.168.0.100:4848/tasks-reset" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0"
[2023-10-24 18:43:20 +0300] [319] [ERROR] Worker (pid:364) exited with code 1
[2023-10-24 18:43:20 +0300] [319] [ERROR] Worker (pid:364) exited with code 1.
[2023-10-24 18:43:20 +0300] [432] [INFO] Booting worker with pid: 432
2023-10-24 18:43:20,206 [tubesync/INFO] Using database connection: django.db.backends.mysql://tubesync:[hidden]@MariaDB:3306/tubesync
After this it eventually restarted:
2023-10-24 18:45:00,048 [tubesync/DEBUG] [download] Finished downloading playlist: Markiplier - Videos
After it finished, the memory was almost bang on 100%:
Additionally, my other 2 channels don't show tasks in the schedule, only "markiplier", even though all are scheduled to index at some point:
How many workers do you have?
How many workers do you have?
Sorry I didn't see this. I've since removed my copy of Tubesync while I plan on migrating it to a new system. So I don't recall exactly what I had configured there. I suspect it was 2 workers, but could have been 4 :man_shrugging:.
Imo there shouldn't be any need to use so much memory - playlists and channels can be huge, so if pulling this data is what's causing the huge use of memory, perhaps it should be written to disk instead of expecting the host to have enough memory to handle any channel size. Databases like MySQL do this - not sure what storage mechanism Tubesync uses under the hood, but if it were a DB like this it should roll over onto disk just fine.
There isn't any need for that much memory and it's absolutely a bug. The data collected by the worker can be huge in some cases and that's stored in RAM, however it's not being effectively garbage collected which is related to how the worker library is or isn't working properly. As for temporary spooling of large data to disk, that would be a yt-dlp feature and not a tubesync feature.
Thanks for clarifying! To be clear, is the garbage collection bug on tubesync's side or yt-dlp's?
I'll be setting up a new node probably in the next week or so.. If there's more debugging information I can provide, in case this happens again, please let me know and I'll do my best to share it here.
I'll be running it in an LXC next though, instead of just Docker on Unraid, so the environment will be subtly different.
It's likely in the (now generally unsupported and legacy) background worker library used in current tubesync builds, so neither directly tubesync nor yt-dlp. yt-dlp does allocate a lot of memory when asked to index a massive channel, but that's likely not avoidable given the sheer size of the metadata that needs to be processed. The worker library in tubesync has been on the cards to be replaced for... some time now, it's just a lot of work and I've not had time.
The worker library in tubesync has been on the cards to be replaced for... some time now, it's just a lot of work and I've not had time.
Understandable. Anyway, I really appreciate the project so I'll keep on eye on this going forward :)
I can reproduce this issue also. I'm working on a branch this weekend which I'm hoping will be a short-term fix. I only saw the issue when I was running some VMs, which reduced the memory available on my server. I'm hoping setting max_requests_jitter and max_requests on the gunicorn workers might improve this and force each worker to give up its memory allocations when a large job is being processed by them.
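For illustration, those gunicorn settings would look something like the sketch below in a gunicorn.conf.py (the file name and numbers are placeholders, not tubesync's shipped configuration):

```python
# gunicorn.conf.py -- values are placeholders for illustration only
bind = "127.0.0.1:8080"
workers = 1

# Recycle each worker after a bounded number of requests so memory it has
# accumulated is released when the old process exits; the jitter staggers
# the restarts so all workers don't recycle at the same moment.
max_requests = 500
max_requests_jitter = 50
```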
@locke4 this isn't going to be anything to do with gunicorn, it's the background workers which are far more complex to replace.
I've also been having this issue, though I do use TubeSync a little differently than most. I have a single Plex Watch Later playlist that I manually sync in TubeSync when I add new items, and I also use TubeSync with MariaDB and I have 64 GB of RAM. Eventually it gets to the YouTube playlist limit of 4,000 videos and becomes unmanageable in TubeSync.
When I max out, or nearly max out, the YouTube playlist, TubeSync will use an ungodly amount of system RAM, easily over 16 GB and eventually all of it, until the system becomes unresponsive.
I tried creating a new empty playlist and pointing TubeSync at that, and while that did allow the indexing to go quicker, Docker would still kill the python3 process for using over 8 GB of RAM, the new limit I imposed on the container.
Even though the playlist is empty, all the old playlist items are still stored in sync_media. This was expected, but what I noticed was that Django is still loading all of these items and taking a ton of RAM to do so. I also saw this when loading the /admin panel as well.
I emptied the tables sync_media and background_task_completedtask and TubeSync became usable again, taking only a few hundred MB of RAM.
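For anyone wanting to do the same cleanup, a rough sketch via the Django shell is below; the model names are my guess at what backs those two tables, and this permanently deletes data, so back up first:

```python
# Inside the container: python3 manage.py shell
# WARNING: this permanently deletes rows - back up the database first.
from background_task.models import CompletedTask  # ships with django-background-tasks
from sync.models import Media  # assumed to be the model behind the sync_media table

CompletedTask.objects.all().delete()  # empties background_task_completedtask
Media.objects.all().delete()          # empties sync_media (removes all tracked media!)
```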
@locke4 this isn't going to be anything to do with gunicorn, it's the background workers which are far more complex to replace.
Fair enough, you're talking about Django Background Tasks as the background workers? I'm running a test on a channel which I know breaks tubesync:latest, but I assume it won't show any improvement if the issue isn't related to the gunicorn or celery workers. I have a fix relating to the filter_text behaviour for this PR anyway, so it's good to try and get that completed this weekend regardless.
I want to see if I can look into a fix short of replacing them, as I appreciate replacing is a much larger task. For extra info, RAM allocated during indexing also doesn't seem to be dropped, which exacerbates this (below is an indexing task which completed around 8am but the memory was never released; deleting the channel immediately crashes the docker container with SIGKILL out-of-memory errors). 11am is me updating to my branch and repeating the test.
As expected, it doesn't fix the excessive memory usage during a delete operation. I was however able to complete the delete task (though the web browser timed out, the container didn't crash with a SIGKILL error), which I don't really understand. I found https://github.com/arteria/django-background-tasks/issues/186 which might be promising, but forking django-background-tasks isn't the best solution long-term.
Migrated my setup from Unraid to Proxmox, now starting off with a smaller 300 video channel. Seems ok so far:
Once this is done I'll try again with another 5K video channel, see how it likes that.
EDIT: Going for the big one haha. Looks fine so far though. I'm only sync'ing the metadata so far:
EDIT 2: It's ballooned quite a bit, so I've put it up to 4GB overnight, as it scrapes:
EDIT 3: Still seems to be using a crazy amount of RAM, though it's just enqueuing thumbnail downloads, nothing else:
EDIT 4: Memory grows by a gigabyte when I refresh the tasks page, too..
Also, even once a source successfully syncs, I can't open it without it crashing:
Memory climbs to 2-3GB more, then this crash.
Before:
During:
EDIT: So I modified the settings directly in the DB to now enable media download, and clicked refresh in the sources list for "markiplier". Is it normal that it adds all thumbnails to the queue again? Even after they've already been fetched?
Never mind, it just did it until the cut-off time.
@meeb, I want to take a look at this task thing again this weekend. Am I right in thinking your desired replacement is for tasks to be offloaded to a redis queue for celery workers to pick from? I want to make sure I go in the direction that you think is best so any pointers before I restart my feature branch would be really appreciated!
@locke4 Yes, that's the plan. However, I should warn you that this is a significant amount of work - general rewrite-and-refactoring big. Porting the tasks to celery tasks is the easy bit. Migrating all existing tasks (and task management and UI updates on tasks) without breaking anyone's installs is going to be much harder.
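For anyone following along, the rough shape of that plan would be something like the sketch below (the module, broker URL and task name are hypothetical, not tubesync code):

```python
# celery_app.py - a hypothetical sketch, not tubesync's implementation
from celery import Celery

app = Celery(
    'tubesync',
    broker='redis://localhost:6379/0',   # assumed Redis location
    backend='redis://localhost:6379/1',
)

# Keep yt-dlp work serialized so only one indexing process talks to
# YouTube from this host at a time.
app.conf.worker_concurrency = 1

@app.task(bind=True, max_retries=3)
def index_source(self, source_id):
    # Placeholder body: pass only the primary key so the task arguments
    # stay trivially serializable, then look the source up inside the task.
    print(f'would index source {source_id}')
```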
Putting my hand up now to be a nightly build tester.
Just to mention here that even with rewriting the worker and tasks system, this won't really reduce the peak memory requirement much; it'll just mean it gets freed again after use.
I've got a couple of thoughts that I'd appreciate your feedback on, @meeb, after prototyping my feature branch for this issue yesterday.
I've got a couple of weird interactions I need to solve before I commit this feature branch to my fork repo, but generally I think this approach is a good candidate fix. I should have a PR ready in a week, provided you're happy with this approach.
Thanks for the effort, @locke4! Specifically for your questions:
- Ideally, background_tasks needs to be completely removed. This is blocking upgrading Django beyond 3.2, and nesting tasks is going to cause unexpected issues with scheduling and likely other more complex problems. This was a major reason it's taken so long. A dual celery/background_tasks implementation sounds pretty nightmarish. Offloading all work to the background is excellent though and the general end game!
- Sounds reasonable, however this likely needs locks to prevent race conditions as well as other less common situations. Note anyone using an SQLite backend (which is most people) can only have a single concurrent write, and the yt-dlp interacting worker should only run one process at once (any more than that and YouTube starts to get angry), so some concurrency controls are going to be needed.
- The memory limit is actually just indexing channel metadata from yt-dlp, which is returned as a massive, massive dictionary for a large channel. There's not much optimization that can be done here, other than making sure the worker frees it properly once processed (which is what background_tasks isn't doing).
Generally I think you're on the right approach, but I would suspect there's going to be a lot of edge cases, weirdness and other issues that will need to be worked out with this transition. I've had two attempts at this myself and never fully finished it enough to merge. If this has any bugs in at all it's going to annoy a lot of people.
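To illustrate what "frees it properly once processed" would look like in practice, here's a rough sketch with placeholder helpers (get_media_info and persist_entries are illustrative names, not tubesync's actual functions):

```python
import gc

def get_media_info(url):
    # Placeholder for the yt-dlp indexing call; for a big channel this
    # returns a dict that can run to several gigabytes.
    return {'entries': []}

def persist_entries(entries):
    # Placeholder for writing the indexed entries to the database.
    pass

def index_source(url):
    info = get_media_info(url)
    try:
        persist_entries(info.get('entries') or [])
    finally:
        # Drop the reference and run a collection so the huge dict isn't
        # kept alive between tasks, which is the behaviour described above.
        del info
        gc.collect()
```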
@meeb Not to derail, and perhaps this could be another issue elsewhere, but has anyone reached out to the yt-dlp project regarding this metadata? Surely a JSON stream or something incremental would be a better design than one massive blob of data?
Even if it was written to disk incrementally so tubesync could stream it into the DB, this'd be far superior.
It's not sustainable for either project, imo, to function like that. It'll still prevent anyone from scanning popular channels unless they're willing to dedicate 3-6GB of RAM for the instance. I'd have hoped to run such a container at 1-2GB max in a perfect world.
Just my thoughts.
I have not, and quite honestly it might be how I've implemented it rather than an upstream issue. Worth checking both ends though! I hacked up the original TubeSync in a weekend to scratch a personal itch so... there may well be legacy silliness that worked fine for the smaller channels I sync but not for people trying to parse gigabytes of metadata. I've not stumbled over it yet but there may be a streaming metadata feature buried in the yt-dlp API.
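For anyone curious, yt-dlp's flat extraction is one thing worth experimenting with here; a rough sketch is below (the channel URL is just an example, and whether the flat entries carry enough fields for tubesync is exactly what would need checking):

```python
import yt_dlp

# Flat extraction returns lightweight per-video stubs (id, title, url)
# instead of full metadata, which keeps the indexing dict far smaller.
opts = {
    'extract_flat': 'in_playlist',  # don't resolve every video fully
    'skip_download': True,
    'quiet': True,
}

with yt_dlp.YoutubeDL(opts) as ydl:
    # Example URL only; substitute the channel or playlist being indexed.
    info = ydl.extract_info(
        'https://www.youtube.com/@SomeLargeChannel/videos',
        download=False,
    )
    for entry in info.get('entries') or []:
        print(entry.get('id'), entry.get('title'))
```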
Thanks for the effort, @locke4! Specifically for your questions:
- Ideally, background_tasks needs to be completely removed. This is blocking upgrading Django beyond 3.2, and nesting tasks is going to cause unexpected issues with scheduling and likely other more complex problems. This was a major reason it's taken so long. A dual celery/background_tasks implementation sounds pretty nightmarish. Offloading all work to the background is excellent though and the general end game!
- Sounds reasonable, however this likely needs locks to prevent race conditions as well as other less common situations. Note anyone using an SQLite backend (which is most people) can only have a single concurrent write, and the yt-dlp interacting worker should only run one process at once (any more than that and YouTube starts to get angry), so some concurrency controls are going to be needed.
- The memory limit is actually just indexing channel metadata from yt-dlp, which is returned as a massive, massive dictionary for a large channel. There's not much optimization that can be done here, other than making sure the worker frees it properly once processed (which is what background_tasks isn't doing).
Generally I think you're on the right approach, but I would suspect there's going to be a lot of edge cases, weirdness and other issues that will need to be worked out with this transition. I've had two attempts at this myself and never fully finished it enough to merge. If this has any bugs in at all it's going to annoy a lot of people.
Thanks for the comments! On 1., agreed that it's not ideal to have both. At the moment I've been focussed on just fixing the two breaking bugs with large channels which I believe is now complete bar writing test cases and a couple of edge cases.
I'm going to complete this feature branch fully as a fix branch. I'll then work on fully removing background-tasks on a new feature branch, so if I'm unsuccessful at least the fixes can be merged from my current branch and a full refactor completed later building on this. My main issue I haven't solved yet is how to ensure tasks are migrated or recreated in celery on update.
I was thinking that one way of ensuring that fully removing background-tasks isn't a breaking change is to force a task reset on boot in some intelligent way (to avoid unnecessary task resets).
OK. Also remember that if you're not tackling the concurrency and race condition issues initially, lock the celery workers to 1 (and disable the ENV var for worker count) as well! Happy to review what you submit as a PR.
I have no idea why this is the way it is, but this might be related. It's been heavily sanitized, so hopefully it still contains what's necessary.
[CRITICAL] WORKER TIMEOUT (pid:351)
2024/05/13 10:40:52 [error] 341#341: *1 upstream prematurely closed connection while reading response header from upstream, client: 10.0.2.192, server: _, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "tubesync.example.com", referrer: "https://login.example.com/"
10.0.2.192 - - [13/May/2024:10:40:52 -0400] "GET / HTTP/1.1" 502 150 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
[ERROR] Worker (pid:351) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 361
10.0.2.192 - - [13/May/2024:10:40:52 -0400] "GET /favicon.ico HTTP/1.1" 302 0 "https://tubesync.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:10:47:53 -0400] "GET / HTTP/1.1" 499 0 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:10:48:03 -0400] "GET / HTTP/1.1" 499 0 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
[CRITICAL] WORKER TIMEOUT (pid:361)
[ERROR] Worker (pid:361) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 364
[CRITICAL] WORKER TIMEOUT (pid:350)
[ERROR] Worker (pid:350) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 367
[CRITICAL] WORKER TIMEOUT (pid:352)
2024/05/13 10:48:35 [error] 345#345: *8 upstream prematurely closed connection while reading response header from upstream, client: 10.0.2.192, server: _, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "tubesync.example.com", referrer: "https://login.example.com/"
"GET / HTTP/1.1" 502 150 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
[ERROR] Worker (pid:352) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 370
[CRITICAL] WORKER TIMEOUT (pid:370)
2024/05/13 10:51:56 [error] 340#340: *10 upstream prematurely closed connection while reading response header from upstream, client: 10.0.2.192, server: _, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "tubesync.example.com", referrer: "https://login.example.com/"
"GET / HTTP/1.1" 502 150 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
[ERROR] Worker (pid:370) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 373
"GET / HTTP/1.1" 200 3889 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/roboto/roboto-regular.woff HTTP/1.1" 200 22048 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/fontawesome/fa-solid-900.woff2 HTTP/1.1" 200 80300 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/fontawesome/fa-brands-400.woff2 HTTP/1.1" 200 78460 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/fontawesome/fa-regular-400.woff2 HTTP/1.1" 200 13548 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/roboto/roboto-bold.woff HTTP/1.1" 200 22284 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
Okay, I reassessed my settings; in my case I had just increased the number of workers from 1 up to 10, because why not.
You should definitely keep your workers at 1. There's no technical reason locally you can't run more workers, however YouTube quickly gets angry at you and blocks your IP or imposes other similar restrictions if you run more than 1 worker.
And thanks for the log, that just shows the worker using a load of memory for some reason and getting killed, however every bit helps.
You should definitely keep your workers at 1. There's no technical reason locally you can't run more workers, however YouTube quickly gets angry at you and blocks your IP or imposes other similar restrictions if you run more than 1 worker.
Oh, fair enough... good call. I'll dial it back to 2 at most then, I guess; it just seems way more responsive now that I set it higher.
This is what it seemed to be doing as soon as I added other workers.
Thanks for the fast reply @meeb also thanks for the hard work!
More context: it asked me to do a ./manage.py migrate etc.
I had added these env vars to my deployment as well.
environment:
  TUBESYNC_RESET_DOWNLOAD_DIR: "false"
  TZ: America/Toronto
  PUID: "1000"
  PGID: "1000"
  TUBESYNC_WORKERS: 4
  TUBESYNC_DEBUG: "true"
  GUNICORN_WORKERS: "3"
  DJANGO_SECRET_KEY: "REDACTED"
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service tubesync-init: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
usermod: no changes
Operations to perform:
Apply all migrations: admin, auth, background_task, contenttypes, sessions, sync
Running migrations:
No migrations to apply.
Your models in app(s): 'sync' have changes that are not yet reflected in a migration, and so won't be applied.
Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.
s6-rc: info: service tubesync-init successfully started
s6-rc: info: service gunicorn: starting
s6-rc: info: service gunicorn successfully started
s6-rc: info: service tubesync-worker: starting
s6-rc: info: service nginx: starting
s6-rc: info: service nginx successfully started
s6-rc: info: service tubesync-worker successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
[2024-05-13 12:29:05 -0400] [314] [INFO] Starting gunicorn 22.0.0
[2024-05-13 12:29:05 -0400] [314] [INFO] Listening at: http://127.0.0.1:8080 (314)
[2024-05-13 12:29:05 -0400] [314] [INFO] Using worker: sync
[2024-05-13 12:29:05 -0400] [348] [INFO] Booting worker with pid: 348
[2024-05-13 12:29:05 -0400] [349] [INFO] Booting worker with pid: 349
10.0.2.192 - - [13/May/2024:12:29:15 -0400] "GET /tasks HTTP/1.1" 200 27180 "https://tubesync.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:29:16 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/tasks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:29:33 -0400] "GET /source/5b02cd55-4321-4b3f-a866-ee3b35738a6f HTTP/1.1" 200 3545 "https://tubesync.example.com/tasks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:29:34 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/source/5b02cd55-4321-4b3f-a866-ee3b35738a6f" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:36 -0400] "GET /tasks HTTP/1.1" 200 27180 "https://tubesync.example.com/source/5b02cd55-4321-4b3f-a866-ee3b35738a6f" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:36 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/tasks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:55 -0400] "GET /media HTTP/1.1" 200 13701 "https://tubesync.example.com/tasks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/69e1637c-7abe-4312-bccc-c31b39341bfa HTTP/1.1" 200 25883 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/1197c90b-d475-4640-bb8a-016949dd739e HTTP/1.1" 200 25689 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/4cda1a7f-6774-45cf-a0db-0618a6303224 HTTP/1.1" 200 28040 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/a3b877f0-9b79-42ca-9d5e-1534ea3216ae HTTP/1.1" 200 17104 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/b7629486-9119-4184-8139-8bae7d56771c HTTP/1.1" 200 27827 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/145d85f2-dee2-439f-bf26-7303362e5bda HTTP/1.1" 200 18160 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/ee8a6c6e-9b60-49a2-adba-7a63907e1c46 HTTP/1.1" 200 22302 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/d397d26d-4d5f-40f8-beee-b8a509c0e51c HTTP/1.1" 200 38001 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/d38cd7e4-919c-45f2-8130-2fc11bf4bf82 HTTP/1.1" 200 25585 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/36ff0af9-c019-4272-8f36-1d167a6d1045 HTTP/1.1" 200 27270 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /static/images/nothumb.png HTTP/1.1" 200 4176 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/9f3b0d98-d8fb-4046-9797-3cd3f35ebe73 HTTP/1.1" 200 25278 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /static/fonts/roboto/roboto-light.woff HTTP/1.1" 200 22264 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:31:00 -0400] "GET / HTTP/1.1" 200 3890 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:31:01 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
You can ignore the message to run manage.py migrate - that shows up just because your download path is always different to the one in the default schema and it thinks you need to update it; you don't.
As for your errors, the first one is that the playlist wasn't indexable for some reason; try using yt-dlp on the command line with that playlist and see what it returns. Indexable in this case means "tubesync attempted to invoke yt-dlp to get a list of the video IDs in the playlist and this failed". As to why it failed, it can be anything really: the playlist is private and you've not set your cookies.txt, the playlist doesn't exist, YouTube has blocked your IP, your internet connection was down, etc. Basically anything that would cause the index request to fail.
The second error is self-explanatory, tubesync doesn't have permissions to create that directory so check the local directory permissions.
I have the playlist as unlisted, but it shouldn't be private. Will that cause a problem? I just like telling TubeSync to slurp up videos I think could be useful in the future, which may get taken down for various reasons.
Try a manage.py youtube-dl-info "https://youtube.com/playlist?list=... your playlist ID here ..." and see what is returned. That command will just check if the playlist is indeed indexable.
@locke4 how did you go with moving things to celery? There will be obvious merge conflicts now as you don't appear to have done much in a while (https://github.com/meeb/tubesync/compare/main...locke4:tubesync:celery-refactor). Was this mostly working, or are there major issues still?
Can we split this up a little and help each other move it forward? Also which is the right branch you've been working on? Thanks
@timwhite I'm waiting for django-tasks to be merged in the next Django 5 release, then going to drop the idea of using celery entirely.
@timwhite I'm waiting for django-tasks to be merged in the next Django 5 release, then going to drop the idea of using celery entirely.
That's good to know. I assume you mean https://github.com/django/deps/blob/main/accepted/0014-background-workers.rst and something like https://github.com/RealOrangeOne/django-tasks being merged into core?
And I assume the advantages are that it's Django native, instead of adding celery and redis?
Knowing that, I may continue to tweak some existing tasks as-is, to try and improve how they run, instead of trying to offload them to Celery. I can see that some of the places large channels crash could be handled better with a task handed off to a worker so the UI can return quicker. Any tips on adding more tasks? I'm thinking in particular of moving some of the code from source_post_save into a task so saving a source no longer crashes the interface (for large channels).
Yeah, django-tasks being the reference implementation. Something mainline and not requiring a lot of heavy external libraries would be desirable.
This isn't actually related to the memory issues; those are to do with the current workers occasionally not freeing memory after invoking yt-dlp (and would hopefully be solved by changing the worker implementation), which offloading more work to them won't fix. It would make the interface snappier though.
I've just moved a small loop in source_post_save to a background task as an example of how to implement the current background task system, if you wanted to try to move any more: https://github.com/meeb/tubesync/commit/b6cec8ab1164034a29e36e4173e073f3b696b7f3
The current task system uses https://django-background-tasks.readthedocs.io/en/latest/ however it's now obviously terribly out of date and doesn't support anything above Django 3.2. It was fine years ago when I hacked up the first version well before anyone else ever used tubesync...
Switching to celery or the new django-tasks is a much bigger job and will need its own issues to track it. The memory usage issue isn't likely to go away: at its peak, invoking yt-dlp to index the media on a very large channel is still going to return an absolutely vast JSON object or dict, which is always going to use a significant amount of RAM to process. This should spike and then be freed though, not just sit allocated forever in some cases.
You're welcome to attempt to offload other tasks to the current background worker tasks if you want, just be careful of args (which need to be pretty much just strs or ints, serialization isn't particularly advanced) and state race conditions.
All the stuff at the top of tasks.py, like https://github.com/meeb/tubesync/blob/main/tubesync/sync/tasks.py#L41, is to account for django-background-tasks not having a built-in way to see what's running, so this reverse engineers some of its internals to pull out some running task data for the /tasks page.
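For reference, the general shape of a task under the current library looks roughly like the sketch below (the task name and model lookup are illustrative, not the exact code in tasks.py):

```python
from background_task import background

@background(schedule=60)  # queue the task to run roughly a minute from now
def refresh_source_metadata(source_id):
    # django-background-tasks serializes task arguments, so stick to plain
    # strings/ints such as a primary key and re-fetch the object in the task.
    from sync.models import Source  # illustrative import; adjust to the real app
    source = Source.objects.get(pk=source_id)
    print(f'would refresh metadata for {source}')
```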
As a temporary workaround I added
deploy:
  resources:
    limits:
      memory: 8G
as a service attribute to the docker-compose file, and it runs fine so far. 8GB is still a lot, but it completely consumed the 32GB of the VM before ^^
Hello,
I've been debugging my Unraid instance lately, as it's been crashing since I added the TubeSync container to it. I've added some small channels that work fine (~150-200 videos), but adding a 5K+ channel like Markiplier or LTT causes huge amounts of memory to be chewed up.
I found out my system was probably running out of memory, due to it being unconstrained:
So I followed the directions in this issue, and locked the memory to 4G. But now when I try to index any channel, big or small, I get:
Mind you, all channels have been scraped already, so it shouldn't be doing as big a job again. I'm also using an external MariaDB instance.
Any ideas what I can test next?
EDIT: Extra context:
EDIT 2: I can see the kernel killing the processes at the unraid/OS level: