perry-mitchell opened this issue 1 year ago
Seemed stuck again, so I reset the tasks again. Memory quickly climbed to my new limit of 6G and it died again:
[2023-10-24 18:43:19 +0300] [319] [CRITICAL] WORKER TIMEOUT (pid:364)
2023/10/24 18:43:19 [error] 346#346: *24 upstream prematurely closed connection while reading response header from upstream, client: 192.168.0.50, server: _, request: "POST /tasks-reset HTTP/1.1", upstream: "http://127.0.0.1:8080/tasks-reset", host: "192.168.0.100:4848", referrer: "http://192.168.0.100:4848/tasks-reset"
[2023-10-24 18:43:19 +0300] [364] [INFO] Worker exiting (pid: 364)
192.168.0.50 - - [24/Oct/2023:18:43:19 +0300] "POST /tasks-reset HTTP/1.1" 502 150 "http://192.168.0.100:4848/tasks-reset" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0"
[2023-10-24 18:43:20 +0300] [319] [ERROR] Worker (pid:364) exited with code 1
[2023-10-24 18:43:20 +0300] [319] [ERROR] Worker (pid:364) exited with code 1.
[2023-10-24 18:43:20 +0300] [432] [INFO] Booting worker with pid: 432
2023-10-24 18:43:20,206 [tubesync/INFO] Using database connection: django.db.backends.mysql://tubesync:[hidden]@MariaDB:3306/tubesync
After this it eventually restarted:
2023-10-24 18:45:00,048 [tubesync/DEBUG] [download] Finished downloading playlist: Markiplier - Videos
After it finished, the memory was almost bang on 100%:
Additionally, my other 2 channels don't show tasks in the schedule, only "markiplier", even though all are scheduled to index at some point:
How many workers do you have?
How many workers do you have?
Sorry I didn't see this. I've since removed my copy of Tubesync while I plan on migrating it to a new system. So I don't recall exactly what I had configured there. I suspect it was 2 workers, but could have been 4 :man_shrugging:.
Imo there shouldn't be any need to use so much memory - playlists and channels can be huge, so if pulling this data is what's causing the huge use of memory, perhaps it should be written to disk instead of expecting the host to have enough memory to handle any channel size. Databases like MySQL do this - not sure what storage mechanism Tubesync uses under the hood, but if it were a DB like this it should roll over onto disk just fine.
There isn't any need for that much memory and it's absolutely a bug. The data collected by the worker can be huge in some cases and that's stored in RAM, however it's not being effectively garbage collected which is related to how the worker library is or isn't working properly. As for temporary spooling of large data to disk, that would be a yt-dlp feature and not a tubesync feature.
Thanks for clarifying! To be clear, is the garbage collection bug on tubesync's side or yt-dlp's?
I'll be setting up a new node probably in the next week or so.. If there's more debugging information I can provide, in case this happens again, please let me know and I'll do my best to share it here.
I'll be running it in an LXC next though, instead of just Docker on Unraid, so the environment will be subtly different.
It's likely in the (now generally unsupported and legacy) background worker library used in current tubesync builds, so neither directly tubesync nor yt-dlp. yt-dlp does allocate a lot of memory when asked to index a massive channel, but that's likely not avoidable given the sheer size of the metadata that needs to be processed. The worker library in tubesync has been on the cards to be replaced for... some time now, it's just a lot of work and I've not had time.
The worker library in tubesync has been on the cards to be replaced for... some time now, it's just a lot of work and I've not had time.
Understandable. Anyway, I really appreciate the project so I'll keep on eye on this going forward :)
I can reproduce this issue also. I'm working on a branch this weekend which I'm hoping will be a short-term fix. I only saw the issue when I was running some VMs, which reduced the memory available on my server. I'm hoping setting max_requests_jitter and max_requests on the gunicorn workers might improve this and force each worker to give up its memory allocations when a large job is being processed by them.
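For illustration, those gunicorn settings would look something like the sketch below in a gunicorn.conf.py (the file name and numbers are placeholders, not tubesync's shipped configuration):

```python
# gunicorn.conf.py -- values are placeholders for illustration only
bind = "127.0.0.1:8080"
workers = 1

# Recycle each worker after a bounded number of requests so memory it has
# accumulated is released when the old process exits; the jitter staggers
# the restarts so all workers don't recycle at the same moment.
max_requests = 500
max_requests_jitter = 50
```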
@locke4 this isn't going to be anything to do with gunicorn, it's the background workers which are far more complex to replace.
I've also been having this issue, though I do use TubeSync a little differently than most. I have a single Plex Watch Later playlist that I manually sync in TubeSync when I add new items, and I also use TubeSync with MariaDB and I have 64 GB of RAM. Eventually it gets to the YouTube playlist limit of 4,000 videos and becomes unmanageable in TubeSync.
When I max out, or nearly max out, the YouTube playlist, TubeSync will use an ungodly amount of system RAM, easily over 16 GB and eventually all of it, until the system becomes unresponsive.
I tried creating a new empty playlist and pointing TubeSync at that, and while that did allow the indexing to go quicker, Docker would still kill the python3 process for using over 8 GB of RAM, the new limit I imposed on the container.
Even though the playlist is empty, all the old playlist items are still stored in sync_media. This was expected, but what I noticed was that Django is still loading all of these items and taking a ton of RAM to do so. I also saw this when loading the /admin panel as well.
I emptied the tables sync_media and background_task_completedtask and TubeSync became usable again, taking only a few hundred MB of RAM.
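For anyone wanting to do the same cleanup, a rough sketch via the Django shell is below; the model names are my guess at what backs those two tables, and this permanently deletes data, so back up first:

```python
# Inside the container: python3 manage.py shell
# WARNING: this permanently deletes rows - back up the database first.
from background_task.models import CompletedTask  # ships with django-background-tasks
from sync.models import Media  # assumed to be the model behind the sync_media table

CompletedTask.objects.all().delete()  # empties background_task_completedtask
Media.objects.all().delete()          # empties sync_media (removes all tracked media!)
```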
@locke4 this isn't going to be anything to do with gunicorn, it's the background workers which are far more complex to replace.
Fair enough, you're talking about Django Background Tasks as the background workers? I'm running a test on a channel which I know breaks tubesync:latest, but I assume it won't show any improvement if the issue isn't related to the gunicorn or celery workers. I have a fix relating to the filter_text behaviour for this PR anyway, so it's good to try and get that completed this weekend regardless.
I want to see if I can look into a fix short of replacing them, as I appreciate replacing is a much larger task. For extra info, RAM allocated during indexing also doesn't seem to be dropped, which exacerbates this (below is an indexing task which completed around 8am but the memory was never released; deleting the channel immediately crashes the docker container with SIGKILL out-of-memory errors). 11am is me updating to my branch and repeating the test.
As expected, it doesn't fix the excessive memory usage during a delete operation. I was however able to complete the delete task (though the web browser timed out, the container didn't crash with a SIGKILL error), which I don't really understand. I found https://github.com/arteria/django-background-tasks/issues/186 which might be promising, but forking django-background-tasks isn't the best solution long-term.
Migrated my setup from Unraid to Proxmox, now starting off with a smaller 300 video channel. Seems ok so far:
Once this is done I'll try again with another 5K video channel, see how it likes that.
EDIT: Going for the big one haha. Looks fine so far though. I'm only sync'ing the metadata so far:
EDIT 2: It's ballooned quite a bit, so I've put it up to 4GB overnight, as it scrapes:
EDIT 3: Still seems to be using a crazy amount of RAM, though it's just enqueuing thumbnail downloads, nothing else:
EDIT 4: Memory grows by a gigabyte when I refresh the tasks page, too..
Also, even once a source successfully syncs, I can't open it without it crashing:
Memory climbs to 2-3GB more, then this crash.
Before:
During:
EDIT: So I modified the settings directly in the DB to now enable media download, and clicked refresh in the sources list for "markiplier". Is it normal that it adds all thumbnails to the queue again? Even after they've already been fetched?
Never mind, it just did it until the cut-off time.
@meeb, I want to take a look at this task thing again this weekend. Am I right in thinking your desired replacement is for tasks to be offloaded to a redis queue for celery workers to pick from? I want to make sure I go in the direction that you think is best so any pointers before I restart my feature branch would be really appreciated!
@locke4 Yes, that's the plan. However, I should warn you that this is a significant amount of work - general rewrite-and-refactoring big. Porting the tasks to celery tasks is the easy bit. Migrating all existing tasks (and task management and UI updates on tasks) without breaking anyone's installs is going to be much harder.
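For anyone following along, the rough shape of that plan would be something like the sketch below (the module, broker URL and task name are hypothetical, not tubesync code):

```python
# celery_app.py - a hypothetical sketch, not tubesync's implementation
from celery import Celery

app = Celery(
    'tubesync',
    broker='redis://localhost:6379/0',   # assumed Redis location
    backend='redis://localhost:6379/1',
)

# Keep yt-dlp work serialized so only one indexing process talks to
# YouTube from this host at a time.
app.conf.worker_concurrency = 1

@app.task(bind=True, max_retries=3)
def index_source(self, source_id):
    # Placeholder body: pass only the primary key so the task arguments
    # stay trivially serializable, then look the source up inside the task.
    print(f'would index source {source_id}')
```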
Putting my hand up now to be a nightly build tester.
Just to mention here that even with rewriting the worker and tasks system, this won't really reduce the peak memory requirement much; it'll just mean it gets freed again after use.
I've got a couple of thoughts that I'd appreciate your feedback on, @meeb, after prototyping my feature branch for this issue yesterday.
I've got a couple of weird interactions I need to solve before I commit this feature branch to my fork repo, but generally I think this approach is a good candidate fix. I should have a PR ready in a week, provided you're happy with this approach.
Thanks for the effort, @locke4! Specifically for your questions:
- Ideally, background_tasks needs to be completely removed. This is blocking upgrading Django beyond 3.2, and nesting tasks is going to cause unexpected issues with scheduling and likely other more complex problems. This was a major reason it's taken so long. A dual celery/background_tasks implementation sounds pretty nightmarish. Offloading all work to the background is excellent though and the general end game!
- Sounds reasonable, however this likely needs locks to prevent race conditions as well as other less common situations. Note anyone using an SQLite backend (which is most people) can only have a single concurrent write, and the yt-dlp interacting worker should only run one process at once (any more than that and YouTube starts to get angry), so some concurrency controls are going to be needed.
- The memory limit is actually just indexing channel metadata from yt-dlp, which is returned as a massive, massive dictionary for a large channel. There's not much optimization that can be done here, other than making sure the worker frees it properly once processed (which is what background_tasks isn't doing).
Generally I think you're on the right approach, but I would suspect there's going to be a lot of edge cases, weirdness and other issues that will need to be worked out with this transition. I've had two attempts at this myself and never fully finished it enough to merge. If this has any bugs in at all it's going to annoy a lot of people.
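To illustrate what "frees it properly once processed" would look like in practice, here's a rough sketch with placeholder helpers (get_media_info and persist_entries are illustrative names, not tubesync's actual functions):

```python
import gc

def get_media_info(url):
    # Placeholder for the yt-dlp indexing call; for a big channel this
    # returns a dict that can run to several gigabytes.
    return {'entries': []}

def persist_entries(entries):
    # Placeholder for writing the indexed entries to the database.
    pass

def index_source(url):
    info = get_media_info(url)
    try:
        persist_entries(info.get('entries') or [])
    finally:
        # Drop the reference and run a collection so the huge dict isn't
        # kept alive between tasks, which is the behaviour described above.
        del info
        gc.collect()
```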
@meeb Not to derail, and perhaps this could be another issue elsewhere, but has anyone reached out to the yt-dlp project regarding this metadata? Surely a JSON stream or something incremental would be a better design than one massive blob of data?
Even if it was written to disk incrementally so tubesync could stream it into the DB, this'd be far superior.
It's not sustainable for either project, imo, to function like that. It'll still prevent anyone from scanning popular channels unless they're willing to dedicate 3-6GB of RAM for the instance. I'd have hoped to run such a container at 1-2GB max in a perfect world.
Just my thoughts.
I have not, and quite honestly it might be how I've implemented it rather than an upstream issue. Worth checking both ends though! I hacked up the original TubeSync in a weekend to scratch a personal itch so... there may well be legacy silliness that worked fine for the smaller channels I sync but not for people trying to parse gigabytes of metadata. I've not stumbled over it yet but there may be a streaming metadata feature buried in the yt-dlp API.
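For anyone curious, yt-dlp's flat extraction is one thing worth experimenting with here; a rough sketch is below (the channel URL is just an example, and whether the flat entries carry enough fields for tubesync is exactly what would need checking):

```python
import yt_dlp

# Flat extraction returns lightweight per-video stubs (id, title, url)
# instead of full metadata, which keeps the indexing dict far smaller.
opts = {
    'extract_flat': 'in_playlist',  # don't resolve every video fully
    'skip_download': True,
    'quiet': True,
}

with yt_dlp.YoutubeDL(opts) as ydl:
    # Example URL only; substitute the channel or playlist being indexed.
    info = ydl.extract_info(
        'https://www.youtube.com/@SomeLargeChannel/videos',
        download=False,
    )
    for entry in info.get('entries') or []:
        print(entry.get('id'), entry.get('title'))
```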
Thanks for the effort, @locke4! Specifically for your questions:
- Ideally, background_tasks needs to be completely removed. This is blocking upgrading Django beyond 3.2, and nesting tasks is going to cause unexpected issues with scheduling and likely other more complex problems. This was a major reason it's taken so long. A dual celery/background_tasks implementation sounds pretty nightmarish. Offloading all work to the background is excellent though and the general end game!
- Sounds reasonable, however this likely needs locks to prevent race conditions as well as other less common situations. Note anyone using an SQLite backend (which is most people) can only have a single concurrent write, and the yt-dlp interacting worker should only run one process at once (any more than that and YouTube starts to get angry), so some concurrency controls are going to be needed.
- The memory limit is actually just indexing channel metadata from yt-dlp, which is returned as a massive, massive dictionary for a large channel. There's not much optimization that can be done here, other than making sure the worker frees it properly once processed (which is what background_tasks isn't doing).
Generally I think you're on the right approach, but I would suspect there's going to be a lot of edge cases, weirdness and other issues that will need to be worked out with this transition. I've had two attempts at this myself and never fully finished it enough to merge. If this has any bugs in at all it's going to annoy a lot of people.
Thanks for the comments! On 1., agreed that it's not ideal to have both. At the moment I've been focussed on just fixing the two breaking bugs with large channels which I believe is now complete bar writing test cases and a couple of edge cases.
I'm going to complete this feature branch fully as a fix branch. I'll then work on fully removing background-tasks on a new feature branch, so if I'm unsuccessful at least the fixes can be merged from my current branch and a full refactor completed later building on this. My main issue I haven't solved yet is how to ensure tasks are migrated or recreated in celery on update.
I was thinking that one way of ensuring that fully removing background-tasks isn't a breaking change is to force a task reset on boot in some intelligent way (to avoid unnecessary task resets).
OK. Also remember that if you're not tackling the concurrency and race condition issues initially, lock the celery workers to 1 (and disable the ENV var for worker count) as well! Happy to review what you submit as a PR.
I have no idea why this is the way it is, but this might be related. It's been heavily sanitized, so hopefully it still contains what's necessary.
[CRITICAL] WORKER TIMEOUT (pid:351)
2024/05/13 10:40:52 [error] 341#341: *1 upstream prematurely closed connection while reading response header from upstream, client: 10.0.2.192, server: _, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "tubesync.example.com", referrer: "https://login.example.com/"
10.0.2.192 - - [13/May/2024:10:40:52 -0400] "GET / HTTP/1.1" 502 150 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
[ERROR] Worker (pid:351) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 361
10.0.2.192 - - [13/May/2024:10:40:52 -0400] "GET /favicon.ico HTTP/1.1" 302 0 "https://tubesync.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:10:47:53 -0400] "GET / HTTP/1.1" 499 0 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:10:48:03 -0400] "GET / HTTP/1.1" 499 0 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
[CRITICAL] WORKER TIMEOUT (pid:361)
[ERROR] Worker (pid:361) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 364
[CRITICAL] WORKER TIMEOUT (pid:350)
[ERROR] Worker (pid:350) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 367
[CRITICAL] WORKER TIMEOUT (pid:352)
2024/05/13 10:48:35 [error] 345#345: *8 upstream prematurely closed connection while reading response header from upstream, client: 10.0.2.192, server: _, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "tubesync.example.com", referrer: "https://login.example.com/"
"GET / HTTP/1.1" 502 150 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
[ERROR] Worker (pid:352) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 370
[CRITICAL] WORKER TIMEOUT (pid:370)
2024/05/13 10:51:56 [error] 340#340: *10 upstream prematurely closed connection while reading response header from upstream, client: 10.0.2.192, server: _, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "tubesync.example.com", referrer: "https://login.example.com/"
"GET / HTTP/1.1" 502 150 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
[ERROR] Worker (pid:370) was sent SIGKILL! Perhaps out of memory?
[INFO] Booting worker with pid: 373
"GET / HTTP/1.1" 200 3889 "https://login.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/roboto/roboto-regular.woff HTTP/1.1" 200 22048 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/fontawesome/fa-solid-900.woff2 HTTP/1.1" 200 80300 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/fontawesome/fa-brands-400.woff2 HTTP/1.1" 200 78460 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/fontawesome/fa-regular-400.woff2 HTTP/1.1" 200 13548 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
"GET /static/fonts/roboto/roboto-bold.woff HTTP/1.1" 200 22284 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
Okay, I reassessed my settings; in my case I had just increased the number of workers from 1 up to 10, because why not.
You should definitely keep your workers at 1. There's no technical reason locally you can't run more workers, however YouTube quickly gets angry at you and blocks your IP or imposes other similar restrictions if you run more than 1 worker.
And thanks for the log, that just shows the worker using a load of memory for some reason and getting killed, however every bit helps.
You should definitely keep your workers at 1. There's no technical reason locally you can't run more workers, however YouTube quickly gets angry at you and blocks your IP or imposes other similar restrictions if you run more than 1 worker.
Oh, fair enough... good call. I'll dial it back to 2 at most then, I guess; it just seems way more responsive now that I set it higher.
This is what it seemed to be doing as soon as I added other workers.
Thanks for the fast reply @meeb also thanks for the hard work!
More context: it asked me to do a ./manage.py migrate etc.
I had added these env vars to my deployment as well.
environment:
  TUBESYNC_RESET_DOWNLOAD_DIR: "false"
  TZ: America/Toronto
  PUID: "1000"
  PGID: "1000"
  TUBESYNC_WORKERS: 4
  TUBESYNC_DEBUG: "true"
  GUNICORN_WORKERS: "3"
  DJANGO_SECRET_KEY: "REDACTED"
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service tubesync-init: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
usermod: no changes
Operations to perform:
Apply all migrations: admin, auth, background_task, contenttypes, sessions, sync
Running migrations:
No migrations to apply.
Your models in app(s): 'sync' have changes that are not yet reflected in a migration, and so won't be applied.
Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.
s6-rc: info: service tubesync-init successfully started
s6-rc: info: service gunicorn: starting
s6-rc: info: service gunicorn successfully started
s6-rc: info: service tubesync-worker: starting
s6-rc: info: service nginx: starting
s6-rc: info: service nginx successfully started
s6-rc: info: service tubesync-worker successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
[2024-05-13 12:29:05 -0400] [314] [INFO] Starting gunicorn 22.0.0
[2024-05-13 12:29:05 -0400] [314] [INFO] Listening at: http://127.0.0.1:8080 (314)
[2024-05-13 12:29:05 -0400] [314] [INFO] Using worker: sync
[2024-05-13 12:29:05 -0400] [348] [INFO] Booting worker with pid: 348
[2024-05-13 12:29:05 -0400] [349] [INFO] Booting worker with pid: 349
10.0.2.192 - - [13/May/2024:12:29:15 -0400] "GET /tasks HTTP/1.1" 200 27180 "https://tubesync.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:29:16 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/tasks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:29:33 -0400] "GET /source/5b02cd55-4321-4b3f-a866-ee3b35738a6f HTTP/1.1" 200 3545 "https://tubesync.example.com/tasks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:29:34 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/source/5b02cd55-4321-4b3f-a866-ee3b35738a6f" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:36 -0400] "GET /tasks HTTP/1.1" 200 27180 "https://tubesync.example.com/source/5b02cd55-4321-4b3f-a866-ee3b35738a6f" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:36 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/tasks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:55 -0400] "GET /media HTTP/1.1" 200 13701 "https://tubesync.example.com/tasks" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/69e1637c-7abe-4312-bccc-c31b39341bfa HTTP/1.1" 200 25883 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/1197c90b-d475-4640-bb8a-016949dd739e HTTP/1.1" 200 25689 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/4cda1a7f-6774-45cf-a0db-0618a6303224 HTTP/1.1" 200 28040 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/a3b877f0-9b79-42ca-9d5e-1534ea3216ae HTTP/1.1" 200 17104 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/b7629486-9119-4184-8139-8bae7d56771c HTTP/1.1" 200 27827 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/145d85f2-dee2-439f-bf26-7303362e5bda HTTP/1.1" 200 18160 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/ee8a6c6e-9b60-49a2-adba-7a63907e1c46 HTTP/1.1" 200 22302 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/d397d26d-4d5f-40f8-beee-b8a509c0e51c HTTP/1.1" 200 38001 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/d38cd7e4-919c-45f2-8130-2fc11bf4bf82 HTTP/1.1" 200 25585 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/36ff0af9-c019-4272-8f36-1d167a6d1045 HTTP/1.1" 200 27270 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /static/images/nothumb.png HTTP/1.1" 200 4176 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /media-thumb/9f3b0d98-d8fb-4046-9797-3cd3f35ebe73 HTTP/1.1" 200 25278 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:30:56 -0400] "GET /static/fonts/roboto/roboto-light.woff HTTP/1.1" 200 22264 "https://tubesync.example.com/static/styles/tubesync.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:31:00 -0400] "GET / HTTP/1.1" 200 3890 "https://tubesync.example.com/media" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
10.0.2.192 - - [13/May/2024:12:31:01 -0400] "GET /static/styles/tubesync.css HTTP/1.1" 200 34554 "https://tubesync.example.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0"
You can ignore the message to run manage.py migrate - that shows up just because your download path is always different to the one in the default schema and it thinks you need to update it; you don't.
As for your errors, the first one is that the playlist wasn't indexable for some reason; try using yt-dlp on the command line with that playlist and see what it returns. Indexable in this case means "tubesync attempted to invoke yt-dlp to get a list of the video IDs in the playlist and this failed". As to why it failed, it can be anything really: the playlist is private and you've not set your cookies.txt, the playlist doesn't exist, YouTube has blocked your IP, your internet connection was down, etc. Basically anything that would cause the index request to fail.
The second error is self-explanatory, tubesync doesn't have permissions to create that directory so check the local directory permissions.
I have the playlist as unlisted, but it shouldn't be private. Will that cause a problem? I just like telling TubeSync to slurp up videos I think could be useful in the future, which may get taken down for various reasons.
Try a manage.py youtube-dl-info "https://youtube.com/playlist?list=... your playlist ID here ..." and see what is returned. That command will just check if the playlist is indeed indexable.
@locke4 how did you go with moving things to celery? There will be obvious merge conflicts now as you don't appear to have done much in a while (https://github.com/meeb/tubesync/compare/main...locke4:tubesync:celery-refactor). Was this mostly working, or are there major issues still?
Can we split this up a little and help each other move it forward? Also which is the right branch you've been working on? Thanks
@timwhite I'm waiting for django-tasks to be merged in the next Django 5 release, then going to drop the idea of using celery entirely.
@timwhite I'm waiting for django-tasks to be merged in the next Django 5 release, then going to drop the idea of using celery entirely.
That's good to know. I assume you mean https://github.com/django/deps/blob/main/accepted/0014-background-workers.rst and something like https://github.com/RealOrangeOne/django-tasks being merged into core?
And I assume the advantages are that it's Django native, instead of adding celery and redis?
Knowing that, I may continue to tweak some existing tasks as-is, to try and improve how they run, instead of trying to offload them to Celery. I can see that some of the places large channels crash could be handled better with a task handed off to a worker so the UI can return quicker. Any tips on adding more tasks? I'm thinking in particular of moving some of the code from source_post_save into a task so saving a source no longer crashes the interface (for large channels).
Yeah, django-tasks being the reference implementation. Something mainline and not requiring a lot of heavy external libraries would be desirable.
This isn't actually related to the memory issues; those are to do with the current workers occasionally not freeing memory after invoking yt-dlp (and would hopefully be solved by changing the worker implementation), which offloading more work to them won't fix. It would make the interface snappier though.
I've just moved a small loop in source_post_save to a background task as an example of how to implement the current background task system, if you wanted to try to move any more: https://github.com/meeb/tubesync/commit/b6cec8ab1164034a29e36e4173e073f3b696b7f3
The current task system uses https://django-background-tasks.readthedocs.io/en/latest/ however it's now obviously terribly out of date and doesn't support anything above Django 3.2. It was fine years ago when I hacked up the first version well before anyone else ever used tubesync...
Switching to celery or the new django-tasks is a much bigger job and will need its own issues to track it. The memory usage issue isn't likely to go away: at its peak, invoking yt-dlp to index the media on a very large channel is still going to return an absolutely vast JSON object or dict, which is always going to use a significant amount of RAM to process. This should spike and then be freed though, not just sit allocated forever in some cases.
You're welcome to attempt to offload other tasks to the current background worker tasks if you want, just be careful of args (which need to be pretty much just strs or ints, serialization isn't particularly advanced) and state race conditions.
All the stuff at the top of tasks.py, like https://github.com/meeb/tubesync/blob/main/tubesync/sync/tasks.py#L41, is to account for django-background-tasks not having a built-in way to see what's running, so this reverse engineers some of its internals to pull out some running task data for the /tasks page.
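For reference, the general shape of a task under the current library looks roughly like the sketch below (the task name and model lookup are illustrative, not the exact code in tasks.py):

```python
from background_task import background

@background(schedule=60)  # queue the task to run roughly a minute from now
def refresh_source_metadata(source_id):
    # django-background-tasks serializes task arguments, so stick to plain
    # strings/ints such as a primary key and re-fetch the object in the task.
    from sync.models import Source  # illustrative import; adjust to the real app
    source = Source.objects.get(pk=source_id)
    print(f'would refresh metadata for {source}')
```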
As a temporary workaround I added
deploy:
  resources:
    limits:
      memory: 8G
as a service attribute to the docker-compose file, and it runs fine so far. 8GB is still a lot, but it completely consumed the 32GB of the VM before ^^
Hello,
I've been debugging my Unraid instance lately, as it's been crashing since I added the TubeSync container to it. I've added some small channels that work fine (~150-200 videos), but adding a 5K+ channel like Markiplier or LTT causes huge amounts of memory to be chewed up.
I found out my system was probably running out of memory, due to it being unconstrained:
So I followed the directions in this issue, and locked the memory to 4G. But now when I try to index any channel, big or small, I get:
Mind you, all channels have been scraped already, so it shouldn't be doing as big a job again. I'm also using an external MariaDB instance.
Any ideas what I can test next?
EDIT: Extra context:
EDIT 2: I can see the kernel killing the processes at the unraid/OS level: