kieraneglin / pinchflat

Your next YouTube media manager
GNU Affero General Public License v3.0

[Triage] Pinchflat is indexing all the sources at the same time #438

Open costaht opened 2 hours ago

costaht commented 2 hours ago

Describe the bug

I noticed that as soon as I start the container, Pinchflat starts indexing all the sources simultaneously, and it does this for the entire channel, which consumes a lot of CPU.

To Reproduce

Steps to reproduce the behavior:

  1. start the container
  2. run ps -fe | grep pinchflat
  3. observe all the instances of yt-dlp running in the background

Expected behavior

I believe Pinchflat should scrape only one channel at a time, and only up to the Download Cutoff Date. That could be achieved by passing --break-on-reject --dateafter now-3days to yt-dlp.
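For illustration, the suggestion above could look roughly like the following when building a yt-dlp command line. This is only a sketch: `build_index_command` is a hypothetical helper and not Pinchflat code, the URL is one of the channels from the process listing in this thread, and the two flags are written exactly as proposed in the report.

```python
from typing import List


def build_index_command(url: str, cutoff: str = "now-3days") -> List[str]:
    """Build a yt-dlp argv list that stops at the first rejected video.

    Hypothetical helper for illustration only; not part of Pinchflat.
    """
    return [
        "yt-dlp",
        url,
        "--simulate",            # metadata only, as in the indexing commands in this thread
        "--skip-download",
        "--dateafter", cutoff,   # reject videos uploaded before the cutoff
        "--break-on-reject",     # stop as soon as a video is rejected
    ]


if __name__ == "__main__":
    print(" ".join(build_index_command("https://youtube.com/@Abom79")))
```

Note that stopping at the first rejected video only helps if the channel is listed newest first, which is an assumption of this sketch.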

kieraneglin commented 2 hours ago

Hey there! Thanks for the report (:

I'd be surprised if this was the case, but it's possible! I have concurrency limited to two workers for indexing and two for downloading, for a total of four concurrent yt-dlp operations (see here). You can decrease this to one worker for each (two total) by setting YT_DLP_WORKER_CONCURRENCY=1, but this usually isn't necessary. I can't replicate this locally and I haven't received any other reports of this (despite this logic being unchanged for months), but I would be curious to see your output of ps aux | grep runner.
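To make the arithmetic concrete, here is a minimal Python sketch of a per-queue limit like the one described above. It is not Pinchflat's implementation (Pinchflat runs on Elixir); `worker_concurrency`, `run_queue`, and the fake jobs are hypothetical names used only to show how one setting bounds the number of simultaneous operations in each queue.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def worker_concurrency(env=os.environ) -> int:
    """Read the per-queue limit; 2 is the default stated in this thread."""
    return int(env.get("YT_DLP_WORKER_CONCURRENCY", "2"))


def run_queue(jobs: int, limit: int) -> int:
    """Run `jobs` fake operations with at most `limit` at once; return the peak seen."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def fake_job(_):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # stand-in for a yt-dlp call
        with lock:
            state["running"] -= 1

    # The pool size is what enforces the limit.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        list(pool.map(fake_job, range(jobs)))
    return state["peak"]


if __name__ == "__main__":
    limit = worker_concurrency()
    # One queue for indexing and one for downloading, so the total is 2 * limit.
    print("per-queue limit:", limit, "max total operations:", 2 * limit)
    print("peak observed in one queue:", run_queue(jobs=10, limit=limit))
```

With the default of 2 this gives at most four operations in total, and with YT_DLP_WORKER_CONCURRENCY=1 at most two, matching the numbers quoted above.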

High CPU usage is often caused by ffmpeg assembling a video, especially if you are using the sponsorblock integration. Within reason this is both expected and unavoidable. Getting back to me with the result of that command above will help!

As for indexing, it is intentional and expected that indexing covers the entire channel and ignores filters (see here). I recognize this isn't strictly the most efficient approach, but as I see it the benefits outweigh the downsides. Besides, indexing itself should use fairly modest resources compared to actually downloading and assembling the videos.

costaht commented 26 minutes ago

> I have concurrency limited to two workers

I believe you. I said "all" because two is all I'm using for my tests, so I guess in this case we're both right haha

> I would be curious to see your output of ps aux | grep runner

:/app$ ps aux | grep runner
1000         236  0.0  0.0   6332  2176 pts/0    S+   15:37   0:00 grep runner

:/app$ ps aux | grep pinch
1000           1 25.7 14.4 7124572 2300392 ?     Ssl  15:35   0:04 /app/erts-14.2.5/bin/beam.smp -- -root /app -bindir /app/erts-14.2.5/bin -progname erl -- -home / -- -noshell -s elixir start_cli -mode embedded -setcookie WS725S7NPLJZHOWJR6AX7DOFXBG7VGXHHLEJC22GMMGAWF6I6UGQ==== -sname pinchflat -config /app/releases/2024.10.25/sys -boot /app/releases/2024.10.25/start -boot_var RELEASE_LIB /app/lib -- -extra --no-halt
1000         206  0.0  0.0   6932  3328 ?        Ss   15:35   0:00 bash /app/lib/pinchflat-2024.10.25/priv/cmd_wrapper.sh /usr/local/bin/yt-dlp https://youtube.com/@letsdig18/videos --simulate --skip-download --ignore-no-formats-error --no-warnings --print-to-file %(.{id,title,live_status,webpage_url,description,aspect_ratio,duration,upload_date,timestamp,playlist_index})j /tmp/pinchflat/data/f26b3fedbb90b6c3a1a12cc4184739096855490d1c5c3236c1124714c7b8fe17.json --windows-filenames --quiet --cache-dir /tmp/pinchflat/data/yt-dlp-cache
1000         207  0.0  0.0   6932  3328 ?        Ss   15:35   0:00 bash /app/lib/pinchflat-2024.10.25/priv/cmd_wrapper.sh /usr/local/bin/yt-dlp https://youtube.com/@Abom79 --simulate --skip-download --ignore-no-formats-error --no-warnings --print-to-file %(.{id,title,live_status,webpage_url,description,aspect_ratio,duration,upload_date,timestamp,playlist_index})j /tmp/pinchflat/data/a41d5747dac0b3dd42e066bc38953e15973ae50b122aa0af892c7931b44bc08b.json --windows-filenames --quiet --cache-dir /tmp/pinchflat/data/yt-dlp-cache
1000         208 18.8  0.4  74280 66120 ?        S    15:35   0:01 python3 /usr/local/bin/yt-dlp https://youtube.com/@letsdig18/videos --simulate --skip-download --ignore-no-formats-error --no-warnings --print-to-file %(.{id,title,live_status,webpage_url,description,aspect_ratio,duration,upload_date,timestamp,playlist_index})j /tmp/pinchflat/data/f26b3fedbb90b6c3a1a12cc4184739096855490d1c5c3236c1124714c7b8fe17.json --windows-filenames --quiet --cache-dir /tmp/pinchflat/data/yt-dlp-cache
1000         209  0.0  0.0   6932  1312 ?        S    15:35   0:00 bash /app/lib/pinchflat-2024.10.25/priv/cmd_wrapper.sh /usr/local/bin/yt-dlp https://youtube.com/@letsdig18/videos --simulate --skip-download --ignore-no-formats-error --no-warnings --print-to-file %(.{id,title,live_status,webpage_url,description,aspect_ratio,duration,upload_date,timestamp,playlist_index})j /tmp/pinchflat/data/f26b3fedbb90b6c3a1a12cc4184739096855490d1c5c3236c1124714c7b8fe17.json --windows-filenames --quiet --cache-dir /tmp/pinchflat/data/yt-dlp-cache
1000         210  0.0  0.0   6932  1444 ?        S    15:35   0:00 bash /app/lib/pinchflat-2024.10.25/priv/cmd_wrapper.sh /usr/local/bin/yt-dlp https://youtube.com/@letsdig18/videos --simulate --skip-download --ignore-no-formats-error --no-warnings --print-to-file %(.{id,title,live_status,webpage_url,description,aspect_ratio,duration,upload_date,timestamp,playlist_index})j /tmp/pinchflat/data/f26b3fedbb90b6c3a1a12cc4184739096855490d1c5c3236c1124714c7b8fe17.json --windows-filenames --quiet --cache-dir /tmp/pinchflat/data/yt-dlp-cache
1000         211 18.1  0.4  74320 66484 ?        S    15:35   0:01 python3 /usr/local/bin/yt-dlp https://youtube.com/@Abom79 --simulate --skip-download --ignore-no-formats-error --no-warnings --print-to-file %(.{id,title,live_status,webpage_url,description,aspect_ratio,duration,upload_date,timestamp,playlist_index})j /tmp/pinchflat/data/a41d5747dac0b3dd42e066bc38953e15973ae50b122aa0af892c7931b44bc08b.json --windows-filenames --quiet --cache-dir /tmp/pinchflat/data/yt-dlp-cache
1000         212  0.0  0.0   6932  1312 ?        S    15:35   0:00 bash /app/lib/pinchflat-2024.10.25/priv/cmd_wrapper.sh /usr/local/bin/yt-dlp https://youtube.com/@Abom79 --simulate --skip-download --ignore-no-formats-error --no-warnings --print-to-file %(.{id,title,live_status,webpage_url,description,aspect_ratio,duration,upload_date,timestamp,playlist_index})j /tmp/pinchflat/data/a41d5747dac0b3dd42e066bc38953e15973ae50b122aa0af892c7931b44bc08b.json --windows-filenames --quiet --cache-dir /tmp/pinchflat/data/yt-dlp-cache
1000         213  0.0  0.0   6932  1444 ?        S    15:35   0:00 bash /app/lib/pinchflat-2024.10.25/priv/cmd_wrapper.sh /usr/local/bin/yt-dlp https://youtube.com/@Abom79 --simulate --skip-download --ignore-no-formats-error --no-warnings --print-to-file %(.{id,title,live_status,webpage_url,description,aspect_ratio,duration,upload_date,timestamp,playlist_index})j /tmp/pinchflat/data/a41d5747dac0b3dd42e066bc38953e15973ae50b122aa0af892c7931b44bc08b.json --windows-filenames --quiet --cache-dir /tmp/pinchflat/data/yt-dlp-cache
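One way to read a listing like this: each indexing operation appears as one `python3 ... yt-dlp` process plus several `bash ... cmd_wrapper.sh` processes for the same URL, so the eight yt-dlp lines above correspond to two operations. A rough Python sketch that counts operations this way follows; `count_operations` is a hypothetical helper, and the sample lines are abbreviated versions of the output above.

```python
from collections import Counter
from typing import Dict, Iterable


def count_operations(ps_lines: Iterable[str]) -> Dict[str, int]:
    """Count python3 yt-dlp processes per URL, ignoring bash wrapper processes."""
    counts: Counter = Counter()
    for line in ps_lines:
        tokens = line.split()
        for i, token in enumerate(tokens[:-2]):
            # The actual worker looks like "python3 /path/to/yt-dlp <url> ...".
            if token == "python3" and tokens[i + 1].endswith("yt-dlp"):
                counts[tokens[i + 2]] += 1
                break
    return dict(counts)


if __name__ == "__main__":
    # Abbreviated lines modelled on the listing above (most columns and flags dropped).
    sample = [
        "1000 206 bash /app/lib/pinchflat-2024.10.25/priv/cmd_wrapper.sh /usr/local/bin/yt-dlp https://youtube.com/@letsdig18/videos --simulate",
        "1000 208 python3 /usr/local/bin/yt-dlp https://youtube.com/@letsdig18/videos --simulate",
        "1000 211 python3 /usr/local/bin/yt-dlp https://youtube.com/@Abom79 --simulate",
        "1000 212 bash /app/lib/pinchflat-2024.10.25/priv/cmd_wrapper.sh /usr/local/bin/yt-dlp https://youtube.com/@Abom79 --simulate",
    ]
    print(count_operations(sample))
```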

> High CPU usage is often caused by ffmpeg assembling a video

Actually, these sources have Download Media disabled:

[Screenshot from 2024-10-26 15-59-52]

> I see it the benefits outweigh the downsides

I respect that

OK then, I'll keep going with my tests and I'll let you know if I find anything odd. It would be nice if we had a chat (IRC, Telegram, or Discord) where we users could talk about our experiences and stop cluttering the repo with issues that are actually misunderstandings.