tropicoo / yt-dlp-bot

Self-hosted Video Download Telegram Bot 🇺🇦
BSD 3-Clause "New" or "Revised" License

Download aborts and error causes lost Data #212

Closed ronny80 closed 11 months ago

ronny80 commented 11 months ago

I sent https://www.youtube.com/watch?v=yT4A47IFu2E to my telegram-bot.

But it could not download the complete file, and I got the following error message:

`🛑 Download error

ℹ️ Task ID: c3e51785-85eb-4f2a-bb83-0aa068a8712f
💬 Message: Download error
📹 Video URL: https://www.youtube.com/watch?v=yT4A47IFu2E
🌊 Source: BOT
👀 Details: 'NoneType' object is not subscriptable
⬇️ yt-dlp version: 2023.11.16
🤖 yt-dlp-bot version: 1.4.4
🏷️ Tag: #error`

After looking at the log files and the shared-tmpfs folder, I assume that the download aborted and restarted itself again. shared-tmpfs/downloading was empty, but shared-tmpfs/downloaded contained 11 generated folders (with 4-character names) with timestamps from 03:51 to 08:42.

Some of the 4-character folders were empty and some contained the video file and the thumbnail. The videos were never fully downloaded, only partially each time, with file sizes of 11197MB, 11142MB, 11127MB, 11337MB, 11248MB, 11161MB, and 11320MB. So a large part was downloaded, but then the download aborted.

My problem is that the folders were not deleted, so a lot of orphaned files are lying around. I also thought that if a download aborts, it could be restarted into the last destination folder and would resume automatically. That does not seem to be the case. Am I right?

It also seems that several downloads of the same file are running in parallel. The following CLI log leads me to assume that: the percentages are jumping around.

[download] 48.1% of ~ 10.78GiB at 1.23MiB/s ETA 01:17:46 (frag 461/961)
[download] 0.9% of ~ 10.83GiB at 1.46MiB/s ETA 02:10:45 (frag 8/961)1)
[download] 1.0% of ~ 11.51GiB at 1.45MiB/s ETA 02:16:33 (frag 463/961)frag 462/961)
[download] 27.6% of ~ 10.76GiB at 1.16MiB/s ETA 01:45:51 (frag 264/961)
[download] 27.6% of ~ 10.82GiB at 1.65MiB/s ETA 59:10 (frag 466/961)61)
[download] 48.8% of ~ 10.80GiB at 1.54MiB/s ETA 01:01:03 (frag 468/961)
[download] 27.8% of ~ 10.87GiB at 1.61MiB/s ETA 01:21:47 (frag 266/961)61)
[download] 28.1% of ~ 10.81GiB at 1.43MiB/s ETA 01:31:29 (frag 269/961)
[download] 28.3% of ~ 10.78GiB at 1.47MiB/s ETA 01:29:28 (frag 271/961)
[download] 49.3% of ~ 10.82GiB at 1.72MiB/s ETA 01:19:06 (frag 273/961)1)
[download] 9.4% of ~ 10.84GiB at 1.48MiB/s ETA 01:04:25 (frag 474/961)
[download] 28.7% of ~ 10.81GiB at 1.56MiB/s ETA 01:25:09 (frag 275/961))
[download] 49.8% of ~ 10.81GiB at 1.22MiB/s ETA 01:16:07 (frag 278/961)61)
[download] 29.0% of ~ 10.82GiB at 1.42MiB/s ETA 01:04:37 (frag 478/961)
[download] 29.2% of ~ 10.80GiB at 1.46MiB/s ETA 01:29:59 (frag 280/961)
[download] 2.9% of ~ 11.52GiB at 1.33MiB/s ETA 02:21:36rag 482/961)61)
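
For anyone who wants to check the "jumping percentages" impression on their own log, here is a minimal Python sketch that counts how often the reported progress goes backwards. It is not part of the bot; the names `progress_values` and `count_drops` are made up for illustration, and the regex only assumes the `[download] X% of` format visible above. Because the terminal overwrites lines, some values may be truncated, so treat the result as a rough indicator only.

```python
import re
from typing import List

# Matches the percentage in lines such as "[download] 48.1% of ~ 10.78GiB ...".
PROGRESS_RE = re.compile(r"\[download\]\s+(\d+(?:\.\d+)?)% of")


def progress_values(log_text: str) -> List[float]:
    """Return every progress percentage found in the log, in order."""
    return [float(value) for value in PROGRESS_RE.findall(log_text)]


def count_drops(values: List[float]) -> int:
    """Count how many times the progress is lower than the previous entry."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


sample = (
    "[download] 48.1% of ~ 10.78GiB at 1.23MiB/s ETA 01:17:46 (frag 461/961)\n"
    "[download] 0.9% of ~ 10.83GiB at 1.46MiB/s ETA 02:10:45 (frag 8/961)\n"
    "[download] 1.0% of ~ 11.51GiB at 1.45MiB/s ETA 02:16:33 (frag 463/961)\n"
    "[download] 27.6% of ~ 10.76GiB at 1.16MiB/s ETA 01:45:51 (frag 264/961)\n"
)
drops = count_drops(progress_values(sample))
```

A single download should report steadily rising progress, so any drops suggest interleaved output from more than one download (or truncated lines).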

The CLI log also contains the following entries:

[download] Got error: HTTPSConnectionPool(host='rr1---sn-n5a1c0gb-n1bl.googlevideo.com', port=443): Read timed out.
[download] Got error: HTTPSConnectionPool(host='rr1---sn-n5a1c0gb-n1bl.googlevideo.com', port=443): Read timed out.
[download] Got error: HTTPSConnectionPool(host='rr1---sn-n5a1c0gb-n1bl.googlevideo.com', port=443): Read timed out.
[download] Got error: HTTPSConnectionPool(host='rr1---sn-n5a1c0gb-n1bl.googlevideo.com', port=443): Read timed out.
[download] Got error: HTTPSConnectionPool(host='rr1---sn-n5a1c0gb-n1bl.googlevideo.com', port=443): Read timed out.
[download] Got error: HTTPSConnectionPool(host='rr1---sn-n5a1c0gb-n1bl.googlevideo.com', port=443): Read timed out.
[download] Got error: HTTPSConnectionPool(host='rr1---sn-n5a1c0gb-n1bl.googlevideo.com', port=443): Read timed out.

Maybe Google stops answering because of the number of identical downloads?

For this download link, the error is reproducible every time in my case.

Thanks

tropicoo commented 11 months ago

> My problem is, that the folders were not deleted, so there are a lot of lost files lying around

I've improved the error handling. If you update to the latest version, this shouldn't be the case anymore.

> I think if the Download aborts, it is possible to restart the download into the last destinationfolder and the download resumes automatically

It doesn't restart the download. If it fails, it fails :)

> It also seems that the downloads of the same file are running in parallel.

Yes, currently it's hardcoded to 5 download threads per file.
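
As a rough illustration only (this is not taken from the bot's code), a fixed thread count could be passed to yt-dlp like this, assuming the "download threads" refer to yt-dlp's `concurrent_fragment_downloads` option. The helper name `build_ydl_opts` and the output path are hypothetical.

```python
from typing import Any, Dict


def build_ydl_opts(output_dir: str, fragment_threads: int = 5) -> Dict[str, Any]:
    """Build a yt-dlp options dict with a fixed number of fragment threads.

    `build_ydl_opts` is a hypothetical helper; the two keys are yt-dlp options.
    """
    if fragment_threads < 1:
        raise ValueError("fragment_threads must be at least 1")
    return {
        # Output filename template inside the target directory.
        "outtmpl": f"{output_dir}/%(title)s.%(ext)s",
        # Number of fragments of one file that are fetched in parallel.
        "concurrent_fragment_downloads": fragment_threads,
    }


opts = build_ydl_opts("/tmp/downloading")

# Usage with yt-dlp installed:
#   import yt_dlp
#   with yt_dlp.YoutubeDL(opts) as ydl:
#       ydl.download(["https://www.youtube.com/watch?v=yT4A47IFu2E"])
```

With several fragments in flight, progress lines in the log can interleave even for a single file.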


Basically, you should see all error messages in the logs to understand what really happened.

tropicoo commented 11 months ago

Fixed in d1e4089fb8a4476167730d3ffe5790f4362effe0

tropicoo commented 11 months ago

In my case, it failed because there was no space left: OSError: [Errno 28] No space left on device.

In the docker-compose.yml file the volume is configured to have 7GB (7168m).

volumes:
  pgdata:
  shared-tmpfs:
    driver: local
    driver_opts:
      type: "tmpfs"
      device: "tmpfs"
      o: "size=7168m,uid=1000"  # <---------------- Here

If you have the same error, stop the bot, delete the volume, increase the size value in docker-compose.yml, and start the bot again. For example, say the separately downloaded video stream is 5GB and the audio stream is 800MB. The bot needs to merge both files into one final video file, and during the merge the source files and the merged output exist at the same time: roughly 5GB + 5GB = 10GB, which is larger than 7GB, so the merge will error out.
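
The arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not something the bot does: it assumes the merged file is about as large as video plus audio combined, and the function names are hypothetical.

```python
def peak_merge_space_mb(video_mb: float, audio_mb: float) -> float:
    """Estimate the peak space needed while merging video and audio.

    Assumption: the merged output is roughly video + audio in size, and the
    source files still exist until the merge has finished.
    """
    inputs = video_mb + audio_mb
    merged = inputs
    return inputs + merged


def fits_in_tmpfs(video_mb: float, audio_mb: float, tmpfs_mb: float) -> bool:
    """Return True if the estimated peak usage fits into the tmpfs volume."""
    return peak_merge_space_mb(video_mb, audio_mb) <= tmpfs_mb


# 5GB video + 800MB audio against the default 7168m volume.
needed = peak_merge_space_mb(5 * 1024, 800)
ok_default = fits_in_tmpfs(5 * 1024, 800, 7168)
```

Counting the audio stream as well, the estimate lands slightly above the 10GB mentioned in the example, so the default 7168m volume is too small either way.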

ronny80 commented 11 months ago

Thank you for the info. The value in my docker-compose.yml file is:

#      o: "size=7168m,uid=1000"
      o: "size=71680m,uid=1000"

tropicoo commented 11 months ago

@ronny80 Also, if you upload your videos back to Telegram, remember its file size limits for free and premium users (2GB vs 4GB per file).

ronny80 commented 11 months ago

I had the Telegram upload limit in mind, but the error occurred while downloading, and every time at about 11GB.

It was also interesting that the "downloading" folder was empty and all the defective files had been moved into the "downloaded" folder.

I know that downloads from YouTube sometimes fail, but yt-dlp on the CLI lets you restart the download and it resumes by itself.

tropicoo commented 11 months ago

> Also interresting was, that the folder "downloading" was empty and all defect files where moved into "downloaded" folder

That's because of how the logic worked. Basically, yt-dlp downloads the media files (video, audio, and thumbnail if one exists) and merges them into the final video if needed (for example from YouTube, where video and audio are separate streams). The one downside of yt-dlp is that I need to suppress its errors, so when it failed to merge the video because there was no space left, it exited as if no errors had occurred, and the content of the downloading/tmp-dir/* dir was moved into downloaded/4-char-dir.

Now it won't move anything if an error occurs on yt-dlp side.

> I know that the download sometimes fails from youtube. but yt-dlp on cli lets you restart the download and resumes itself.

Yes, because some parts of the files were already downloaded. But in the bot, these files are stored in a temporary directory during the downloading process and if something fails, that directory should be deleted with all content.
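
The behavior described here (work in a temporary directory, move files only on success, always delete the temporary directory) could look roughly like the following sketch. It is not the bot's actual implementation: `download_then_publish` and its arguments are hypothetical, and the 4-character folder name simply mimics what was seen in the thread.

```python
import secrets
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def download_then_publish(
    download: Callable[[Path], None],
    downloading_root: Path,
    downloaded_root: Path,
) -> Path:
    """Run `download` in a temp dir and move its files only if it succeeds."""
    downloading_root.mkdir(parents=True, exist_ok=True)
    downloaded_root.mkdir(parents=True, exist_ok=True)
    tmp_dir = Path(tempfile.mkdtemp(dir=downloading_root))
    try:
        # `download` must raise on failure instead of failing silently.
        download(tmp_dir)
        # Hypothetical 4-character destination name (2 random bytes as hex).
        final_dir = downloaded_root / secrets.token_hex(2)
        final_dir.mkdir()
        for item in tmp_dir.iterdir():
            shutil.move(str(item), str(final_dir / item.name))
        return final_dir
    finally:
        # Runs on success and on failure, so no partial files are left behind.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

The important part is that the download callable has to raise when something goes wrong; if errors are swallowed, the move step runs on incomplete files, which matches the behavior reported earlier in this thread.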

ronny80 commented 11 months ago

Thanks for your fast update. I am currently testing with the same YouTube link.

So far there is no Telegram message about a failure, but the progress in the CLI log still jumps around as if more than one download were running.

I attached the cli-log as a file. yt-dlp-cliLog.txt

line 425: the download starts
line 437: shows the temp destination folder for this download
line 527: timeout from rabbitmq
line 532: a new download seems to start
line 544: a new temp download folder is shown
until line 564: ascending progress values, 0-64.4%
from line 565: the progress now alternates

The old download seems to continue while a new download of the same file has started.

FYI: the Telegram chat ID has been changed to 123456789.

tropicoo commented 11 months ago

I think I now understand the reason for this behavior. Please update again.

ronny80 commented 11 months ago

The test finished successfully.

Many thanks