ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

youtube-dl refuses to download if there's a file already present #32783

Open matteyas opened 6 months ago

matteyas commented 6 months ago

Description

I tried to download audio (webm) with youtube-dl. Worked fine. Then tried to download video+audio (webm) and youtube-dl refuses, even though it's not the same file.

I have noticed two possible fixes, and I figure both would be good to have:

1) Implement --force-download
2) Make overwrite default if it's a different file (compare size)

(The justification for (2) is that --skip-download and --simulate are available.)

Verbose log

D:\> youtube-dl --version
2024.04.22

D:\> youtube-dl -F https://www.youtube.com/watch?v=ynPdORrsuZM
[youtube] ynPdORrsuZM: Downloading webpage
[info] Available formats for ynPdORrsuZM:
format code  extension  resolution note
249          webm       audio only audio_quality_low   54k , webm_dash container, opus  (48000Hz), 1.27MiB
250          webm       audio only audio_quality_low   72k , webm_dash container, opus  (48000Hz), 1.70MiB
140          m4a        audio only audio_quality_medium  129k , m4a_dash container, mp4a.40.2 (44100Hz), 3.03MiB
251          webm       audio only audio_quality_medium  142k , webm_dash container, opus  (48000Hz), 3.33MiB
160          mp4        256x108    144p   41k , mp4_dash container, avc1.4d400b, 25fps, video only, 995.16KiB
278          webm       256x108    144p   72k , webm_dash container, vp9, 25fps, video only, 1.69MiB
133          mp4        426x182    240p   97k , mp4_dash container, avc1.4d400d, 25fps, video only, 2.28MiB
242          webm       426x182    240p  125k , webm_dash container, vp9, 25fps, video only, 2.95MiB
134          mp4        640x272    360p  174k , mp4_dash container, avc1.4d4015, 25fps, video only, 4.08MiB
243          webm       640x272    360p  281k , webm_dash container, vp9, 25fps, video only, 6.59MiB
135          mp4        854x364    480p  316k , mp4_dash container, avc1.4d401e, 25fps, video only, 7.40MiB
244          webm       854x364    480p  424k , webm_dash container, vp9, 25fps, video only, 9.94MiB
136          mp4        1280x544   720p  547k , mp4_dash container, avc1.4d401f, 25fps, video only, 12.82MiB
247          webm       1280x544   720p  782k , webm_dash container, vp9, 25fps, video only, 18.31MiB
248          webm       1920x816   1080p 1301k , webm_dash container, vp9, 25fps, video only, 30.47MiB
137          mp4        1920x816   1080p 1926k , mp4_dash container, avc1.640028, 25fps, video only, 45.10MiB
271          webm       2560x1088  1440p 5108k , webm_dash container, vp9, 25fps, video only, 119.61MiB
313          webm       3840x1634  2160p 12595k , webm_dash container, vp9, 25fps, video only, 294.89MiB
18           mp4        640x272    360p  487k , avc1.42001E, 25fps, mp4a.40.2 (44100Hz), 11.41MiB (best)

D:\> del "French 79 - TEENAGERS [Official Video]-ynPdORrsuZM.webm"

D:\> youtube-dl -f 251 https://www.youtube.com/watch?v=ynPdORrsuZM
[youtube] ynPdORrsuZM: Downloading webpage
[dashsegments] Total fragments: 1
[download] Destination: French 79 - TEENAGERS [Official Video]-ynPdORrsuZM.webm
[download] 100% of 3.33MiB in 00:00

D:\> youtube-dl -f 313 https://www.youtube.com/watch?v=ynPdORrsuZM
[youtube] ynPdORrsuZM: Downloading webpage
[download] French 79 - TEENAGERS [Official Video]-ynPdORrsuZM.webm has already been downloaded
[download] 100% of 3.33MiB

dirkf commented 6 months ago

If you expect that the file you are downloading should replace an existing file or file group, you have to say so (see below for how), or manually delete or move the unwanted file(s). Not re-downloading content and not overwriting potentially "precious" content are design features of the program so that you can repeatedly invoke the program to download further metadata without affecting what has already been saved.

You can specify an output template containing %(format_id)s if you want to discriminate between different formats of the same media item, and then delete any unwanted formats and/or rename a preferred item.
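As a rough illustration of why adding %(format_id)s avoids the collision: the youtube-dl README describes output templates as following Python string-formatting rules, so the expansion can be imitated with the % operator. This is not youtube-dl code. The `expand` helper, the `.f` prefix before the format ID, and the hand-written metadata dictionaries are assumptions made for the example, with values copied from the log above.

```python
# Illustration only: imitate output-template expansion with plain Python
# %-formatting. `expand` and the metadata dicts are hypothetical stand-ins,
# not youtube-dl internals.

# Pattern that reproduces the file name seen in the log.
PLAIN_TEMPLATE = "%(title)s-%(id)s.%(ext)s"
# Same pattern with the format ID added (the ".f" prefix is arbitrary).
PER_FORMAT_TEMPLATE = "%(title)s-%(id)s.f%(format_id)s.%(ext)s"


def expand(template, info):
    """Fill a template from a metadata dict using %-formatting."""
    return template % info


audio = {
    "title": "French 79 - TEENAGERS [Official Video]",
    "id": "ynPdORrsuZM",
    "ext": "webm",
    "format_id": "251",
}
video = dict(audio, format_id="313")

# Without format_id, both formats map to the same file name.
same_name = expand(PLAIN_TEMPLATE, audio) == expand(PLAIN_TEMPLATE, video)

# With format_id, each format gets its own file name.
audio_name = expand(PER_FORMAT_TEMPLATE, audio)
video_name = expand(PER_FORMAT_TEMPLATE, video)
```

On the command line the template is passed with -o. Inside a Windows batch file each % would need to be doubled, as the README notes.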

Somewhat unintuitively, to force overwriting with the same output template you have to specify --no-continue and also leave --no-overwrites unset (it is off by default).
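For anyone wrapping youtube-dl in a script or macro, the advice above might be captured roughly as follows. This is only a sketch: `build_command` is a hypothetical helper, and the only flags it uses (-f, --no-continue) are the ones that appear in this thread. It builds the argument list without running anything.

```python
# Sketch: build a youtube-dl command line that forces a re-download.
# `build_command` is a hypothetical helper, not part of youtube-dl.


def build_command(url, format_id, force_overwrite=False):
    """Return the argument list for one youtube-dl invocation."""
    args = ["youtube-dl", "-f", format_id]
    if force_overwrite:
        # Give up resuming and do not pass --no-overwrites, so that an
        # existing file with the same name is replaced.
        args.append("--no-continue")
    args.append(url)
    return args


cmd = build_command(
    "https://www.youtube.com/watch?v=ynPdORrsuZM",
    "313",
    force_overwrite=True,
)
# To actually run it: subprocess.run(cmd, check=True)
```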

yt-dlp has implemented a slightly improved, I think, set of options:

-    -w, --no-overwrites                  Do not overwrite files
+    -w, --no-overwrites             Do not overwrite any files
+    --force-overwrites              Overwrite all video and metadata files. This
+                                    option includes --no-continue
+    --no-force-overwrites           Do not overwrite the video, but overwrite
+                                    related files (default)

But, unless a lot of people ask for this to be back-ported, or someone actually offers a PR, it's not likely to be implemented here.

matteyas commented 6 months ago

I see. I was unaware of the dlp alternative, so that's good to know. I'd agree that the manual force switch is rather weird; had I known that there was a possibility to get around the issue I wouldn't have posted here, so maybe at least add some info about how to do it? (Maybe there is info, I'm just a bit impatient when --help produces several pages so I didn't find it at a glance. I'll ask some LLM to interpret the --help next time.)

mk-pmb commented 6 months ago

Make overwrite default if it's a different file

Sounds like a rather bad idea. Rather, we should do it like wget and insert a number before the file name extension unless users explicitly request destructive action.

Ideally we could offer something like %(format_id)s that inserts itself (and a separator before/after) only if the file without this name part already exists and is non-empty. However, I see that about every detail in the previous sentence strays far from the current flow, and would open entire new cans of worms.
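The "insert a number before the file name extension" idea could look something like the sketch below. It is a hypothetical illustration, not existing youtube-dl behaviour: `numbered_name`, the " (N)" naming style, and the injectable `exists` predicate are all assumptions made for the example.

```python
import os


def numbered_name(path, exists=os.path.exists):
    """Return `path` if it is free, else the first free 'stem (N).ext'."""
    if not exists(path):
        return path
    stem, ext = os.path.splitext(path)
    n = 1
    # Try "stem (1).ext", "stem (2).ext", ... until an unused name is found.
    while exists("%s (%d)%s" % (stem, n, ext)):
        n += 1
    return "%s (%d)%s" % (stem, n, ext)


# Simulate a directory listing with a set instead of touching the disk.
taken = {"video.webm", "video (1).webm"}
candidate = numbered_name("video.webm", exists=taken.__contains__)
```

A check-then-create scheme like this is racy if two downloads start at once; a real implementation would probably need to create the file exclusively rather than only test for its existence.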

dirkf commented 6 months ago

Exactly: there are so many features in the file naming code, even more so with yt-dlp, that changing anything has the danger of unwanted feature interaction: in fact this has already happened.

matteyas commented 6 months ago

Make overwrite default if it's a different file

Sounds like a rather bad idea.

Huh, I don't get that myself. If I happen to overwrite something I didn't want to—and I think I could count how often this has happened on one hand for my entire life, but perhaps that is unusual—I'd just… overwrite it again? (Which would be pretty quick since it'd be default behavior.) :D

But anyway, default behavior in Windows is to prompt. "Are you 100% certain you want to overwrite the precious file that is already there?" :)

(Again, I'm just throwing my cents around; I understand if this is more or less a non-issue in the grand scheme of things.)

mk-pmb commented 6 months ago

default behavior in Windows is to prompt.

Which is also bad. It forces users to make a decision between right, destructive wrong or time-wasting wrong (the latter being taking their time to think). The wget evasion strategy is superior because it can immediately start doing what the user actually wanted (transferring bytes) while the user has lots of time to decide what name those bytes shall be known as.

matteyas commented 6 months ago

immediately start doing what the user actually wanted (transferring bytes)

I would love that! See initial post. :)

Regarding prompts and time wasting, if it starts cluttering my disk with video (1).mp4 files that I have to manually deal with afterwards, how is that not at least as big of a time sink as a binary decision followed by pressing y or n and enter?

(As an addendum, I'd suggest that this issue with conflicting filenames is quite rare in regular usage of the tool, and also, overwrites are basically completely safe in this context. Maybe some people disagree with those assumptions, and that's why I don't grasp this being such a contended issue?)

mk-pmb commented 6 months ago

that I have to manually deal with afterwards,

The "afterwards" is the key point. On good file systems, you can rename while the file is being downloaded. Thus, the time that you need to make your decision does not add to the download time. Even on bad file systems, the main advantage is the lack of time pressure for the decision, which stochastically helps avoid user error. Same reason why it's better to offer easy undo rather than lots of confirmations that users get used to blindly confirming.

mk-pmb commented 6 months ago

overwrites are basically completely safe in this context.

How so? Imagine a 4 hour lecture. My VLC is playing it, currently at 3 hours in. Now yt-dl starts overwriting the file. Let's assume it only starts doing so once it has the first MB buffered. With my usual YouTube download rates I'll have to wait at least an hour until I can continue watching. It gets worse…

matteyas commented 6 months ago

The "afterwards" is the key point. On good file systems, you can rename while the file is being downloaded. Thus, the time that you need to make your decision does not add to the download time. Even on bad file systems, the main advantage is the lack of time pressure for the decision, which stochastically helps avoid user error. Same reason why it's better to offer easy undo rather than lots of confirmations that users get used to blindly confirming.

Huh, I was unaware that that's a problem people face on a regular basis. File confirmation anxiety. Blind confirmation of file overwrites. Taking significant time to decide on filenames.

Could you give me some numbers on how often this occurs in the real world? I can't remember it ever occurring for me personally, and I have some trouble imagining that others have to struggle much more than someone as foolish as myself.

How so?

Because it's just a video file, and because it almost never happens. If that seems like a controversial take, I'd love some statistics on how often someone is watching something and decides to redownload the same video while doing so.

Happily, though, if I ever were to screw up to that extent, my "bad file system" would save me since VLC locks the file while it's playing.

Anyway, we clearly have pretty different problems and ways to deal with them—from my perspective it is extremely obvious that a video downloader is a completely safe context for overwrites—so I'll disengage if future discourse seems fruitless.

mk-pmb commented 6 months ago

Huh, I was unaware that that's a problem people face on a regular basis.

Yeah, people are usually good at avoiding decision fatigue, and even when they cannot, most don't experience it consciously or misattribute the symptoms. I'm probably a bit more affected because my work and leisure both include activities where sometimes I have to spend hours on end making decision after decision, and even so I went through most of my life without knowing there was a name for it. Science also could use more research on what factors besides placebo make people more/less prone to the phenomenon.

Could you give me some numbers on how often this occurs in the real world?

No, I'm not that deep into psychology.

I'll disengage if future discourse seems fruitless.

Is it even still an issue? As far as I understood @dirkf above, you can achieve the desired behavior by adding --no-continue to the CLI arguments, or (according to readme) to your default options config file (%APPDATA%/youtube-dl/config.txt on Windows). Once you do that, yt-dl will overwrite by default, i.e. unless you use --no-overwrites. I'll file a PR to clarify the --help and readme about that.

matteyas commented 5 months ago

No issue for me. Thanks for the suggestion, I was unaware of the configuration file; though I updated my macros to include the flag anyway. Updating the --help seems useful, since the --no-continue flag entry doesn't mention forcing overwrites.

dirkf commented 5 months ago

It's not so much that --no-continue forces overwrites; rather the desirable default behaviour where an incomplete download is restarted from the point of failure (say, when 2.4 of 2.5GB had been fetched before whatever error) is incompatible with the other default of overwriting downloaded files. So to overwrite files you have to give up --continue.

You might wonder what happens with --no-overwrites and --no-continue together. This is an exercise for the reader, though not a Knuth-style exercise whose solution needs a term paper, I hope.
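One way to keep the flag interactions straight is a small truth-table model. This is only a reading of what has been said in this thread, not code taken from youtube-dl; the function and parameter names are hypothetical, and it only covers the case where a finished file may already sit at the target name (resuming partial downloads is not modelled).

```python
def action_for(file_exists, continue_dl=True, no_overwrites=False):
    """Simplified model of what happens to the target file name.

    continue_dl=False corresponds to passing --no-continue;
    no_overwrites=True corresponds to passing --no-overwrites.
    """
    if not file_exists:
        return "download"
    if no_overwrites or continue_dl:
        # Either overwriting is forbidden outright, or the existing file
        # is treated as an already finished download.
        return "skip"
    return "overwrite"


# Defaults: the second download in the original report is skipped.
default_case = action_for(file_exists=True)

# --no-continue alone: the existing file is replaced.
forced_case = action_for(file_exists=True, continue_dl=False)

# --no-continue plus --no-overwrites: this model predicts a skip.
both_flags = action_for(file_exists=True, continue_dl=False, no_overwrites=True)
```

The last case is simply what the model predicts from "overwrite by default, unless you use --no-overwrites"; it would be worth confirming against the real program.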

mk-pmb commented 1 month ago

How can I help move this forward? (Have you seen my PR?)

matteyas commented 1 month ago

It's not so much that --no-continue forces overwrites; rather the desirable default behaviour where an incomplete download is restarted from the point of failure (say, when 2.4 of 2.5GB had been fetched before whatever error) is incompatible with the other default of overwriting downloaded files. So to overwrite files you have to give up --continue.

The only thing that's still weirding me out is that there's no --overwrite switch. It's an extremely common operator, and you have to use this "workaround" switch to replicate it; which puts some demand on the user knowing the inner workings described by dirkf in the quote.

Highly suspect design choice, in my view. I'm guessing someone is thinking "oh wow, that would actually be code reuse!" or some similar bane of reason.

Besides that, there's no issue as far as I can tell. Feel free to close the issue, or tell me to do it. ^^

(Disclaimer: I did not look at mk-pmb's PR, so if that adds the switch in question, great.)

mk-pmb commented 1 month ago

which puts some demand on the user knowing the inner workings described by dirkf in the quote.

As far as I understand, ytdl has little priority on being intuitive, because the role of a user-centric frontend is better served by third-party projects already.

My PR intends to make it easier to obtain the required knowledge, by hopefully making it easier to find the required flags when you search the readme for how to overwrite a file. Since you are the only tester we can easily call to help for this issue, I would love it if you could comment on whether this new readme would have helped you.

matteyas commented 1 month ago

It would probably help, even though it would still rely on me knowing that the readme is easily available online, since I'd have to use ctrl+f to find it.

An actual --overwrite flag (that acts as a synonym for --no-continue) would likely be easier to spot in the console --help section, which means I'd be fine without knowing about the online readme.

A minor issue, but you wanted my feedback so I hope it's not perceived as nagging. Having the behavior spelled out for --no-continue is an improvement.

mk-pmb commented 1 month ago

The changes from the PR will also be in --help once this is merged. (Because the relevant section of the online readme is actually based on --help.) It still won't be on the left side, but in youtube-dl --help | grep -i overwrite you should see it.