ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Embedding thumbnail resets mtime #18915

Open MrDOS opened 5 years ago

MrDOS commented 5 years ago

Verbose output (youtube-dl -v):

[debug] System config: []
[debug] User config: ['-f', 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4', '--embed-thumbnail', '--add-metadata', '-o', '%(title)s.%(ext)s']
[debug] Custom config: []
[debug] Command-line args: ['-v']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.01.17
[debug] Python version 3.7.2 (CPython) - Linux-4.19.0-1-amd64-x86_64-with-debian-buster-sid
[debug] exe versions: ffmpeg 4.1-1, ffprobe 4.1-1, rtmpdump 2.4
[debug] Proxy map: {}

Description of your issue, suggested solution and other information

When downloading a video, the default behaviour of youtube-dl is to set the modification time (mtime) of the output file to the time given in the Last-Modified header of the HTTP response it downloaded from. When multiple streams are downloaded, it isn't clear which of their modification times is used. Regardless, this works fine, even when combining multiple streams:

$ youtube-dl --ignore-config -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4' 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
...
[ffmpeg] Merging formats into "Rick Astley - Never Gonna Give You Up (Video)-dQw4w9WgXcQ.mp4"
...
$ stat Rick\ Astley\ -\ Never\ Gonna\ Give\ You\ Up\ \(Video\)-dQw4w9WgXcQ.mp4
...
Modify: 2018-12-09 21:59:12.000000000 -0500

(The video was originally uploaded in 2009, but the incorrect date appears to be the fault of YouTube, not youtube-dl.)
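The default behaviour described above boils down to converting the header's HTTP-date into a timestamp and applying it as the file's mtime. The shell sketch below shows that step on its own. It is not youtube-dl's code (youtube-dl is written in Python); it assumes GNU coreutils (date -d, touch -d), and the helper name apply_last_modified is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: apply an HTTP Last-Modified value to a file as its mtime.
# Assumes GNU coreutils (date -d, touch -d); the helper name is hypothetical.

# apply_last_modified FILE HTTP_DATE
apply_last_modified() {
    local file=$1 http_date=$2 epoch
    # Convert e.g. "Mon, 10 Dec 2018 02:59:12 GMT" to seconds since the epoch
    epoch=$(date -u -d "$http_date" +%s) || return 1
    # Set the file's access and modification times to that instant
    touch -d "@$epoch" "$file"
}
```

For example, apply_last_modified video.mp4 'Mon, 10 Dec 2018 02:59:12 GMT' gives the file the same instant as the Modify: line shown above (21:59:12 at UTC-5 on the previous day). The header value here is illustrative, not one captured from YouTube.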

youtube-dl includes a handy feature, --embed-thumbnail, which additionally retrieves a JPEG thumbnail for the video and embeds it into the output file. Unfortunately, this step is performed by an external tool (AtomicParsley) and happens after youtube-dl backdates the output file's mtime. As a result, the output file's final mtime is the current time (set by the system, because AtomicParsley modifies the file) rather than the modification time of the video as indicated by the source.

$ youtube-dl --ignore-config -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4' --embed-thumbnail 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
...
$ stat Rick\ Astley\ -\ Never\ Gonna\ Give\ You\ Up\ \(Video\)-dQw4w9WgXcQ.mp4
...
Modify: 2019-01-18 14:06:10.201390415 -0500
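A minimal shell demonstration of the effect, assuming GNU coreutils: backdate a file, then write to it, and its mtime jumps to the time of the write. Appending bytes is only a stand-in for the thumbnail step; whether a tool edits the file in place or writes a new copy that replaces it, the result carries the current time. The helper name mtime_before_and_after_edit is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: a write made after backdating moves the mtime back to "now".
# Assumes GNU coreutils; the append stands in for the thumbnail being embedded.

# mtime_before_and_after_edit FILE  (hypothetical helper)
# Prints the mtime in epoch seconds right after backdating, then after an edit.
mtime_before_and_after_edit() {
    local f=$1
    touch -d '2018-12-10 02:59:12 UTC' "$f"   # backdate, like the download step
    stat -c %Y "$f"
    printf 'thumbnail bytes' >> "$f"          # stand-in for the post-processor's write
    stat -c %Y "$f"                           # now reflects the time of the write
}
```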

I suspect the correct fix is to adjust the output file's mtime at the very last minute, after every other option that applies to the single output file has been handled.
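The suggested ordering (record the backdated mtime, let the post-processing step rewrite the file, then reapply the mtime last) can be sketched as a shell wrapper. This is an illustration of the idea under stated assumptions, not a patch to youtube-dl: it assumes GNU coreutils, and run_preserving_mtime is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch of the suggested ordering: save the mtime, run the step that rewrites
# the file, then restore the saved mtime as the very last action.
# Assumes GNU coreutils; run_preserving_mtime is a hypothetical helper.

# run_preserving_mtime FILE COMMAND [ARGS...]
run_preserving_mtime() {
    local file=$1 saved
    shift
    saved=$(stat -c %Y "$file") || return 1   # mtime before post-processing
    "$@" || return                            # propagate the command's failure
    touch -m -d "@$saved" "$file"             # restore the saved mtime last
}
```

For example: run_preserving_mtime video.mp4 AtomicParsley video.mp4 --artwork thumb.jpg --overWrite (AtomicParsley's options shown for illustration). Because the mtime is reapplied by path rather than through an open file handle, the restore still works when the tool writes a new file and renames it over the original instead of editing in place.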

jtmoon79 commented 5 years ago

Seeing a similar problem. I've found that if I pass any of the following options, the LastWriteTime property (according to PowerShell) or the Date modified column (in Windows Explorer) is not set to the video's original release time: --xattrs, --add-metadata, or --embed-thumbnail.

With any of those options (--xattrs, --add-metadata, or --embed-thumbnail) passed to youtube-dl, the timestamp is instead set to the time at which writing the file finished. Without those options, LastWriteTime (or Date modified) is set to the YouTube video's Published datetime.

Successful time set

Here's an example of the success case (LastWriteTime is set to the video's Published datetime):

PS> C:\Python\Python37\Scripts\youtube-dl -o "%(upload_date)s %(title)s (%(id)s).%(ext)s" 2AFq2rti3-4
[youtube] 2AFq2rti3-4: Downloading webpage
[youtube] 2AFq2rti3-4: Downloading video info webpage
[download] Destination: 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).f137.mp4
[download] 100% of 13.69MiB in 00:01
[download] Destination: 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).f140.m4a
[download] 100% of 862.13KiB in 00:00
[ffmpeg] Merging formats into "20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).mp4"
Deleting original file 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).f137.mp4 (pass -k to keep)
Deleting original file 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).f140.m4a (pass -k to keep)

PS> ls

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       2019-02-11     15:26       15261624 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).mp4

Failed time set

Here's an example of the failure case (LastWriteTime is not set to the video's Published datetime):

PS> C:\Python\Python37\Scripts\youtube-dl --xattrs -o "%(upload_date)s %(title)s (%(id)s).%(ext)s" 2AFq2rti3-4
[youtube] 2AFq2rti3-4: Downloading webpage
[youtube] 2AFq2rti3-4: Downloading video info webpage
[download] Destination: 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).f137.mp4
[download] 100% of 13.69MiB in 00:01
[download] Destination: 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).f140.m4a
[download] 100% of 862.13KiB in 00:00
[ffmpeg] Merging formats into "20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).mp4"
Deleting original file 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).f137.mp4 (pass -k to keep)
Deleting original file 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).f140.m4a (pass -k to keep)
[metadata] Writing metadata to file's xattrs

PS> ls

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       2019-02-16     19:29       15261624 20190211 NASA Finds Second Massive Greenland Crater (2AFq2rti3-4).mp4


Versions and settings

PS> C:\Python\Python37\Scripts\youtube-dl -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2019.02.08
[debug] Python version 3.7.1 (CPython) - Windows-10
[debug] exe versions: ffmpeg N-93129-g9e1e521393, ffprobe N-93129-g9e1e521393
[debug] Proxy map: {}

In my case, the PowerShell session's $env:PATH variable includes the directories containing ffmpeg.exe (version 20190215-9e1e521-win64-static) and AtomicParsley.exe (version win32-0.9.0).

diegocr commented 2 years ago

I find it odd that this issue has been open for years; maybe it went unnoticed, since I think it should be relatively easy to fix. Perhaps @ivan could look into it, since he made https://github.com/ytdl-org/youtube-dl/pull/4248

In the meantime, given my lack of Python experience, I just made this helper script:

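#!/bin/bash

# The video URL or ID is the last command-line argument.
vid=${!#}
# -g prints the media URL(s) first; --get-filename prints the output filename last.
data=$(youtube-dl --get-filename -g "$vid")

source=$(head -1 <<< "$data")
filename=$(tail -1 <<< "$data")
# Use the last Last-Modified header (curl -L shows one header block per redirect);
# -i also matches the lowercase header names sent over HTTP/2.
mtime=$(curl --head -sL "$source" | grep -Pio 'Last-Modified:\K.*' | tail -1)

youtube-dl --embed-thumbnail --add-metadata ......

touch --date="$mtime" "$filename"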

Hope it helps.
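The grep ... | tail -1 step in the script keeps only the last Last-Modified header because curl -L prints one header block per response in a redirect chain. A variant that avoids grep -P (not supported by every grep implementation), matches the header name case-insensitively, and strips the trailing carriage return that HTTP header lines end with might look like this. It reads header text on stdin; the function name last_modified is hypothetical.

```bash
#!/usr/bin/env bash
# last_modified (hypothetical helper): print the value of the final
# Last-Modified header from header text on stdin, e.g. `curl --head -sL URL`.
last_modified() {
    # Match the header name case-insensitively and keep the last occurrence,
    # then drop the "name:" prefix, leading spaces and the trailing CR.
    grep -i '^last-modified:' | tail -n 1 | cut -d: -f2- | sed 's/^ *//' | tr -d '\r'
}
```

In the script, the mtime line would then read mtime=$(curl --head -sL "$source" | last_modified).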

MrDOS commented 2 years ago

Hey, that's pretty cool. I didn't know about -g, --get-url. I think you can do without the --get-filename flag. Thanks for pointing this workaround out!

diegocr commented 2 years ago

I didn't know about that either, and initially wrote:

json=$(youtube-dl -s --print-json "$vid")
source=$(jq -cr ".requested_formats[].url" <<< "$json" | head -1)

Then I realized I would also need the filename (for the touch command), and while checking youtube-dl --help I found out about -g, --get-url :)

MrDOS commented 2 years ago

Oh, frig, I missed that you're using the last line of the output as well. Of course. Cool!