ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Optimize metadata support for OS X #8089

Open porg opened 8 years ago

porg commented 8 years ago

Metadata functions in youtube-dl

youtube-dl comes with two metadata-related arguments: --add-metadata and --xattrs.

--add-metadata writes into the metadata section of the container format where possible/supported, whereas --xattrs writes into the filesystem's extended attributes.

Metadata on OS X

According to my research, Spotlight, the default search feature for Mac users (and its background process mds, the metadata server), seems to index only xattr variables from the namespace com.apple.*, whereas youtube-dl writes into user.dublincore.* (contributor | date | description | format | title) and user.xdg.referrer.url.

Therefore the metadata written by --xattrs is completely inaccessible to Spotlight, and the metadata written by --add-metadata is accessible only if 1) youtube-dl can write into the container format, 2) Spotlight supports parsing the container's metadata, and 3) Spotlight maps the variables correctly (to a certain degree).

Example: kMDItemDescription got filled by Spotlight for a 3gp file with --add-metadata but not for an mp4 file with --add-metadata.

Feature request

It would be great if there were an option --spotlight that writes the metadata into the appropriate OS X specific metadata locations.

Implementation hints

I'm not a savvy developer, but I would like to provide at least some resources:

1) Apple Developer Documentation on Spotlight Metadata Attributes, e.g.: kMDItemWhereFroms would be the ideal destination for the URL. kMDItemDescription would be the ideal destination for the description. …

2) Writing metadata seems to be done with:

xattr -w com.apple.metadata:<properVariable> "<MetadataValueHere>" <file>

It seems to be a bit tricky, though: some metadata exists in multiple (legacy) places and must therefore be kept in sync to show up both in Spotlight and in the Finder info windows. It also seems that the value cannot be written as plain text but must be encapsulated as a PLIST (Apple property list), either in XML format or as a binary PLIST.
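The plist encapsulation described in point 2 can be sketched in Python. This is only a minimal sketch under stated assumptions, not youtube-dl's implementation: the helper names encode_spotlight_value and build_xattr_command are hypothetical, the file name video.mkv is a placeholder, and it assumes the macOS xattr tool accepts a hex-encoded value through its -x flag. The sketch only builds the command line; it does not run it.

```python
import plistlib


# Hypothetical helper names for illustration; not part of youtube-dl.
def encode_spotlight_value(value):
    """Serialize a value as a binary property list (begins with b'bplist00')."""
    return plistlib.dumps(value, fmt=plistlib.FMT_BINARY)


def build_xattr_command(path, key, value):
    """Build the argv that would write com.apple.metadata:<key> on `path`.

    Assumption: the macOS `xattr` CLI, whose -x flag reads the value as hex.
    """
    payload = encode_spotlight_value(value)
    return ["xattr", "-w", "-x", "com.apple.metadata:" + key, payload.hex(), path]


if __name__ == "__main__":
    # kMDItemWhereFroms holds a list of URLs, so a list is encoded here.
    cmd = build_xattr_command(
        "video.mkv",
        "kMDItemWhereFroms",
        ["https://www.youtube.com/watch?v=QG_YH7jxVy0"],
    )
    print(" ".join(cmd))
```

Passing the payload as hex avoids having to push raw binary bytes through a shell argument.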

porg commented 8 years ago

This AppleScript successfully copies the xattr user.dublincore.description to the Finder/Spotlight comment.

Attachment: Comment from xattr.txt

Saklad5 commented 8 years ago

This still hasn’t been changed. Can someone please get on this? It would make a world of difference when sorting through files.

yan12125 commented 8 years ago

Apparently mdls maps some Dublin Core attributes to com.apple.metadata ones, so not all attributes should be added. By the way, files merged with ffmpeg don't lose their extended attributes. Are things broken for you, @Saklad5? Here are the results from the latest youtube-dl commit + ffmpeg from Homebrew:

$ youtube-dl -v QG_YH7jxVy0 --xattr-set-filesize --xattrs --keep-video
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'QG_YH7jxVy0', u'--xattr-set-filesize', u'--xattrs', u'--keep-video']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.09.27
[debug] Git HEAD: 53a7e3d
[debug] Python version 2.6.9 - Darwin-16.0.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 3.1.3, ffprobe 3.1.3
[debug] Proxy map: {}
[youtube] QG_YH7jxVy0: Downloading webpage
[youtube] QG_YH7jxVy0: Downloading video info webpage
[youtube] QG_YH7jxVy0: Extracting video information
[youtube] QG_YH7jxVy0: Downloading MPD manifest
WARNING: Requested formats are incompatible for merge and will be merged into mkv.
[debug] Invoking downloader on u'https://r8---sn-5njj-u2xl.googlevideo.com/videoplayback?id=406fd81fb8f1572d&itag=135&source=youtube&requiressl=yes&pl=17&mv=m&ms=au&mm=31&mn=sn-5njj-u2xl&gcr=tw&initcwndbps=6998750&ratebypass=yes&mime=video/mp4&gir=yes&clen=11720569&lmt=1385507366042723&dur=246.012&mt=1475325896&upn=afoBHizrrSM&key=dg_yt0&signature=955F701365E5EDE59E6728290D7CFCCACA2B8E07.88C8F5790456CA6ECBC9F53821DA535AC4E94D97&ip=140.112.230.216&ipbits=0&expire=1475348342&sparams=ip,ipbits,expire,id,itag,source,requiressl,pl,mv,ms,mm,mn,gcr,initcwndbps,ratebypass,mime,gir,clen,lmt,dur'
[download] Destination: 貓的報恩-幻化成風-QG_YH7jxVy0.f135.mp4
[download] 100% of 11.18MiB in 00:01
[debug] Invoking downloader on u'https://r8---sn-5njj-u2xl.googlevideo.com/videoplayback?id=406fd81fb8f1572d&itag=251&source=youtube&requiressl=yes&pl=17&mv=m&ms=au&mm=31&mn=sn-5njj-u2xl&gcr=tw&initcwndbps=6998750&ratebypass=yes&mime=audio/webm&gir=yes&clen=4313393&lmt=1449568659391520&dur=246.041&mt=1475325896&upn=afoBHizrrSM&key=dg_yt0&signature=4BF92C33C5DBB505F95138D29C653F3BAC994335.878CE0E0ADFDE6C81121C54E8E265B4ADF57ED6E&ip=140.112.230.216&ipbits=0&expire=1475348342&sparams=ip,ipbits,expire,id,itag,source,requiressl,pl,mv,ms,mm,mn,gcr,initcwndbps,ratebypass,mime,gir,clen,lmt,dur'
[download] Destination: 貓的報恩-幻化成風-QG_YH7jxVy0.f251.webm
[download] 100% of 4.11MiB in 00:00
[ffmpeg] Merging formats into "貓的報恩-幻化成風-QG_YH7jxVy0.mkv"
[debug] ffmpeg command line: ffmpeg -y -i 'file:貓的報恩-幻化成風-QG_YH7jxVy0.f135.mp4' -i 'file:貓的報恩-幻化成風-QG_YH7jxVy0.f251.webm' -c copy -map 0:v:0 -map 1:a:0 'file:貓的報恩-幻化成風-QG_YH7jxVy0.temp.mkv'
[metadata] Writing metadata to file's xattrs

$ xattr -l 貓的報恩-幻化成風-QG_YH7jxVy0.mkv 
user.dublincore.contributor: rjmhwy
user.dublincore.date: 2013-01-28
user.dublincore.description: 幻化成風(烏克麗麗不插電演奏版/動畫《貓的報恩》採用版本)

作詞.作曲:過亞彌乃
編曲:根岸孝旨
user.dublincore.format: 135 - 854x480 (DASH video)+251 - audio only (DASH audio)
user.dublincore.title: 貓的報恩-幻化成風
user.xdg.referrer.url: https://www.youtube.com/watch?v=QG_YH7jxVy0

$ mdls 貓的報恩-幻化成風-QG_YH7jxVy0.mkv  
kMDItemFSContentChangeDate = 2013-11-26 23:09:26 +0000
kMDItemFSCreationDate      = 2013-11-26 23:09:26 +0000
kMDItemFSCreatorCode       = ""
kMDItemFSFinderFlags       = 0
kMDItemFSHasCustomIcon     = 0
kMDItemFSInvisible         = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery      = 0
kMDItemFSLabel             = 0
kMDItemFSName              = "貓的報恩-幻化成風-QG_YH7jxVy0.mkv"
kMDItemFSNodeCount         = 15988466
kMDItemFSOwnerGroupID      = 20
kMDItemFSOwnerUserID       = 501
kMDItemFSSize              = 15988466
kMDItemFSTypeCode          = ""

Saklad5 commented 8 years ago

I finally figured it out. All of the metadata appears to be dropped when I use MP4, but when outputting to Matroska the metadata is preserved just fine.

yan12125 commented 8 years ago

Extended attributes should be independent of file types. @Saklad5 could you explain more about "use MP4"? For example the command you use and the environment (OS version, ffmpeg/avconv version, etc.)

Saklad5 commented 8 years ago

Attachment: output.txt

Hopefully that is a sufficient amount of information. I know the extended attributes are format-agnostic, which is why this behavior is so puzzling.

yan12125 commented 8 years ago

@Saklad5 The problem is that atomicparsley does not copy extended attributes. Fixed in b19e275.

porg commented 8 years ago

I'm glad that my feature request got some attention after a while. When I next have spare time I'll test the implementation and will probably give some feedback.

yan12125 commented 8 years ago

By the way, there's a request for kMDItemWhereFroms at #2545 and an implementation at https://github.com/tewe/youtube-dl/commit/7f5c82729ba762344d4cd311920196cacc80c3dc. That approach looks roughly right. My question is: how can I check whether kMDItemWhereFroms is correctly set? I'm not familiar with Spotlight and Finder.

porg commented 8 years ago

mdls -name kMDItemWhereFroms /path/to/file will return (null) if the attribute is undefined, or a URL string like http://domain.com/origin/of/file if the downloading application set it.
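The check described above could be scripted roughly as follows. This is a hedged sketch: the function names parse_mdls_value and where_froms are hypothetical, and the parser assumes that mdls prints either "(null)" or one or more double-quoted strings after the "=" sign. The exact multi-value layout of mdls output is an assumption here, not something confirmed in this thread.

```python
import re
import subprocess


def parse_mdls_value(output):
    """Return None for '(null)', otherwise the list of quoted strings found.

    Assumption: mdls prints '<name> = (null)' or double-quoted values.
    """
    _, _, value = output.partition("=")
    value = value.strip()
    if value == "(null)":
        return None
    # Collect every double-quoted string, allowing backslash escapes inside.
    return re.findall(r'"((?:[^"\\]|\\.)*)"', value)


def where_froms(path):
    """Run mdls (macOS only) and parse the kMDItemWhereFroms attribute."""
    out = subprocess.check_output(
        ["mdls", "-name", "kMDItemWhereFroms", path], universal_newlines=True
    )
    return parse_mdls_value(out)


if __name__ == "__main__":
    print(parse_mdls_value("kMDItemWhereFroms = (null)"))
```

Keeping the parser separate from the mdls call lets it be exercised with sample strings on any platform.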

Saklad5 commented 8 years ago

Any progress on this issue?

yan12125 commented 8 years ago

I downloaded http://yt-dl.org/downloads/latest/youtube-dl with Safari. There are some extended attributes associated with the downloaded file:

$ xattr -l ~/Downloads/youtube-dl                    
com.apple.metadata:kMDItemDownloadedDate:
00000000  62 70 6C 69 73 74 30 30 A1 01 33 41 BD C6 33 49  |bplist00..3A..3I|
00000010  0B AC A3 08 0A 00 00 00 00 00 00 01 01 00 00 00  |................|
00000020  00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 13                                   |.....|
00000035
com.apple.metadata:kMDItemWhereFroms:
00000000  62 70 6C 69 73 74 30 30 A2 01 02 5F 11 01 E4 68  |bplist00..._...h|
00000010  74 74 70 73 3A 2F 2F 67 69 74 68 75 62 2D 63 6C  |ttps://github-cl|
00000020  6F 75 64 2E 73 33 2E 61 6D 61 7A 6F 6E 61 77 73  |oud.s3.amazonaws|
00000030  2E 63 6F 6D 2F 72 65 6C 65 61 73 65 73 2F 31 30  |.com/releases/10|
00000040  33 39 35 32 30 2F 35 62 32 38 63 37 33 34 2D 39  |39520/5b28c734-9|
00000050  62 62 36 2D 31 31 65 36 2D 39 30 34 37 2D 61 66  |bb6-11e6-9047-af|
00000060  31 64 39 62 31 32 32 32 64 34 3F 58 2D 41 6D 7A  |1d9b1222d4?X-Amz|
00000070  2D 41 6C 67 6F 72 69 74 68 6D 3D 41 57 53 34 2D  |-Algorithm=AWS4-|
00000080  48 4D 41 43 2D 53 48 41 32 35 36 26 58 2D 41 6D  |HMAC-SHA256&X-Am|
00000090  7A 2D 43 72 65 64 65 6E 74 69 61 6C 3D 41 4B 49  |z-Credential=AKI|
000000A0  41 49 53 54 4E 5A 46 4F 56 42 49 4A 4D 4B 33 54  |AISTNZFOVBIJMK3T|
000000B0  51 25 32 46 32 30 31 36 31 30 33 30 25 32 46 75  |Q%2F20161030%2Fu|
000000C0  73 2D 65 61 73 74 2D 31 25 32 46 73 33 25 32 46  |s-east-1%2Fs3%2F|
000000D0  61 77 73 34 5F 72 65 71 75 65 73 74 26 58 2D 41  |aws4_request&X-A|
000000E0  6D 7A 2D 44 61 74 65 3D 32 30 31 36 31 30 33 30  |mz-Date=20161030|
000000F0  54 31 33 35 35 31 37 5A 26 58 2D 41 6D 7A 2D 45  |T135517Z&X-Amz-E|
00000100  78 70 69 72 65 73 3D 33 30 30 26 58 2D 41 6D 7A  |xpires=300&X-Amz|
00000110  2D 53 69 67 6E 61 74 75 72 65 3D 37 37 32 63 30  |-Signature=772c0|
00000120  63 38 65 62 64 34 39 33 66 65 37 62 34 39 30 39  |c8ebd493fe7b4909|
00000130  61 63 65 39 30 32 35 34 37 34 66 31 62 65 63 34  |ace9025474f1bec4|
00000140  64 66 36 63 35 38 33 38 63 31 66 37 34 66 31 31  |df6c5838c1f74f11|
00000150  33 30 37 62 61 34 62 63 62 32 36 26 58 2D 41 6D  |307ba4bcb26&X-Am|
00000160  7A 2D 53 69 67 6E 65 64 48 65 61 64 65 72 73 3D  |z-SignedHeaders=|
00000170  68 6F 73 74 26 61 63 74 6F 72 5F 69 64 3D 30 26  |host&actor_id=0&|
00000180  72 65 73 70 6F 6E 73 65 2D 63 6F 6E 74 65 6E 74  |response-content|
00000190  2D 64 69 73 70 6F 73 69 74 69 6F 6E 3D 61 74 74  |-disposition=att|
000001A0  61 63 68 6D 65 6E 74 25 33 42 25 32 30 66 69 6C  |achment%3B%20fil|
000001B0  65 6E 61 6D 65 25 33 44 79 6F 75 74 75 62 65 2D  |ename%3Dyoutube-|
000001C0  64 6C 26 72 65 73 70 6F 6E 73 65 2D 63 6F 6E 74  |dl&response-cont|
000001D0  65 6E 74 2D 74 79 70 65 3D 61 70 70 6C 69 63 61  |ent-type=applica|
000001E0  74 69 6F 6E 25 32 46 6F 63 74 65 74 2D 73 74 72  |tion%2Foctet-str|
000001F0  65 61 6D 5F 10 11 68 74 74 70 3A 2F 2F 79 74 2D  |eam_..http://yt-|
00000200  64 6C 2E 6F 72 67 2F 00 08 00 0B 01 F3 00 00 00  |dl.org/.........|
00000210  00 00 00 02 01 00 00 00 00 00 00 00 03 00 00 00  |................|
00000220  00 00 00 00 00 00 00 00 00 00 00 02 07           |.............|
0000022d
com.apple.quarantine: 0083;5815fbc9;Safari;6DF70E85-6756-4590-94DF-8DB4D56EE703

However, mdls can't read it:

$ mdls -name kMDItemWhereFroms ~/Downloads/youtube-dl
kMDItemWhereFroms = (null)

I'm using Sierra 10.12.1. Is there another way to check whether com.apple.metadata:kMDItemWhereFroms is correctly set?
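One possible way to check without mdls is to decode the attribute's raw bytes directly, since the dump starts with the bplist00 marker of a binary property list. The sketch below is illustrative only: decode_plist_hex is a hypothetical helper, and the sample bytes are the 53 bytes of com.apple.metadata:kMDItemDownloadedDate copied from the dump above. On macOS the hex text would presumably come from something like xattr -px, which is an assumption about that tool's flags.

```python
import plistlib


def decode_plist_hex(hex_text):
    """Decode a property list given as hex digits; whitespace is ignored."""
    raw = bytes.fromhex("".join(hex_text.split()))
    return plistlib.loads(raw)


# Bytes of com.apple.metadata:kMDItemDownloadedDate from the dump above.
DOWNLOADED_DATE_HEX = """
62 70 6C 69 73 74 30 30 A1 01 33 41 BD C6 33 49
0B AC A3 08 0A 00 00 00 00 00 00 01 01 00 00 00
00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 13
"""

if __name__ == "__main__":
    # Prints a one-element list containing the decoded download date.
    print(decode_plist_hex(DOWNLOADED_DATE_HEX))
```

If the bytes decode cleanly to the expected list, the attribute itself is well-formed, which would point at Spotlight indexing rather than at the stored value.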

MaddTheSane commented 5 years ago

mdls worked fine for me. Granted, this was on Mojave 10.14.6.

brunocek commented 3 years ago

Besides the "where from" information, can we please also have the description that comes with every YouTube video?

Foxtrod89 commented 3 months ago

Quoting yan12125: "Apparently mdls maps some Dublin Core attributes to com.apple.metadata ones, so not all attributes should be added." (The command output quoted with it is identical to the one in yan12125's comment above.)

Apparently it doesn't. The mdls output there lists only kMDItemFS* attributes, and kMDItemFSName is simply taken from the filename.