Open kdu4108 opened 1 month ago
@garjania What other metadata would you recommend including here for the MVP?
Also, which other fields from the YouTube metadata of v2d do you think might be useful to include?
```json
"yt_meta_dict": {
  "info": {
    "id": "QW3-5OuWn4M",
    "title": "IBM SPSS",
    "thumbnail": "https://i.ytimg.com/vi/QW3-5OuWn4M/maxresdefault.jpg",
    "description": "For the past five years, King Fish has been creating a media channel for IBM to generate leads of senior IT decision makers and retain current customers. We produce dozens of webcasts every year for numerous divisions within IBM. King Fish provides managed services, original content and audience development. \n\nKFM worked with IBM to develop video content on how SPSS Statistics can help their clients meet business goals with advanced data insight methods. The result? Much more effective than an info-graphic.",
    "uploader": "King Fish Media",
    "uploader_id": "KingFishMediaBoston",
    "uploader_url": "http://www.youtube.com/user/KingFishMediaBoston",
    "channel_id": "UCDy7Xb5vYxbmSosQmztCCcQ",
    "channel_url": "https://www.youtube.com/channel/UCDy7Xb5vYxbmSosQmztCCcQ",
    "duration": 122,
    "view_count": 116,
    "average_rating": null,
    "age_limit": 0,
    "webpage_url": "https://www.youtube.com/watch?v=QW3-5OuWn4M",
    "categories": [
      "Science & Technology"
    ],
    "tags": [
      "IBM",
      "technology",
      "statistics",
      "data",
      "analysis",
      "computers",
      "content marketing",
      "Software"
    ],
    "playable_in_embed": true,
    "live_status": "not_live",
    "release_timestamp": null,
    "comment_count": null,
    "chapters": null,
    "like_count": 1,
    "channel": "King Fish Media",
    "channel_follower_count": 10,
    "upload_date": "20131107",
    "availability": "public",
    "original_url": "http://youtube.com/watch?v=QW3-5OuWn4M",
    "webpage_url_basename": "watch",
    "webpage_url_domain": "youtube.com",
    "extractor": "youtube",
    "extractor_key": "Youtube",
    "playlist": null,
    "playlist_index": null,
    "display_id": "QW3-5OuWn4M",
    "fulltitle": "IBM SPSS",
    "duration_string": "2:02",
    "is_live": false,
    "was_live": false,
    "requested_subtitles": {
      "en": {
        "ext": "vtt",
        "url": "https://www.youtube.com/api/timedtext?v=QW3-5OuWn4M&caps=asr&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=1676200746&sparams=ip%2Cipbits%2Cexpire%2Cv%2Ccaps%2Cxoaf&signature=A43F4C223A9DBC7E3BFBC61027FC5AF70D709AB5.B386EB52DD412DEFC3E8DBBCF7F30C442473CDA4&key=yt8&kind=asr&lang=en&fmt=vtt",
        "name": "English"
      }
    },
    "_has_drm": null,
    "format": "137 - 1920x1080 (1080p)+251 - audio only (medium)",
    "format_id": "137+251",
    "ext": "mkv",
    "protocol": "https+https",
    "language": null,
    "format_note": "1080p+medium",
    "filesize_approx": 12831366,
    "tbr": 841.009,
    "width": 1920,
    "height": 1080,
    "resolution": "1920x1080",
    "fps": 30,
    "dynamic_range": "SDR",
    "vcodec": "avc1.640028",
    "vbr": 691.069,
    "stretched_ratio": null,
    "acodec": "opus",
    "abr": 149.94,
    "asr": 48000,
    "audio_channels": 2
  }
},
```
(from https://github.com/iejMac/video2dataset/blob/main/examples/yt_metadata.md)
Maybe we can also have something like tags? Do we have access to them for the YouTube dataset?
Besides fps and resolution, the other fields don't seem to provide useful information.
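To make the suggestion concrete, here is a rough sketch of pulling just those fields out of a sample's `yt_meta_dict`. The function name `select_video_metadata` and the `KEEP_KEYS` tuple are hypothetical, and it assumes each sample's metadata is already loaded as a Python dict with the `yt_meta_dict` -> `info` nesting shown in the example above; the final key list is still open.

```python
from typing import Any, Dict, Sequence

# Hypothetical selection: the thread only names tags, fps, and resolution so far.
KEEP_KEYS = ("tags", "fps", "resolution")


def select_video_metadata(
    sample: Dict[str, Any], keys: Sequence[str] = KEEP_KEYS
) -> Dict[str, Any]:
    """Return only the requested keys from sample["yt_meta_dict"]["info"].

    Keys that are absent (or a missing/None yt_meta_dict) map to None, so every
    output dict has the same set of keys.
    """
    info = (sample.get("yt_meta_dict") or {}).get("info") or {}
    return {key: info.get(key) for key in keys}


# Trimmed-down version of the example metadata above.
example = {
    "yt_meta_dict": {
        "info": {
            "tags": ["IBM", "statistics"],
            "fps": 30,
            "resolution": "1920x1080",
            "vbr": 691.069,
        }
    }
}

print(select_video_metadata(example))
```

Keeping the key list in one place should make it easy to add or drop fields once the MVP set is agreed on.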
Goal: given v2d format of
produce a metadata/ modality data folder of the following format:
Each JSON should look something like
(Exact format and required keys are TBD, since there should probably be more than just video metadata here. Maybe caption quality would be nice? First person vs. third person? What else?)