Open libeanim opened 8 months ago
Are you getting empty yt_meta_dict for just some videos or all of them?
What I am seeing is that out of every 300 videos, roughly 100 have yt_meta_dict populated and 200 have yt_meta_dict = {}, which is quite strange.
What exactly does ignoring errors in yt_dlp mean? Does it give up on the first try even if you have retries configured?
Other yt_dlp codepaths don't seem to set this.
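To put a number on how many samples are affected, the output can be scanned directly. The following is a minimal sketch, assuming the dataset was written with one JSON sidecar file per sample; the function name count_yt_meta is hypothetical, and skipping files that end in _stats.json is an assumption about how per-shard statistics are named.

```python
import json
from pathlib import Path


def count_yt_meta(output_dir):
    """Return (populated, empty) counts of yt_meta_dict over all sample JSON files."""
    populated = 0
    empty = 0
    for path in sorted(Path(output_dir).rglob("*.json")):
        # Assumption: files ending in _stats.json are shard statistics, not samples.
        if path.name.endswith("_stats.json"):
            continue
        with open(path, encoding="utf-8") as f:
            meta = json.load(f)
        if not isinstance(meta, dict):
            continue
        # A missing key and an empty dict are both counted as "empty".
        if meta.get("yt_meta_dict"):
            populated += 1
        else:
            empty += 1
    return populated, empty
```

Calling count_yt_meta("path/to/output") on the download folder returns the two counts, which makes it easy to compare runs with different settings.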
Ahh! Now I understand what happens: with multiple clips, only the first one (_00000.json) will have yt_meta_dict populated, not the following clips.
It seems this change was introduced by the clipping subsampler refactoring (#275). Did it behave differently in v1.2.0?
I am not sure if this is a good idea. Depending on your processing pipeline, you might want to have the same metadata available on all the clips.
I agree, duplicating the metadata makes more sense, especially given the size of the data.
On Thu, Mar 7, 2024, 12:33 PM Henrik Ahlgren @.***> wrote:
Ahh! Now I understand what happens: with multiple clips, only the first one (_00000.json) will have yt_meta_dict populated, not the following clips.
It seems this was a change introduced by clipping subsampler refactoring (#275 https://github.com/iejMac/video2dataset/pull/275), did it behave differently in v1.2.0?
I am not sure if this is a good idea. Depending on your processing pipeline, you might want to have the same metadata available on all the clips.
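Until the behaviour is changed upstream, the metadata can be duplicated onto the remaining clips in a post-processing step. This is a minimal sketch and not part of video2dataset: the function name propagate_yt_meta is hypothetical, and it assumes one JSON file per clip, named with a shared key followed by an underscore and a five-digit clip index, in line with the _00000.json suffix mentioned in the thread.

```python
import json
import re
from pathlib import Path

# Assumed naming: <key>_<5-digit clip index>.json, first clip is _00000.
CLIP_NAME = re.compile(r"^(?P<key>.+)_(?P<clip>\d{5})$")


def propagate_yt_meta(output_dir):
    """Copy yt_meta_dict from each video's first clip to its other clips.

    Returns the number of JSON files that were updated.
    """
    first_clip_meta = {}
    later_clips = []
    for path in sorted(Path(output_dir).rglob("*.json")):
        match = CLIP_NAME.match(path.stem)
        if match is None:
            continue  # not a clip file under the assumed naming scheme
        group = (path.parent, match.group("key"))
        if match.group("clip") == "00000":
            with open(path, encoding="utf-8") as f:
                first_clip_meta[group] = json.load(f).get("yt_meta_dict") or {}
        else:
            later_clips.append((group, path))

    updated = 0
    for group, path in later_clips:
        source = first_clip_meta.get(group)
        if not source:
            continue  # the first clip has no metadata to copy
        with open(path, encoding="utf-8") as f:
            meta = json.load(f)
        if meta.get("yt_meta_dict"):
            continue  # already populated, leave untouched
        meta["yt_meta_dict"] = source
        with open(path, "w", encoding="utf-8") as f:
            json.dump(meta, f)
        updated += 1
    return updated
```

The script rewrites files in place, so it is worth running it on a copy of the output first. Videos whose first clip also has an empty yt_meta_dict are left as they are.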
Issue
When using video2dataset (1.3.0) to download YouTube videos, I've set get_info: True in the config to retrieve metadata.
But in the resulting JSON files the entry "yt_meta_dict": {} is empty.
How to reproduce
For example, take this link: https://www.youtube.com/embed/JFUsP1coIKM
When I download that with yt-dlp:
I get YouTube metadata like
"categories": ["Entertainment"], "tags": ["Deutsche", "Welle", "Made", "in", "Germany", "Bio", "Lettland", "Getreide"]
But with video2dataset it looks like this: