thuhcsi / S2G-MDDiffusion

MIT License

Data download and process #2

Closed: GruelMi closed this issue 4 months ago.

GruelMi commented 4 months ago

Thank you for your awesome work.

I tried using your script to download the data, but many of the downloads reported that the source could not be found. I also tried downloading the data from PATS directly, but after decompression everything was in H5 format. Could you tell me how to obtain the corresponding video and audio files, or how to extract the corresponding video and audio from these H5 files?

hjrPhoebus commented 4 months ago

Hi @GruelMi, first of all, the H5 files provided by PATS contain only pose information. The raw audio and video can only be downloaded from the URLs in the meta file cmu_intervals_df.csv in the data-preparation folder.
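Because several intervals can come from the same source video, it may help to group the rows of cmu_intervals_df.csv by URL before downloading, so that each video is fetched only once. Below is a minimal standard-library sketch. It assumes the URL column is named video_link and the ID column interval_id (check the header of your copy); the helper names are hypothetical and not part of the repository.

```python
import csv
from collections import defaultdict


def read_meta(path):
    # Read the meta CSV into a list of dicts keyed by the header row.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def intervals_by_video(rows, url_col="video_link"):
    """Group interval ids by source video URL so each video is downloaded once.

    `url_col` is an assumed column name; change it if your CSV header differs.
    """
    groups = defaultdict(list)
    for row in rows:
        url = (row.get(url_col) or "").strip()
        if url:  # skip rows that have no usable link
            groups[url].append(row["interval_id"])
    return dict(groups)
```

For example, `intervals_by_video(read_meta("data-preparation/cmu_intervals_df.csv"))` would give one entry per distinct video URL, each listing the interval IDs cut from that video.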

Actually, the full meta file cmu_intervals_df.csv provided by PATS contains many videos that can't be downloaded because of permissions and other issues, and I removed those URLs when I preprocessed the data. At least when I processed the data, the URLs corresponding to the interval IDs in filtered_intervals.json were available for download. Perhaps the permissions on some of those URLs have changed since then.
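To see which interval IDs from filtered_intervals.json still have a row in the meta file, one could intersect the two. This sketch assumes filtered_intervals.json is a flat JSON list of interval IDs and that the CSV has an interval_id column; both are assumptions, and the function names are illustrative only.

```python
import json


def load_filtered_ids(path: str) -> set[str]:
    # Assumption: the file holds a flat JSON list of interval ids.
    with open(path) as f:
        return {str(i) for i in json.load(f)}


def select_intervals(rows, wanted_ids):
    """Keep rows whose interval_id is wanted; also list wanted ids with no row."""
    found = {row["interval_id"]: row for row in rows if row["interval_id"] in wanted_ids}
    missing = sorted(set(wanted_ids) - set(found))
    return found, missing
```

In practice, `rows` could be `csv.DictReader` over the opened cmu_intervals_df.csv and `wanted_ids` the result of `load_filtered_ids("filtered_intervals.json")`; a non-empty `missing` list points to IDs with no URL in the meta file.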

Could you send me some of the URLs that cannot be downloaded? I'll check whether the problem is with the URLs themselves. If there are still too many videos that can't be downloaded, please contact hex22@mails.tsinghua.edu.cn.

GruelMi commented 4 months ago

Thanks for the detailed response! As discussed by email, the key is to use the download tool yt-dlp.
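A minimal sketch of driving the yt-dlp command-line tool from Python while collecting the URLs that fail. This also makes it easy to produce the list of non-downloadable URLs requested earlier in the thread. It assumes yt-dlp (and ffmpeg, which yt-dlp needs to merge separate video and audio streams) is installed and on PATH. The output layout and helper names are hypothetical, not the repository's own download script.

```python
import shutil
import subprocess
from pathlib import Path


def build_ytdlp_cmd(url: str, out_dir: str) -> list[str]:
    """Build a yt-dlp command that saves one video as <out_dir>/<video id>.<ext>."""
    return [
        "yt-dlp",
        "-f", "bv*+ba/b",                # best video + best audio, else best single file
        "--merge-output-format", "mp4",  # merging requires ffmpeg on PATH
        "--no-playlist",
        "-o", str(Path(out_dir) / "%(id)s.%(ext)s"),
        url,
    ]


def download_all(urls, out_dir: str) -> list[str]:
    """Download each URL in turn; return the URLs for which yt-dlp exited non-zero."""
    if shutil.which("yt-dlp") is None:
        raise RuntimeError("yt-dlp is not on PATH (it can be installed with pip install yt-dlp)")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        result = subprocess.run(build_ytdlp_cmd(url, out_dir), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Writing the returned `failed` list to a text file gives a concrete set of URLs to share with the maintainers when reporting download problems.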