Kikyo-16 closed this issue 5 years ago.
Please send it by email: "nicolas.turpault@inria.fr". Also, don't you have missing files from the validation and weak sets as well?
Hello @turpaultn,

I am using Ubuntu 20.04 and youtube_dl version 2021.12.17. My proxy is Clash, which I start in the terminal with `clash -d .`, and I have set its mode to "global" in the web interface.

I modified the following function in desed/downloaded.py to add a `proxy` parameter and pass my proxy settings through `ydl_opts`. I tried both 'https://127.0.0.1:7890' and 'socks5://127.0.0.1:7891', but neither was able to download the dataset.
```python
def _download_audioset_file(filename, result_dir, platform="youtube", tmp_folder="tmp", proxy="https://127.0.0.1:7890"):
    """Download one AudioSet clip with youtube_dl.

    proxy: proxy URL handed to youtube_dl, e.g. 'https://127.0.0.1:7890',
        or 'socks5://127.0.0.1:7891' if using SOCKS5.
    """
    # Define download parameters
    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": tmp_folder + "%(id)s.%(ext)s",
        "noplaylist": True,
        "quiet": True,
        "prefer_ffmpeg": True,
        "logger": LoggerYtdlWarnings(),  # youtube_dl logger class from the DESED code
        "audioformat": "wav",
    }
    # Added on 08-26: forward the proxy to youtube_dl
    if proxy:
        ydl_opts["proxy"] = proxy
        # print("add the proxy item")
    # ... rest of the original function unchanged
```
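Since neither proxy URL worked, a useful first step is to confirm that Clash is actually listening where youtube_dl is being pointed. Below is a minimal standard-library sketch; `check_proxy` and `ASSUMED_SCHEMES` are hypothetical names, not part of DESED or youtube_dl, and the scheme list is an assumption based on youtube-dl documenting HTTP/HTTPS/SOCKS proxy support. It writes port 7890 as `http://` on the assumption that Clash serves a plain HTTP proxy there, which is not confirmed in this thread.

```python
import socket
from urllib.parse import urlsplit

# Proxy schemes assumed to be accepted by youtube_dl's "proxy" option
# (its --proxy help mentions HTTP/HTTPS/SOCKS; exact list not verified here).
ASSUMED_SCHEMES = {"http", "https", "socks4", "socks4a", "socks5"}


def check_proxy(proxy_url, timeout=2.0):
    """Validate a proxy URL and report whether its host:port accepts TCP connections.

    Returns True if a connection succeeds, False if it is refused or times out.
    Raises ValueError for a missing/unknown scheme or a missing host/port.
    """
    parts = urlsplit(proxy_url)
    if parts.scheme.lower() not in ASSUMED_SCHEMES:
        raise ValueError(f"unexpected proxy scheme in {proxy_url!r}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL needs an explicit host and port: {proxy_url!r}")
    try:
        # Only opens and closes a TCP connection; no proxy protocol is spoken.
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "http://" for port 7890 is an assumption about this Clash setup.
    for url in ("http://127.0.0.1:7890", "socks5://127.0.0.1:7891"):
        print(url, "->", "listening" if check_proxy(url) else "nothing listening")
```

An open port only shows that the Clash process is up, not that it can reach YouTube. If a URL passes the check, it can be given as `proxy=` to the modified `_download_audioset_file` above.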
```
Once database is downloaded, do not forget to check your missing_files
INFO - You can change N_JOBS and CHUNK_SIZE to increase the download with more processes.
INFO - Validation data
100%|██████████| 1168/1168 [01:36<00:00, 12.16it/s]
INFO - Train, weak data
100%|██████████| 1578/1578 [07:33<00:00, 3.48it/s]
INFO - Train, unlabel in domain data
100%|██████████| 14412/14412 [18:13<00:00, 13.18it/s]
INFO - ###### DONE #######
```
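The log asks the user to check the missing files once the download finishes. To share that list (for example by email, as requested earlier in the thread), it can be reduced to plain YouTube IDs. A minimal sketch, assuming the missing-files list is a tab-separated file with a `filename` header and names of the form `Y<youtube_id>_<start>_<end>.wav`; both assumptions should be checked against the actual file, and `youtube_ids_from_missing` / `read_missing_ids` are hypothetical helpers.

```python
import csv
import io


def youtube_ids_from_missing(tsv_text):
    """Return the YouTube IDs listed in a missing-files TSV, in file order.

    Assumed format (not verified against DESED): a header row containing a
    "filename" column, with names like "Y<youtube_id>_<start>_<end>.wav".
    """
    ids = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        stem = row["filename"].rsplit(".", 1)[0]  # drop the ".wav" extension
        # YouTube IDs may contain "_", so split off only the last two fields.
        yt_part = stem.rsplit("_", 2)[0]
        if not yt_part.startswith("Y"):
            raise ValueError(f"unexpected filename: {row['filename']!r}")
        ids.append(yt_part[1:])  # strip the assumed leading "Y" prefix
    return ids


def read_missing_ids(path):
    """Hypothetical convenience wrapper: read a missing-files TSV from disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return youtube_ids_from_missing(f.read())
```

Each ID maps to `https://www.youtube.com/watch?v=<id>`, which is a quick way to see whether a clip is still public before asking for it.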
Hello, could you send me these missing files, please? Thanks! (attachment: missing_files_unlabel_in_domain.zip)