turpaultn / DCASE2019_task4

Baseline of dcase 2019 task 4

missing files #2

Closed Kikyo-16 closed 5 years ago

Kikyo-16 commented 5 years ago

Hello, could you please send me these missing files? Thanks. (Attached: missing_files_unlabel_in_domain.zip)

turpaultn commented 5 years ago

Please send it by email to "nicolas.turpault@inria.fr". Don't you also have missing files from the validation and weak sets?

chumingqian commented 2 months ago

Hello @turpaultn,

I am using Ubuntu 20.04.

The youtube_dl version I am using is 2021.12.17.

The proxy I am using is Clash, which I start in the terminal with the command clash -d . I have set the mode to "global" in the web interface.

I modified the following function in desed/downloaded.py to add a proxy parameter and included my VPN settings in ydl_opts.

I tried using 'https://127.0.0.1:7890' and also 'socks5://127.0.0.1:7891', but neither was successful in downloading the dataset.

def _download_audioset_file(filename, result_dir, platform="youtube", tmp_folder="tmp", proxy="https://127.0.0.1:7890"):
    """
    proxy: proxy URL passed to youtube_dl, e.g. 'https://127.0.0.1:7890';
        replace with 'socks5://127.0.0.1:7891' if using SOCKS5.
    """

    # Define download parameters
    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": tmp_folder + "%(id)s.%(ext)s",
        "noplaylist": True,
        "quiet": True,
        "prefer_ffmpeg": True,
        "logger": LoggerYtdlWarnings(),
        "audioformat": "wav",
    }

    # Added 08-26: pass the proxy to youtube_dl only when one is given
    if proxy:
        ydl_opts["proxy"] = proxy
        # print("added the proxy item")

Once database is downloaded, do not forget to check your missing_files

 INFO - You can change N_JOBS and CHUNK_SIZE to increase the download with more processes.
 INFO - Validation data
100%|██████████| 1168/1168 [01:36<00:00, 12.16it/s]
 INFO - Train, weak data
100%|██████████| 1578/1578 [07:33<00:00,  3.48it/s]
 INFO - Train, unlabel in domain data
100%|██████████| 14412/14412 [18:13<00:00, 13.18it/s]
 INFO - ###### DONE #######
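
For anyone debugging a similar setup, here is a minimal sketch of how the proxy entry could be built and sanity-checked before it reaches youtube_dl. It is not the DESED code: `build_ydl_opts` and `SUPPORTED_PROXY_SCHEMES` are hypothetical names, the list of accepted schemes is an assumption, and `os.path.join` is used for "outtmpl" as a design choice that differs from the snippet quoted above. One thing that may be worth trying: a local Clash HTTP port usually speaks plain HTTP, so 'http://127.0.0.1:7890' might work where 'https://127.0.0.1:7890' does not. That is an assumption about the local configuration, not something confirmed in this thread.

```python
import os
from urllib.parse import urlsplit

# Schemes assumed to be accepted for the "proxy" option (assumption, not
# checked against the installed youtube_dl version).
SUPPORTED_PROXY_SCHEMES = {"http", "https", "socks4", "socks4a", "socks5"}


def build_ydl_opts(tmp_folder="tmp", proxy=None):
    """Hypothetical helper: build a youtube_dl options dict with an optional proxy."""
    ydl_opts = {
        "format": "bestaudio/best",
        # os.path.join keeps the temporary files inside tmp_folder.
        "outtmpl": os.path.join(tmp_folder, "%(id)s.%(ext)s"),
        "noplaylist": True,
        "quiet": True,
    }
    if proxy:
        # Reject obviously wrong URLs early instead of failing on every file.
        scheme = urlsplit(proxy).scheme
        if scheme not in SUPPORTED_PROXY_SCHEMES:
            raise ValueError(f"unsupported proxy scheme: {scheme!r}")
        ydl_opts["proxy"] = proxy
    return ydl_opts


# Candidate URLs for a local Clash instance; the ports are the ones quoted above.
candidates = ["http://127.0.0.1:7890", "socks5://127.0.0.1:7891"]
for url in candidates:
    print(build_ydl_opts(proxy=url)["proxy"])
```

The resulting dict can be passed to `youtube_dl.YoutubeDL(ydl_opts)` as in the function above. Because "quiet" is set, download errors may not be visible in the progress bars, so the missing_files list is still the place to check whether anything was actually saved.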