how to create wav.scp file

MohammdReza2020 commented 1 year ago

Dear all, thanks for organizing VCC2023. I have a question about "egs/svcc23/fastsvc1/README.md": the section "### 2. Specify the following parameters for dataset preprocessing" talks about a data/ directory. I cannot find such a directory, and it seems we have to create it ourselves. Could you please explain how to create the data/ directory? Also, could you please explain how to create the wav.scp files? I think you have presumed that we already have them, but I cannot find such a file.

Yours Sincerely

lesterphillip commented 1 year ago

Thanks for your question! The wav.scp file is in the following format:

<utt_id> <path_to_file> (note that utt_id needs to be in a spkid_uttid format)
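To make the format concrete, here is a minimal sketch of a line checker. The helper name `parse_wav_scp_line` and the example IDs and paths are made up for illustration and are not part of the repo. It assumes the speaker ID is everything before the first underscore and that the path follows the first run of whitespace.

```python
def parse_wav_scp_line(line):
    """Split one wav.scp line into (spk_id, utt_id, path).

    Assumption: the line looks like "<spkid>_<uttid> <path>", and the
    speaker ID is the part of the ID before the first underscore.
    """
    # Split on the first run of whitespace only, so paths with spaces survive.
    full_id, path = line.strip().split(maxsplit=1)
    spk_id, sep, utt_id = full_id.partition("_")
    if not sep or not spk_id or not utt_id:
        raise ValueError(f"utt_id is not in spkid_uttid format: {full_id!r}")
    return spk_id, utt_id, path


# Hypothetical example lines; real IDs and paths depend on your dataset.
example_lines = [
    "spk1_0001 /data/spk1/0001.wav",
    "spk1_0002 /data/spk1/0002.wav",
]
parsed = [parse_wav_scp_line(line) for line in example_lines]
```

Running a check like this over a generated wav.scp is a cheap way to catch IDs that are missing the speaker prefix before starting preprocessing.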

If you're using the SVCC23 dataset, I think you can use the following code. Then just refer to 2c here to split the result into train and test sets.

import os

def make_wav_scp(directory, output_file, spk_ids=None):
    """Write one "<spkid>_<uttid> <path>" line per .wav file in directory/<spkid>/."""
    with open(output_file, "w") as f:
        for spk_id in sorted(os.listdir(directory)):
            spk_dir = os.path.join(directory, spk_id)
            # skip stray files that are not speaker directories
            if not os.path.isdir(spk_dir):
                continue
            # if spk_ids is given, keep only the listed speakers
            if spk_ids is not None and spk_id not in spk_ids:
                continue

            for filename in sorted(os.listdir(spk_dir)):
                if filename.endswith(".wav"):
                    utt_id = os.path.splitext(filename)[0]
                    file_path = os.path.join(spk_dir, filename)
                    f.write(f"{spk_id}_{utt_id} {file_path}\n")

make_wav_scp("/path/to/SVCC2023Dataset/Data", "wav.scp") # replace with the absolute path of your dataset

If you are using a different dataset with a different file structure, I think you need to modify the code a bit.
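As one example of such a modification, here is a rough sketch for a dataset where the .wav files sit in nested sub-folders under each speaker directory. The layout `<root>/<spkid>/.../<name>.wav` and the function name `make_wav_scp_nested` are assumptions for illustration only; adjust them to your own structure.

```python
from pathlib import Path


def make_wav_scp_nested(directory, output_file):
    """Write a wav.scp for a layout like <root>/<spkid>/.../<name>.wav.

    Assumption: the first folder under the root is the speaker ID.
    Returns the number of lines written.
    """
    root = Path(directory)
    lines = []
    for wav in sorted(root.rglob("*.wav")):
        rel = wav.relative_to(root)
        if len(rel.parts) < 2:
            continue  # file sits directly under the root, so no speaker folder
        spk_id = rel.parts[0]
        # Join sub-folder names and the file stem so utterance IDs stay unique.
        utt_id = "-".join(rel.parts[1:-1] + (rel.stem,))
        lines.append(f"{spk_id}_{utt_id} {wav}")
    Path(output_file).write_text("".join(line + "\n" for line in lines))
    return len(lines)


# make_wav_scp_nested("/path/to/your/dataset", "wav.scp")  # hypothetical path
```

Pass an absolute path as `directory` if you want absolute paths in the output, since each written path is built from the root you give it.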

MohammdReza2020 commented 1 year ago

Thank you so much for your reply and the piece of code you provided. It is OK.

lesterphillip commented 11 months ago

I've added this in a later commit.

lesterphillip / SVCC23_FastSVC

how to create wav.scp file #6