MohammdReza2020 closed this issue 1 year ago
Thanks for your question! The wav.scp file is in the following format:
<utt_id> <path_to_file>
(note that utt_id needs to be in a spkid_uttid format)
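To make the format concrete, here is a minimal sketch of a parser for a single wav.scp line. The function name `parse_wav_scp_line` and the regular expression are illustrative and not part of the repository; the sketch also assumes the speaker ID contains no underscore, so that the first underscore separates spkid from uttid.

```python
import re

# Assumed pattern: <spkid>_<uttid>, where spkid has no underscore or whitespace.
UTT_ID_PATTERN = re.compile(r"^(?P<spk>[^_\s]+)_(?P<utt>\S+)$")


def parse_wav_scp_line(line):
    """Split one wav.scp line into (spk_id, utt_id, path).

    Raises ValueError if the line is not "<utt_id> <path_to_file>"
    or if utt_id is not in spkid_uttid form.
    """
    parts = line.strip().split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"expected '<utt_id> <path_to_file>', got: {line!r}")
    full_id, path = parts
    match = UTT_ID_PATTERN.match(full_id)
    if match is None:
        raise ValueError(f"utt_id is not in spkid_uttid format: {full_id!r}")
    return match.group("spk"), match.group("utt"), path
```

Running this over every line of a generated wav.scp is a quick way to catch malformed entries before preprocessing.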
If you're using the SVCC23 dataset, I think you can use the following code. Then just refer to 2c here to split the resulting wav.scp into train and test sets.
import os

def make_wav_scp(directory, output_file, spk_ids=None):
    # expects one sub-directory per speaker: <directory>/<spk_id>/<utt_id>.wav
    with open(output_file, "w") as f:
        for spk_id in os.listdir(directory):
            # just check if the directory exists
            spk_dir = os.path.join(directory, spk_id)
            if spk_ids is not None:
                assert os.path.isdir(spk_dir), f"speaker id in: {spk_dir} does not exist"
            for filename in os.listdir(spk_dir):
                if filename.split(".")[-1] == "wav":
                    utt_id = os.path.splitext(filename)[0]
                    file_path_dir = os.path.join(spk_dir, filename)
                    # one line per utterance: <spk_id>_<utt_id> <path_to_file>
                    f.write(f"{spk_id}_{utt_id} {file_path_dir}\n")

make_wav_scp("/path/to/SVCC2023Dataset/Data", "wav.scp")  # replace with the absolute path of your dataset
If you are using a different dataset with a different file structure, I think you need to modify the code a bit.
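As one possible adaptation, here is a rough sketch for a dataset where the .wav files sit in nested sub-directories below each speaker directory. The function name `make_wav_scp_nested` and the layout are assumptions for illustration only: the first directory level under the root is taken as the speaker ID, and the file name without its extension as the utterance ID.

```python
from pathlib import Path


def make_wav_scp_nested(directory, output_file):
    """Write "<spk_id>_<utt_id> <path>" lines for every .wav below directory.

    Assumed layout (hypothetical): <directory>/<spk_id>/.../<utt_id>.wav
    Returns the number of lines written.
    """
    root = Path(directory)
    lines = []
    for wav_path in sorted(root.rglob("*.wav")):
        rel = wav_path.relative_to(root)
        if len(rel.parts) < 2:
            # a .wav directly under the root has no speaker directory; skip it
            continue
        spk_id = rel.parts[0]
        utt_id = wav_path.stem
        lines.append(f"{spk_id}_{utt_id} {wav_path}")
    with open(output_file, "w") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

Pass an absolute path as `directory` so that the paths written to the file are absolute too. Note that two files with the same name in different sub-directories of one speaker would get the same utt_id, so the naming would need to change for such a dataset.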
Thank you so much for your reply and the piece of code you provided. It is OK.
I've added this in a later commit.
Dear all,
Thanks for organizing VCC2023. I just have a question about "egs/svcc23/fastsvc1/README.md": in "### 2. Specify the following parameters for dataset preprocessing", you mention a data/ directory. That directory is not found, and it seems we have to create it ourselves. Could you please explain how to create the data/ directory? Also, could you please explain how to create the wav.scp files? I think you have presumed they already exist, but no such file is found.
Yours sincerely