Hi, I ran the following test code to convert .wav -> mel using librosa, and then used UnivNet with a pretrained checkpoint to do the inverse, but the results were extremely bad. Can you point out what I'm doing wrong? The input file is clean US English speech.
Arguments: -p ./chkpt/univ_c16_0292.pt -c config/default_c16.yaml -i /Users/kelseyd/Documents/train/TF -o ./out
for filename in tqdm.tqdm(glob.glob(os.path.join(args.input_folder, '*.wav'))):
    y, sr = librosa.load(filename, sr=24000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, n_mels=100, fmin=0, fmax=12000)
    mel = torch.from_numpy(mel)
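One thing that may be worth checking is whether this mel matches what the checkpoint was trained on. With the call above, librosa falls back to its defaults of hop_length=512 and power=2.0 and applies no log compression. Vocoders of this kind are commonly trained on log-compressed magnitude mels with a shorter hop. The sketch below shows that variant. The hop length of 256, window length of 1024, and clip value of 1e-5 are assumptions, not values confirmed from the repo, and should be checked against config/default_c16.yaml and the repo's own mel code. `log_compress` and `wav_to_log_mel` are hypothetical helper names.

```python
import numpy as np

# Assumed values: verify against the audio section of config/default_c16.yaml.
HOP_LENGTH = 256
WIN_LENGTH = 1024
CLIP_VAL = 1e-5


def log_compress(mel, clip_val=CLIP_VAL):
    """Clamp small values to clip_val, then take the natural log."""
    return np.log(np.clip(mel, clip_val, None))


def wav_to_log_mel(filename):
    """Hypothetical helper: magnitude mel with an explicit hop, then log."""
    import librosa  # imported here so log_compress works without librosa

    y, sr = librosa.load(filename, sr=24000)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=1024,
        hop_length=HOP_LENGTH, win_length=WIN_LENGTH,
        n_mels=100, fmin=0, fmax=12000,
        power=1.0,  # magnitude rather than librosa's default power=2.0
    )
    # float32 array of shape (100, frames)
    return log_compress(mel).astype(np.float32)
```

If this is the cause, the result of `wav_to_log_mel(filename)` would replace `mel` before the `torch.from_numpy(mel)` call. The mel filterbank normalization may also differ between librosa and the repo's own mel code, so reusing the repo's mel extraction, if it exposes one, is probably the safer comparison.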