Closed rishabh135 closed 6 years ago
Thank you for your report! It seems you are trying to load wav files of VCTK, right? If so, you need to use WavFileDataSource
instead of TranscriptionDataSource
. i.e., instead of:
X = FileSourceDataset(vctk.TranscriptionDataSource(data_root))
for x in X:
    # do something here
use:
X = FileSourceDataset(vctk.WavFileDataSource(data_root))
for x in X:
    # do something here
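The distinction above comes down to what each data source's collect_features returns for a collected file. Below is a minimal, self-contained sketch of that pattern; the Toy* class names are hypothetical stand-ins, not nnmnkwii API, and the "feature" is just the path length to keep the example runnable without the VCTK corpus:

```python
from typing import List

class ToyWavFileDataSource:
    """Hypothetical stand-in for a data source like vctk.WavFileDataSource:
    it lists files, then turns each file path into a feature value."""
    def __init__(self, paths: List[str]):
        self.paths = paths

    def collect_files(self) -> List[str]:
        # A real data source scans data_root for .wav files here.
        return self.paths

    def collect_features(self, path: str):
        # A real data source loads audio; we return the path length
        # as a dummy "feature" so the sketch has no data dependency.
        return len(path)

class ToyFileSourceDataset:
    """Applies collect_features per file on access, mimicking the
    lazy behavior of nnmnkwii's FileSourceDataset."""
    def __init__(self, source):
        self.source = source
        self.collected_files = source.collect_files()

    def __len__(self):
        return len(self.collected_files)

    def __getitem__(self, idx):
        # Raises IndexError past the end, which also terminates iteration.
        return self.source.collect_features(self.collected_files[idx])

X = ToyFileSourceDataset(ToyWavFileDataSource(["p228/001.wav", "p229/2.wav"]))
features = [x for x in X]  # iterate just like the snippets above
```

Swapping TranscriptionDataSource for WavFileDataSource changes only what collect_files gathers and what collect_features yields; the surrounding FileSourceDataset loop is identical.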
Thanks for that, but even with vctk.WavFileDataSource I face the following issue when I try loading the wav files as a data source:
Convert datasets to arrays
80%|██████████████████████████████████████████▏ | 291/366 [20:07<05:11, 4.15s/it]
Killed
I couldn't find anything on why the process is killed midway without any error logging, nor could I find anything in the docs. Please share the reason behind this sudden exit.
According to https://stackoverflow.com/a/31815142,
The most likely reason is probably that your process crossed some limit in the amount of system resources that you are allowed to use. Depending on your OS and configuration, this could mean you had too many open files, used too much filesystem space or something else. The most likely is that your program was using too much memory. Rather than risking things breaking when memory allocations started failing, the system sent a kill signal to the process that was using too much memory.
I suspect you hit an out-of-memory error. Can you check memory usage?
Also please share code so that I can reproduce.
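One quick way to check from inside the process is to record peak resident memory before and after the suspect call. The sketch below uses the standard-library resource module (Unix only; note ru_maxrss is reported in kilobytes on Linux but bytes on macOS). The 50 MB allocation is just a demonstration payload:

```python
import resource
import sys

def peak_rss_mb() -> float:
    """Peak resident set size of this process, in megabytes."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is kilobytes on Linux, bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor

before = peak_rss_mb()
blob = bytearray(50 * 1024 * 1024)  # allocate (and zero-fill) ~50 MB
after = peak_rss_mb()
print("peak RSS grew by roughly {:.0f} MB".format(after - before))
```

If the process was indeed killed by the Linux OOM killer, the kernel log (e.g. dmesg) will typically contain an "Out of memory: Kill process" entry naming the victim.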
Hello @r9y9, I am primarily using your script (here it is), and no, I don't believe memory is the issue, as I have well over 8 CPU cores free at any point:
"""Prepare acoustic features for one-to-one voice conversion.
usage:
prepare_features_vc.py [options] <DATA_ROOT> <source_speaker> <target_speaker>
options:
--max_files=<N> Max num files to be collected. [default: 100]
--dst_dir=<d> Destination directory [default: data/cmu_arctic_vc].
--overwrite Overwrite files.
-h, --help show this help message and exit
"""
from __future__ import division, print_function, absolute_import
from docopt import docopt
import numpy as np
from nnmnkwii.datasets import FileSourceDataset
from nnmnkwii import preprocessing as P
from nnmnkwii.preprocessing.alignment import DTWAligner
from nnmnkwii.datasets import cmu_arctic, voice_statistics, vcc2016
from nnmnkwii.datasets import vctk
import pysptk
import pyworld
from scipy.io import wavfile
from tqdm import tqdm
from os.path import basename, splitext, exists, expanduser, join, dirname
import os
import sys
#from hparams import vc as hp
#from hparams import hparams_debug_string
frame_period=5
order=59
windows=[ (0, 0, np.array([1.0])),
(1, 1, np.array([-0.5, 0.0, 0.5])),
(1, 1, np.array([1.0, -2.0, 1.0])),
]
# vcc2016.WavFileDataSource and voice_statistics.WavFileDataSource can be
# drop-in replacement. See below for details:
# https://r9y9.github.io/nnmnkwii/latest/references/datasets.html#builtin-data-sources
class MGCSource(vctk.WavFileDataSource):
def __init__(self, data_root, speakers, max_files=None):
super(MGCSource, self).__init__(data_root, speakers)
self.alpha = None
def collect_features(self, wav_path):
fs, x = wavfile.read(wav_path)
x = x.astype(np.float64)
f0, timeaxis = pyworld.dio(x, fs, frame_period = frame_period)
f0 = pyworld.stonemask(x, f0, timeaxis, fs)
spectrogram = pyworld.cheaptrick(x, f0, timeaxis, fs)
spectrogram = P.trim_zeros_frames(spectrogram)
if self.alpha is None:
self.alpha = pysptk.util.mcepalpha(fs)
mgc = pysptk.sp2mc(spectrogram, order = order, alpha=self.alpha)
# Drop 0-th coefficient
mgc = mgc[:, 1:]
# 50Hz cut-off MS smoothing
hop_length = int(fs * (frame_period * 0.001))
modfs = fs / hop_length
mgc = P.modspec_smoothing(mgc, modfs, cutoff=50)
# Add delta
mgc = P.delta_features(mgc, windows)
return mgc.astype(np.float32)
if __name__ == "__main__":
#args = docopt(__doc__)
#print("Command line args:\n", args)
#DATA_ROOT = args["<DATA_ROOT>"]
# source_speaker = args["<source_speaker>"]
# target_speaker = args["<target_speaker>"]
# max_files = int(args["--max_files"])
# max_files = int(args["--max_files"])
# dst_dir = args["--dst_dir"]
# overwrite = args["--overwrite"]
DATA_ROOT = "/root/data/VCTK-Corpus/"
source_speaker = ['228']
target_speaker = ['229']
max_files = 100
dst_dir = os.path.join(os.getcwd() , "mgc_files/")
if not os.path.exists(dst_dir):
os.makedirs(dst_dir)
overwrite = True
#print(hparams_debug_string(hp))
X_dataset = FileSourceDataset(MGCSource(DATA_ROOT, speakers=source_speaker ) )
Y_dataset = FileSourceDataset(MGCSource(DATA_ROOT, speakers = target_speaker))
skip_feature_extraction = exists(join(dst_dir, "X")) \
and exists(join(dst_dir, "Y"))
if overwrite:
skip_feature_extraction = False
if skip_feature_extraction:
print("Features seems to be prepared, skipping feature extraction.")
sys.exit(0)
# Create dirs
for speaker, name in [(source_speaker, "X"), (target_speaker, "Y")]:
d = join(dst_dir, name)
print("Destination dir for {}: {}".format(speaker, d))
if not exists(d):
os.makedirs(d)
# Convert to arrays
#print(X_dataset.shape)
#sys.exit(0)
print("Convert datasets to arrays")
X = X_dataset.asarray(verbose=1) . ## The error happens here
Y = Y_dataset.asarray(verbose=1)
# Alignment
print("Perform alignment")
X, Y = DTWAligner().transform((X, Y))
print("Save features to disk")
for idx, (x, y) in tqdm(enumerate(zip(X, Y))):
# paths
src_name = splitext(basename(X_dataset.collected_files[idx][0]))[0]
tgt_name = splitext(basename(Y_dataset.collected_files[idx][0]))[0]
src_path = join(dst_dir, "X", src_name)
tgt_path = join(dst_dir, "Y", tgt_name)
# Trim and ajast frames
x = P.trim_zeros_frames(x)
y = P.trim_zeros_frames(y)
x, y = P.adjast_frame_lengths(x, y, pad=True, divisible_by=2)
# Save
np.save(src_path, x)
print("creating file " + str(src_path))
#sys.exit()
np.save(tgt_path, y)
sys.exit(0)
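For context on why asarray is the step that dies: materializing every utterance into one padded array means peak memory scales with the whole dataset. A common workaround is to extract and save one utterance at a time, so peak memory stays near a single utterance. Below is a self-contained sketch of that pattern; extract_features is a hypothetical stand-in for MGCSource.collect_features that fabricates random (frames, dims) arrays so the example runs without the corpus:

```python
import os
import tempfile
import numpy as np

def extract_features(path):
    # Hypothetical stand-in for MGCSource.collect_features: returns a
    # random (frames, dims) array with a path-dependent frame count.
    rng = np.random.default_rng(len(path))
    n_frames = 100 + 10 * (len(path) % 5)
    return rng.standard_normal((n_frames, 59)).astype(np.float32)

def save_features_one_by_one(wav_paths, dst_dir):
    """Save each utterance as its own .npy file instead of building one
    big padded array, so peak memory stays near one utterance."""
    os.makedirs(dst_dir, exist_ok=True)
    out_paths = []
    for path in wav_paths:
        feat = extract_features(path)
        name = os.path.splitext(os.path.basename(path))[0] + ".npy"
        out = os.path.join(dst_dir, name)
        np.save(out, feat)
        out_paths.append(out)
    return out_paths

dst = tempfile.mkdtemp()
outs = save_features_one_by_one(["p228_001.wav", "p228_002.wav"], dst)
loaded = np.load(outs[0])
print(loaded.shape)
```

The trade-off is that a later step like DTW alignment must then load pairs of files back from disk rather than index into one in-memory array.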
Thank you for the code. I found a bug that can cause large memory re-allocations. A fix is coming.
@rishabh135 Could you check whether it works with the new version? The following command should install the latest version of nnmnkwii:
pip install --upgrade git+https://github.com/r9y9/nnmnkwii
Yeah, it works now, or at least doesn't exit midway. Thanks 👍 Though I am wondering if you could discuss the usage of MGCs (which I think are the same as MCEPs) vs. MFCCs (which seem to be a little more popular)?
Glad to hear that fixed your issue! I will tag a new release soon.
As far as I know, MFCC is a compressed representation of the mel-frequency spectrum. It has been used in speech recognition. On the other hand, mel-cepstrum is not just a compressed representation; it is also carefully designed to serve as a filter parameter for speech synthesis. It has been used mainly in voice conversion and speech synthesis.
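To make the "compressed representation of a spectrum" idea concrete, here is a deliberately simplified, pure-NumPy real cepstrum: the inverse FFT of the log magnitude spectrum. This omits frequency warping, so it is neither a true MFCC (which adds a mel filterbank and a DCT) nor a true mel-cepstrum (which pysptk.sp2mc computes with an all-pass warping), but it shows the shared core: a smooth spectral envelope collapses into a handful of low-order coefficients:

```python
import numpy as np

def real_cepstrum(spectrum: np.ndarray) -> np.ndarray:
    """Simplified (unwarped) real cepstrum: inverse FFT of the log
    magnitude spectrum. MFCC and mel-cepstrum both build on this idea
    with different frequency warpings."""
    log_mag = np.log(np.maximum(spectrum, 1e-10))  # floor to avoid log(0)
    return np.fft.irfft(log_mag)

# A toy smooth magnitude spectrum for one "frame" (513 bins, like a
# 1024-point FFT), decaying with frequency
freqs = np.linspace(0, np.pi, 513)
spectrum = np.exp(-freqs) + 0.1

cep = real_cepstrum(spectrum)
# Low-order coefficients carry the smooth envelope; truncating to a few
# dozen terms is the "compression" both representations rely on.
compressed = cep[:25]
print(compressed.shape)
```

The script above keeps order=59 coefficients from sp2mc for exactly this reason: 60 numbers stand in for a 513-bin spectral envelope per frame.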
Tagged a new release https://github.com/r9y9/nnmnkwii/releases/tag/v0.0.10.
I will close this issue since the bug was fixed. Feel free to open new issues if you have any other problem.
Hello @r9y9, brilliant work! But while loading the data loader for the VCTK dataset with transcriptions, which are in files with the structure
I get the following error: