r9y9 / nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.
https://r9y9.github.io/nnmnkwii/latest/

Problem with loading datasource for VCTK dataset #59

Closed rishabh135 closed 6 years ago

rishabh135 commented 6 years ago

Hello @r9y9, brilliant work! However, when I load the data source for the VCTK dataset, whose transcriptions are stored in files with the following structure:

VCTK-Corpus/
    COPYING
    README
    speaker-info.txt
    txt/
        p225/
            p225_001.txt

I get the following error:

 File "prepare_accoustic_features.py", line 128, in <module>
    X = X_dataset.asarray(verbose=1)
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/nnmnkwii/datasets/__init__.py", line 153, in asarray
    D = self[0].shape[-1]
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/nnmnkwii/datasets/__init__.py", line 126, in __getitem__
    *self.collected_files[idx])
  File "prepare_accoustic_features.py", line 56, in collect_features
    fs, x = wavfile.read(wav_path)
  File "/root/anaconda3/envs/pyt/lib/python3.6/site-packages/scipy/io/wavfile.py", line 233, in read
    fid = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'Please call Stella.'
r9y9 commented 6 years ago

Thank you for your report! It seems you are trying to load the wav files of VCTK, right? If so, you need to use WavFileDataSource instead of TranscriptionDataSource. That is, rather than:

X = FileSourceDataset(vctk.TranscriptionDataSource(data_root))

for x in X:
    pass  # do something here

use this instead:

X = FileSourceDataset(vctk.WavFileDataSource(data_root))

for x in X:
    pass  # do something here
rishabh135 commented 6 years ago

Thanks for that, but even with vctk.WavFileDataSource I run into the following issue when I try to load the wav files as a data source:

Convert datasets to arrays
 80%|██████████████████████████████████████████▏          | 291/366 [20:07<05:11,  4.15s/it]
Killed

I couldn't find any explanation for why the process is killed partway through without any error being logged, and I couldn't find anything in the docs either. Could you please share the reason behind this sudden exit?

r9y9 commented 6 years ago

According to https://stackoverflow.com/a/31815142,

The most likely reason is probably that your process crossed some limit in the amount of system resources that you are allowed to use. Depending on your OS and configuration, this could mean you had too many open files, used too much filesystem space or something else. The most likely is that your program was using too much memory. Rather than risking things breaking when memory allocations started failing, the system sent a kill signal to the process that was using too much memory.

I suspect you hit an out-of-memory error. Can you check the memory usage?
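One way to check this from inside the script is to print the peak resident memory of the process at a few points, for example before and after the array conversion. The following is only a minimal sketch using the standard-library `resource` module, which is available on Linux and macOS but not on Windows. The helper name `peak_rss_mib` and the 50 MiB test allocation are illustrative assumptions and not part of nnmnkwii.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print("peak RSS at start: {:.1f} MiB".format(peak_rss_mib()))

# Hypothetical workload standing in for the feature extraction step.
buf = bytearray(50 * 1024 * 1024)

print("peak RSS after allocation: {:.1f} MiB".format(peak_rss_mib()))
```

If the reported peak approaches the amount of RAM on the machine shortly before the process dies, memory exhaustion is the likely cause. On Linux, the kernel log usually records which process the out-of-memory killer terminated.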

r9y9 commented 6 years ago

Also, please share your code so that I can reproduce the problem.

rishabh135 commented 6 years ago

Hello @r9y9, I am mostly using your script; here it is. And no, I don't believe that memory is the issue, as I have well over 8 CPU cores free at any point:


"""Prepare acoustic features for one-to-one voice conversion.
usage:
    prepare_features_vc.py [options] <DATA_ROOT> <source_speaker> <target_speaker>
options:
    --max_files=<N>      Max num files to be collected. [default: 100]
    --dst_dir=<d>        Destination directory [default: data/cmu_arctic_vc].
    --overwrite          Overwrite files.
    -h, --help           show this help message and exit
"""
from __future__ import division, print_function, absolute_import

from docopt import docopt
import numpy as np

from nnmnkwii.datasets import FileSourceDataset
from nnmnkwii import preprocessing as P
from nnmnkwii.preprocessing.alignment import DTWAligner
from nnmnkwii.datasets import cmu_arctic, voice_statistics, vcc2016

from nnmnkwii.datasets import vctk

import pysptk
import pyworld
from scipy.io import wavfile
from tqdm import tqdm
from os.path import basename, splitext, exists, expanduser, join, dirname
import os
import sys

#from hparams import vc as hp
#from hparams import hparams_debug_string

frame_period = 5
order = 59
windows = [
    (0, 0, np.array([1.0])),
    (1, 1, np.array([-0.5, 0.0, 0.5])),
    (1, 1, np.array([1.0, -2.0, 1.0])),
]

# vcc2016.WavFileDataSource and voice_statistics.WavFileDataSource can be
# drop-in replacement. See below for details:
# https://r9y9.github.io/nnmnkwii/latest/references/datasets.html#builtin-data-sources
class MGCSource(vctk.WavFileDataSource):
    def __init__(self, data_root, speakers, max_files=None):
        super(MGCSource, self).__init__(data_root, speakers)
        self.alpha = None

    def collect_features(self, wav_path):
        fs, x = wavfile.read(wav_path)
        x = x.astype(np.float64)
        f0, timeaxis = pyworld.dio(x, fs, frame_period = frame_period)
        f0 = pyworld.stonemask(x, f0, timeaxis, fs)
        spectrogram = pyworld.cheaptrick(x, f0, timeaxis, fs)
        spectrogram = P.trim_zeros_frames(spectrogram)
        if self.alpha is None:
            self.alpha = pysptk.util.mcepalpha(fs)
        mgc = pysptk.sp2mc(spectrogram, order = order, alpha=self.alpha)
        # Drop 0-th coefficient
        mgc = mgc[:, 1:]
        # 50Hz cut-off MS smoothing
        hop_length = int(fs * (frame_period * 0.001))
        modfs = fs / hop_length
        mgc = P.modspec_smoothing(mgc, modfs, cutoff=50)
        # Add delta
        mgc = P.delta_features(mgc, windows)
        return mgc.astype(np.float32)

if __name__ == "__main__":
    # args = docopt(__doc__)
    # print("Command line args:\n", args)
    # DATA_ROOT = args["<DATA_ROOT>"]
    # source_speaker = args["<source_speaker>"]
    # target_speaker = args["<target_speaker>"]
    # max_files = int(args["--max_files"])
    # dst_dir = args["--dst_dir"]
    # overwrite = args["--overwrite"]

    DATA_ROOT = "/root/data/VCTK-Corpus/"
    source_speaker = ['228']
    target_speaker = ['229']

    max_files = 100

    dst_dir = os.path.join(os.getcwd() , "mgc_files/")

    if not os.path.exists(dst_dir):
        os.makedirs(dst_dir)

    overwrite = True

    #print(hparams_debug_string(hp))

    X_dataset = FileSourceDataset(MGCSource(DATA_ROOT, speakers=source_speaker))
    Y_dataset = FileSourceDataset(MGCSource(DATA_ROOT, speakers=target_speaker))

    skip_feature_extraction = exists(join(dst_dir, "X")) \
        and exists(join(dst_dir, "Y"))
    if overwrite:
        skip_feature_extraction = False
    if skip_feature_extraction:
        print("Features seem to be prepared, skipping feature extraction.")
        sys.exit(0)

    # Create dirs
    for speaker, name in [(source_speaker, "X"), (target_speaker, "Y")]:
        d = join(dst_dir, name)
        print("Destination dir for {}: {}".format(speaker, d))
        if not exists(d):
            os.makedirs(d)

    # Convert to arrays
    #print(X_dataset.shape)
    #sys.exit(0)
    print("Convert datasets to arrays")
    X = X_dataset.asarray(verbose=1)  # The error happens here

    Y = Y_dataset.asarray(verbose=1)

    # Alignment
    print("Perform alignment")
    X, Y = DTWAligner().transform((X, Y))

    print("Save features to disk")
    for idx, (x, y) in tqdm(enumerate(zip(X, Y))):
        # paths
        src_name = splitext(basename(X_dataset.collected_files[idx][0]))[0]
        tgt_name = splitext(basename(Y_dataset.collected_files[idx][0]))[0]
        src_path = join(dst_dir, "X", src_name)
        tgt_path = join(dst_dir, "Y", tgt_name)

        # Trim and adjust frames
        x = P.trim_zeros_frames(x)
        y = P.trim_zeros_frames(y)
        x, y = P.adjast_frame_lengths(x, y, pad=True, divisible_by=2)

        # Save
        np.save(src_path, x)
        print("creating file " + str(src_path))
        #sys.exit()
        np.save(tgt_path, y)

    sys.exit(0)
r9y9 commented 6 years ago

Thank you for the code. I found a bug that can cause large memory re-allocations. Fix coming.
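The thread does not say what the bug was, so the following is only a generic illustration of how padding variable-length utterances into one array can trigger repeated large re-allocations, and how allocating once avoids them. It is not the actual nnmnkwii code or its fix. The function names `pad_growing` and `pad_preallocated` and the toy feature shapes are assumptions made for this sketch.

```python
import numpy as np


def pad_growing(features):
    """Zero-pad utterances into (N, T, D), growing T on demand.

    Every time a longer utterance appears, a new and larger buffer is
    allocated and the entire old buffer is copied into it.
    """
    n, dim = len(features), features[0].shape[-1]
    out = np.zeros((n, 0, dim), dtype=np.float32)
    n_reallocs = 0
    for i, x in enumerate(features):
        if x.shape[0] > out.shape[1]:
            bigger = np.zeros((n, x.shape[0], dim), dtype=np.float32)
            bigger[:, :out.shape[1]] = out  # full copy of the old buffer
            out = bigger
            n_reallocs += 1
        out[i, :x.shape[0]] = x
    return out, n_reallocs


def pad_preallocated(features):
    """Zero-pad utterances into (N, T_max, D) with a single allocation."""
    t_max = max(x.shape[0] for x in features)
    dim = features[0].shape[-1]
    out = np.zeros((len(features), t_max, dim), dtype=np.float32)
    for i, x in enumerate(features):
        out[i, :x.shape[0]] = x
    return out


# Toy utterances with 2, 5, 3 and 7 frames of 3-dimensional features.
features = [np.ones((t, 3), dtype=np.float32) for t in (2, 5, 3, 7)]
grown, n_reallocs = pad_growing(features)
fixed = pad_preallocated(features)
print(grown.shape, fixed.shape, n_reallocs)
```

Both functions produce the same array, but the growing variant briefly holds the old and the new buffer at the same time on each re-allocation, so its peak memory can be roughly twice the size of the final array. The single-allocation variant needs the maximum length up front, which in practice means either a pass over the lengths or a user-supplied upper bound.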

r9y9 commented 6 years ago

@rishabh135 Could you check that it works with the new version? The following command should install the latest version of nnmnkwii:

pip install --upgrade git+https://github.com/r9y9/nnmnkwii
rishabh135 commented 6 years ago

Yeah, it works now, or at least it doesn't exit partway through. Thanks 👍 Though I am wondering if you could discuss the usage of MGCs (which I think are the same as MCEPs) vs. MFCCs (which seem to be a little more popular)?

r9y9 commented 6 years ago

Glad to hear that fixes your issue! I will tag a new release soon.

As far as I know, MFCC is a compressed representation of mel-frequency spectra. It has been used in speech recognition. On the other hand, mel-cepstrum is not just a compressed representation, but is also carefully designed to be a filter parameter for speech synthesis. It has been used mainly for voice conversion and speech synthesis.
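To make the "compressed representation of mel-frequency spectra" part concrete, here is a rough numpy-only sketch of one common way to compute MFCCs from a power spectrum: apply a triangular mel filterbank, take the log, then apply a DCT. The function names, the HTK-style mel formula, the unnormalized DCT-II, and the defaults `n_mels=40` and `n_mfcc=13` are assumptions chosen for illustration. It does not reproduce mel-cepstrum analysis, which the script above obtains from `pysptk.sp2mc`.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, fs):
    """Triangular filters equally spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc_from_power_spectrum(power_spec, fs, n_mels=40, n_mfcc=13):
    """Map a (T, n_fft // 2 + 1) power spectrogram to (T, n_mfcc) MFCCs."""
    n_fft = 2 * (power_spec.shape[1] - 1)
    fb = mel_filterbank(n_mels, n_fft, fs)
    # Small constant avoids log(0) for silent frames.
    log_mel = np.log(power_spec @ fb.T + 1e-10)
    # Unnormalized DCT-II basis of shape (n_mfcc, n_mels).
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_mels))
    return log_mel @ dct_basis.T


# Toy input: 10 frames of white noise, 512-point FFT, 16 kHz sampling.
rng = np.random.RandomState(0)
frames = rng.randn(10, 512)
power_spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
mfcc = mfcc_from_power_spectrum(power_spec, fs=16000)
print(mfcc.shape)
```

Keeping only the first few DCT coefficients discards fine spectral detail. That is acceptable for recognition features, but it means the original spectrum cannot be recovered well from MFCCs alone.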

r9y9 commented 6 years ago

Tagged a new release https://github.com/r9y9/nnmnkwii/releases/tag/v0.0.10.

I will close this issue since the bug was fixed. Feel free to open new issues if you have any other problems.