nii-yamagishilab / ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
BSD 3-Clause "New" or "Revised" License

Regarding pitch_embedding and energy_embedding #6

Open su4kk opened 1 month ago

su4kk commented 1 month ago

Hi, we are getting errors when pitch_embedding and energy_embedding are set to true during txt2vec training.

pavanhitloop commented 1 month ago

Hi,

When I try to enable use_energy_embed: True in Config/txt2vec/MM6_IPA/model.yaml, I get the following error (I also couldn't find the stats.json file in this repo).

Traceback (most recent call last):
  File "ZMM-TTS/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "ZMM-TTS/txt2vec/train.py", line 56, in train
    model, optimizer = get_model(args, configs, device, train=True)
  File "ZMM-TTS/txt2vec/utils/model.py", line 14, in get_model
    model = CompTransTTS(preprocess_config, model_config, train_config).to(device)
  File "ZMM-TTS/txt2vec/model/CompTransTTS.py", line 57, in __init__
    self.variance_adaptor = VarianceAdaptor(preprocess_config, model_config, train_config, 256)
  File "ZMM-TTS/txt2vec/model/modules.py", line 795, in __init__
    with open(
FileNotFoundError: [Errno 2] No such file or directory: 'Dataset/preprocessed_data/MM6/stats.json'

Note: I'm using the same configuration as this repo, except that energy embedding is enabled.

Please let me know if anyone has a solution for this. Thanks in advance.

gongchenghhu commented 1 month ago

@su4kk @pavanhitloop Sorry for the late reply. It is feasible to use pitch and energy embeddings. You may refer to the repository https://github.com/keonlee9420/Comprehensive-Transformer-TTS for further details. It is important to note that you need to modify our preprocessing code to extract the pitch and energy features.
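For readers hitting the same FileNotFoundError, here is a minimal sketch of the kind of stats.json that a modified preprocessing step might write once pitch and energy have been extracted. The helper names (`summarize`, `write_stats`), the key names `"pitch"` and `"energy"`, and the `[min, max, mean, std]` layout are assumptions based on the common FastSpeech2-style convention, not something confirmed by this repo. Check what `VarianceAdaptor` in txt2vec/model/modules.py actually reads before relying on it.

```python
import json
import math
import os


def summarize(values):
    """Return [min, max, mean, std] for a flat list of frame-level values."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation over all frames.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [float(min(values)), float(max(values)), float(mean), float(std)]


def write_stats(preprocessed_path, pitch_values, energy_values):
    """Write stats.json into the preprocessed data directory and return its path."""
    # ASSUMPTION: key names and value layout are hypothetical; adapt them to
    # whatever the model code expects when it opens stats.json.
    stats = {
        "pitch": summarize(pitch_values),
        "energy": summarize(energy_values),
    }
    os.makedirs(preprocessed_path, exist_ok=True)
    out_path = os.path.join(preprocessed_path, "stats.json")
    with open(out_path, "w") as f:
        json.dump(stats, f)
    return out_path
```

In this sketch, `pitch_values` and `energy_values` would be the frame-level features collected over the whole training set, and `preprocessed_path` would be the directory from the traceback (Dataset/preprocessed_data/MM6). The extraction itself still has to be added to the preprocessing code, as noted above.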

pavanhitloop commented 1 month ago

@gongchenghhu, thank you for your response. I'll try it and update here if I run into any issues.