sh-lee-prml / HierSpeechpp

The official implementation of HierSpeech++
MIT License

monotonic align not found at ttv_v1/t2w2v_transformer.py #48

Closed. NoSavedDATA closed this issue 1 month ago.

NoSavedDATA commented 1 month ago

  File "/root/HierSpeechpp/ttv_v1/monotonic_align/__init__.py", line 3, in <module>
    from .monotonic_align.core import maximum_path_c
ModuleNotFoundError: No module named 'ttv_v1.monotonic_align.monotonic_align.core'

NoSavedDATA commented 1 month ago

  File "/root/HierSpeechpp/ttv_v1/t2w2v_transformer.py", line 422, in forward
    attn = monotonic_align.maximum_path(neg_cent, attn_mask.squeeze(1)).unsqueeze(1).detach()
NameError: name 'monotonic_align' is not defined

hayeong0 commented 1 month ago

You need to build MAS (Monotonic Alignment Search) before you can use it. Please run setup.py in the monotonic_align directory.

NoSavedDATA commented 1 month ago

In the monotonic_align directory, I ran the following commands:

python setup.py build
pip install .

I had never built Cython code before, so I'm not sure this is correct.
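
The first traceback's ModuleNotFoundError names `ttv_v1.monotonic_align.monotonic_align.core`, so the relative import in `__init__.py` expects the compiled `core` module inside a nested `monotonic_align/` subfolder of `ttv_v1/monotonic_align/`. Below is a minimal sketch for checking where a build actually placed the `core` extension. The helper names `find_core_extensions` and `matches_package_import` are hypothetical and are not part of HierSpeech++.

```python
import importlib.machinery
from pathlib import Path
from typing import List


def find_core_extensions(mas_dir: str) -> List[str]:
    """List compiled `core` extension files under mas_dir, relative to it."""
    # Platform-specific suffixes for compiled modules, e.g. '.cpython-310-x86_64-linux-gnu.so' or '.pyd'.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    root = Path(mas_dir)
    found = [
        p.relative_to(root).as_posix()
        for p in root.rglob("core*")
        if p.is_file() and p.name.startswith("core.") and p.name.endswith(suffixes)
    ]
    return sorted(found)


def matches_package_import(mas_dir: str) -> bool:
    """True if a `core` extension sits directly in the nested `monotonic_align/` folder,
    which is where `from .monotonic_align.core import ...` in __init__.py looks."""
    return any(
        path.startswith("monotonic_align/") and path.count("/") == 1
        for path in find_core_extensions(mas_dir)
    )
```

For example, `print(find_core_extensions("/root/HierSpeechpp/ttv_v1/monotonic_align"))`. `python setup.py build` writes its output under `build/`, and `pip install .` installs into site-packages, so if the only hits are under `build/`, the extension was compiled but not placed where the package imports it from.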

For now, I've solved the problem by adding the following to t2w2v_transformer.py:

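import numpy as np
import torch
from monotonic_align.core import maximum_path_c  # compiled Cython MAS kernel

def maximum_path(neg_cent, mask):
    # neg_cent and mask are expected to be shaped [b, t_t, t_s]
    device = neg_cent.device
    dtype = neg_cent.dtype
    # The Cython kernel works on float32/int32 NumPy arrays on the CPU
    neg_cent = neg_cent.data.cpu().numpy().astype(np.float32)
    path = np.zeros(neg_cent.shape, dtype=np.int32)

    # Valid (unpadded) length of each batch item along t_t and along t_s
    t_t_max = mask.sum(1)[:, 0].data.cpu().numpy().astype(np.int32)
    t_s_max = mask.sum(2)[:, 0].data.cpu().numpy().astype(np.int32)
    maximum_path_c(path, neg_cent, t_t_max, t_s_max)  # fills `path` in place
    return torch.from_numpy(path).to(device=device, dtype=dtype)
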
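
The workaround above delegates the actual search to the compiled `maximum_path_c`. As an illustration of what that kernel computes, and as a slow fallback for debugging when the extension cannot be built, here is a pure-Python sketch of the same monotonic alignment dynamic program. It assumes HierSpeech++ ships the same MAS kernel as VITS and that every valid item satisfies t_t >= t_s >= 1; the names `maximum_path_each_py` and `maximum_path_py` are hypothetical, and it runs in O(t_t * t_s) Python steps per item, so it is far slower than the Cython version.

```python
from typing import List

Matrix = List[List[float]]

NEG = -1e9  # large negative sentinel marking unreachable cells


def maximum_path_each_py(value: Matrix, t_y: int, t_x: int) -> List[List[int]]:
    """Hard monotonic alignment for one [t_t, t_s] score matrix.

    Only the top-left t_y x t_x block is used; assumes t_y >= t_x >= 1.
    """
    v = [list(row) for row in value]  # work on a copy; the input is left untouched
    path = [[0] * len(row) for row in value]

    # Forward pass: v[y][x] becomes the best cumulative score of a monotonic
    # path from (0, 0) that aligns step y (t_t axis) to index x (t_s axis).
    for y in range(t_y):
        for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
            stay = NEG if x == y else v[y - 1][x]  # step y-1 was also on index x
            if x == 0:
                advance = 0.0 if y == 0 else NEG
            else:
                advance = v[y - 1][x - 1]  # step y-1 was on index x-1
            v[y][x] += max(stay, advance)

    # Backward pass: start at the last index on the last step and walk back.
    index = t_x - 1
    for y in range(t_y - 1, -1, -1):
        path[y][index] = 1
        if index != 0 and (index == y or v[y - 1][index] < v[y - 1][index - 1]):
            index -= 1
    return path


def maximum_path_py(neg_cent: List[Matrix], mask: List[Matrix]) -> List[List[List[int]]]:
    """Batched wrapper; neg_cent and mask are nested lists shaped [b, t_t, t_s]."""
    paths = []
    for scores, m in zip(neg_cent, mask):
        t_t = int(sum(row[0] for row in m))  # like mask.sum(1)[:, 0]
        t_s = int(sum(m[0]))                 # like mask.sum(2)[:, 0]
        paths.append(maximum_path_each_py(scores, t_t, t_s))
    return paths
```

With PyTorch tensors, one way to call it is `torch.tensor(maximum_path_py(neg_cent.detach().cpu().tolist(), attn_mask.squeeze(1).cpu().tolist()), device=neg_cent.device, dtype=neg_cent.dtype)`. Padded cells outside each item's valid block stay 0, matching the zero-initialised `path` array in the Cython wrapper.
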
sh-lee-prml commented 1 month ago

Details on MAS, including how to build it, are in the VITS repository: https://github.com/jaywalnut310/vits

Thanks!