myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
MIT License

mecab-python3 and python-mecab-ko conflict #121

Open kulogix opened 1 month ago

kulogix commented 1 month ago

Trying to run OpenVoice/demo_part3.ipynb on Apple Silicon. Even with workarounds (listed below), it attempts to auto-install python-mecab-ko, which conflicts with mecab-python3.
This appears to happen regardless of whether mecab or mecab-ko is installed.

Two other similar issues were closed without actual resolution:
https://github.com/myshell-ai/MeloTTS/issues/119
https://github.com/myshell-ai/MeloTTS/issues/113
Why keep closing them without addressing the issues?

Environment: Apple Silicon, Python 3.10.14 virtual environment.

Setup steps:

  1. brew install mecab

  2. Installed MeloTTS, first removing the extra "mecab-python3==1.0.5" entry and removing the version pin from the second one.

  3. python -m unidic download

Using:
python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl
mecab_python3-1.0.9-cp310-cp310-macosx_11_0_arm64.whl

Workarounds applied:

  1. Set %set_env PYTORCH_ENABLE_MPS_FALLBACK=1 in the notebook.

  2. Edited openvoice/se_extractor.py:22 to read: model = WhisperModel(model_size, device="cpu", compute_type="float32")

  3. Replaced device = "cuda:0" if torch.cuda.is_available() else "cpu" with device = "mps"

Output from the notebook:

you have to install python-mecab-ko. install it...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Collecting python-mecab-ko
  Using cached python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.4 kB)
Requirement already satisfied: python-mecab-ko-dic in /Users/awannord/Virtualenvs/openvoice/lib/python3.10/site-packages (from python-mecab-ko) (2.1.1.post2)
Using cached python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl (348 kB)
Installing collected packages: python-mecab-ko
Successfully installed python-mecab-ko-1.3.5
you have to install python-mecab-ko. "pip install python-mecab-ko"

AttributeError                            Traceback (most recent call last)
Cell In[5], line 28
     25 speaker_key = speaker_key.lower().replace('_', '-')
     27 source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
---> 28 model.tts_to_file(text, speaker_id, src_path, speed=speed)
     29 save_path = f'{output_dir}/output_v2_{speaker_key}.wav'
     31 # Run the tone color converter

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:100, in TTS.tts_to_file(self, text, speaker_id, output_path, sdp_ratio, noise_scale, noise_scale_w, speed, pbar, format, position, quiet)
     98     t = re.sub(r'([a-z])([A-Z])', r'\1 \2', t)
     99 device = self.device
--> 100 bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
    101 with torch.no_grad():
    102     x_tst = phones.to(device).unsqueeze(0)

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:23, in get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id)
     22 def get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id=None):
---> 23     norm_text, phone, tone, word2ph = clean_text(text, language_str)
     24     phone, tone, language = cleaned_text_to_sequence(phone, tone, language_str, symbol_to_id)
     26     if hps.data.add_blank:

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:12, in clean_text(text, language)
     10 language_module = language_module_map[language]
     11 norm_text = language_module.text_normalize(text)
---> 12 phones, tones, word2ph = language_module.g2p(norm_text)
     13 return norm_text, phones, tones, word2ph

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/korean.py:122, in g2p(norm_text)
    118     continue
    119 # import pdb; pdb.set_trace()
    120 # phonemes = japanese_text_to_phonemes(text)
    121 # text = g2p_kr(text)
--> 122 phonemes = korean_text_to_phonemes(text)
    123 # import pdb; pdb.set_trace()
    124 # # phonemes = [i for i in phonemes if i in symbols]
    125 # for i in phonemes:
    126 #     assert i in symbols, (group, norm_text, tokenized, i)
    127 phone_len = len(phonemes)

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/korean.py:69, in korean_text_to_phonemes(text, character)
     66     return text
     68 text = normalize(text)
---> 69 text = g2p_kr(text)
     70 text = list(hangul_to_jamo(text))  # '하늘' --> ['ᄒ', 'ᅡ', 'ᄂ', 'ᅳ', 'ᆯ']
     71 return "".join(text)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/g2pkk/g2pkk.py:129, in G2p.__call__(self, string, descriptive, verbose, group_vowels, to_syl)
    126 string = convert_eng(string, self.cmu)
    128 # 3. annotate
--> 129 string = annotate(string, self.mecab)
    132 # 4. Spell out arabic numbers
    133 string = convert_num(string)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/g2pkk/utils.py:166, in annotate(string, mecab)
    165 def annotate(string, mecab):
--> 166     tokens = mecab.pos(string)
    167     if string.replace(" ", "") != "".join(token for token, _ in tokens):
    168         return string

AttributeError: 'NoneType' object has no attribute 'pos'

Running a second time after this (in a new session) results in a different error. This happens any time python-mecab-ko is installed after mecab-python3. New error:

ModuleNotFoundError                       Traceback (most recent call last)
File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:12
     11 try:
---> 12     import MeCab
     13 except ImportError as e:

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab/__init__.py:1
----> 1 from .mecab import MeCab, MeCabError, mecabrc_path
      2 from .types import Dictionary, Feature, Morpheme, Span

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab/mecab.py:9
      8 import _mecab
----> 9 from mecab.types import Dictionary, Morpheme
     10 from mecab.utils import create_lattice, ensure_list, to_csv

ModuleNotFoundError: No module named 'mecab'

The above exception was the direct cause of the following exception:

ImportError                               Traceback (most recent call last)
Cell In[5], line 1
----> 1 from melo.api import TTS
      3 texts = {
      4     'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
      5     'EN': "Did you ever hear a folk tale about a giant turtle?",
   (...)
     10     'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
     11 }
     14 src_path = f'{output_dir}/tmp.wav'

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:13
     10 from tqdm import tqdm
     11 import torch
---> 13 from . import utils
     14 from . import commons
     15 from .models import SynthesizerTrn

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:13
     11 import librosa
     12 from melo.text import cleaned_text_to_sequence, get_bert
---> 13 from melo.text.cleaner import clean_text
     14 from melo import commons
     16 MATPLOTLIB_FLAG = False

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:1
----> 1 from . import chinese, japanese, english, chinese_mix, korean, french, spanish
      2 from . import cleaned_text_to_sequence
      3 import copy

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:14
     12     import MeCab
     13 except ImportError as e:
---> 14     raise ImportError("Japanese requires mecab-python3 and unidic-lite.") from e
     15 from num2words import num2words
     17 _CONVRULES = [
     18     # Conversion of 2 letters
     19     "アァ/ a a",
   (...)
    318     "・/ ,",
    319 ]

ImportError: Japanese requires mecab-python3 and unidic-lite.

To revert to the first error:

pip uninstall mecab-python3 python-mecab-ko mecab
pip install mecab-python3
## without manually re-installing python-mecab-ko

Tried pip install mecab, got the following:

RuntimeError                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 from melo.api import TTS
      3 texts = {
      4     'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
      5     'EN': "Did you ever hear a folk tale about a giant turtle?",
   (...)
     10     'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
     11 }
     14 src_path = f'{output_dir}/tmp.wav'

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:13
     10 from tqdm import tqdm
     11 import torch
---> 13 from . import utils
     14 from . import commons
     15 from .models import SynthesizerTrn

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:13
     11 import librosa
     12 from melo.text import cleaned_text_to_sequence, get_bert
---> 13 from melo.text.cleaner import clean_text
     14 from melo import commons
     16 MATPLOTLIB_FLAG = False

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:1
----> 1 from . import chinese, japanese, english, chinese_mix, korean, french, spanish
      2 from . import cleaned_text_to_sequence
      3 import copy

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:367
    365 _SYMBOL_TOKENS = set(list("・、。?!"))
    366 _NO_YOMI_TOKENS = set(list("「」『』―()[][]"))
--> 367 _TAGGER = MeCab.Tagger()
    370 def text2kata(text: str) -> str:
    371     parsed = _TAGGER.parse(text)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab.py:355, in Tagger.__init__(self, *args)
    354 def __init__(self, *args):
--> 355     _MeCab.Tagger_swiginit(self, _MeCab.new_Tagger(*args))

RuntimeError:

Tried installing mecab-ko instead of mecab:

brew uninstall mecab
brew install mecab-ko
pip uninstall mecab-python3 python-mecab-ko mecab
pip install mecab-python3

This gives an error similar to the first one (it attempts to auto-install python-mecab-ko, then complains that mecab.pos doesn't exist):

you have to install python-mecab-ko. install it...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Collecting python-mecab-ko
  Using cached python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.4 kB)
Requirement already satisfied: python-mecab-ko-dic in /Users/awannord/Virtualenvs/openvoice/lib/python3.10/site-packages (from python-mecab-ko) (2.1.1.post2)
Using cached python_mecab_ko-1.3.5-cp310-cp310-macosx_11_0_arm64.whl (348 kB)
Installing collected packages: python-mecab-ko
Successfully installed python-mecab-ko-1.3.5
you have to install python-mecab-ko. "pip install python-mecab-ko"

AttributeError                            Traceback (most recent call last)
Cell In[5], line 28
     25 speaker_key = speaker_key.lower().replace('_', '-')
     27 source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{speaker_key}.pth', map_location=device)
---> 28 model.tts_to_file(text, speaker_id, src_path, speed=speed)
     29 save_path = f'{output_dir}/output_v2_{speaker_key}.wav'
     31 # Run the tone color converter

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:100, in TTS.tts_to_file(self, text, speaker_id, output_path, sdp_ratio, noise_scale, noise_scale_w, speed, pbar, format, position, quiet)
     98     t = re.sub(r'([a-z])([A-Z])', r'\1 \2', t)
     99 device = self.device
--> 100 bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
    101 with torch.no_grad():
    102     x_tst = phones.to(device).unsqueeze(0)

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:23, in get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id)
     22 def get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id=None):
---> 23     norm_text, phone, tone, word2ph = clean_text(text, language_str)
     24     phone, tone, language = cleaned_text_to_sequence(phone, tone, language_str, symbol_to_id)
     26     if hps.data.add_blank:

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:12, in clean_text(text, language)
     10 language_module = language_module_map[language]
     11 norm_text = language_module.text_normalize(text)
---> 12 phones, tones, word2ph = language_module.g2p(norm_text)
     13 return norm_text, phones, tones, word2ph

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/korean.py:122, in g2p(norm_text)
    118     continue
    119 # import pdb; pdb.set_trace()
    120 # phonemes = japanese_text_to_phonemes(text)
    121 # text = g2p_kr(text)
--> 122 phonemes = korean_text_to_phonemes(text)
    123 # import pdb; pdb.set_trace()
    124 # # phonemes = [i for i in phonemes if i in symbols]
    125 # for i in phonemes:
    126 #     assert i in symbols, (group, norm_text, tokenized, i)
    127 phone_len = len(phonemes)

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/korean.py:69, in korean_text_to_phonemes(text, character)
     66     return text
     68 text = normalize(text)
---> 69 text = g2p_kr(text)
     70 text = list(hangul_to_jamo(text))  # '하늘' --> ['ᄒ', 'ᅡ', 'ᄂ', 'ᅳ', 'ᆯ']
     71 return "".join(text)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/g2pkk/g2pkk.py:129, in G2p.__call__(self, string, descriptive, verbose, group_vowels, to_syl)
    126 string = convert_eng(string, self.cmu)
    128 # 3. annotate
--> 129 string = annotate(string, self.mecab)
    132 # 4. Spell out arabic numbers
    133 string = convert_num(string)

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/g2pkk/utils.py:166, in annotate(string, mecab)
    165 def annotate(string, mecab):
--> 166     tokens = mecab.pos(string)
    167     if string.replace(" ", "") != "".join(token for token, _ in tokens):
    168         return string

AttributeError: 'NoneType' object has no attribute 'pos'

As before, running in a new session (with no changes to the installed packages other than the previously auto-installed python-mecab-ko) gives the following, different error:

ModuleNotFoundError                       Traceback (most recent call last)
File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:12
     11 try:
---> 12     import MeCab
     13 except ImportError as e:

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab/__init__.py:1
----> 1 from .mecab import MeCab, MeCabError, mecabrc_path
      2 from .types import Dictionary, Feature, Morpheme, Span

File ~/Virtualenvs/openvoice/lib/python3.10/site-packages/MeCab/mecab.py:9
      8 import _mecab
----> 9 from mecab.types import Dictionary, Morpheme
     10 from mecab.utils import create_lattice, ensure_list, to_csv

ModuleNotFoundError: No module named 'mecab'

The above exception was the direct cause of the following exception:

ImportError                               Traceback (most recent call last)
Cell In[5], line 1
----> 1 from melo.api import TTS
      3 texts = {
      4     'EN_NEWEST': "Did you ever hear a folk tale about a giant turtle?",  # The newest English base speaker model
      5     'EN': "Did you ever hear a folk tale about a giant turtle?",
   (...)
     10     'KR': "안녕하세요! 오늘은 날씨가 정말 좋네요.",
     11 }
     14 src_path = f'{output_dir}/tmp.wav'

File ~/Virtualenvs/openvoice/MeloTTS/melo/api.py:13
     10 from tqdm import tqdm
     11 import torch
---> 13 from . import utils
     14 from . import commons
     15 from .models import SynthesizerTrn

File ~/Virtualenvs/openvoice/MeloTTS/melo/utils.py:13
     11 import librosa
     12 from melo.text import cleaned_text_to_sequence, get_bert
---> 13 from melo.text.cleaner import clean_text
     14 from melo import commons
     16 MATPLOTLIB_FLAG = False

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/cleaner.py:1
----> 1 from . import chinese, japanese, english, chinese_mix, korean, french, spanish
      2 from . import cleaned_text_to_sequence
      3 import copy

File ~/Virtualenvs/openvoice/MeloTTS/melo/text/japanese.py:14
     12     import MeCab
     13 except ImportError as e:
---> 14     raise ImportError("Japanese requires mecab-python3 and unidic-lite.") from e
     15 from num2words import num2words
     17 _CONVRULES = [
     18     # Conversion of 2 letters
     19     "アァ/ a a",
   (...)
    318     "・/ ,",
    319 ]

ImportError: Japanese requires mecab-python3 and unidic-lite.

kulogix commented 1 month ago

Update: On Mac (and Windows), the file system is case-insensitive by default. python-mecab-ko installs into a package directory named mecab, and mecab-python3 into one named MeCab. If you're not on a case-sensitive file system, one install will overwrite or merge with the other.
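
To confirm that an existing environment is affected, one option is to look for installed packages whose top-level names differ only by letter case. The following is a minimal sketch, not part of MeloTTS: the helper names `top_level_names`, `find_case_collisions` and `scan_environment` are hypothetical, and it assumes each installed distribution records its files in its metadata (which `importlib.metadata` exposes as `dist.files`).

```python
from collections import defaultdict
from importlib import metadata


def top_level_names(dist):
    """Top-level file or directory names recorded for one installed distribution."""
    names = set()
    for path in dist.files or []:
        top = path.parts[0]
        # Skip packaging metadata and files installed outside site-packages.
        if top == ".." or top.endswith((".dist-info", ".egg-info")):
            continue
        names.add(top[:-3] if top.endswith(".py") else top)
    return names


def find_case_collisions(names_by_dist):
    """Group names that differ only by case; keep groups with more than one spelling."""
    grouped = defaultdict(set)
    for dist_name, names in names_by_dist.items():
        for name in names:
            grouped[name.casefold()].add((name, dist_name))
    return {
        key: sorted(entries)
        for key, entries in grouped.items()
        if len({name for name, _ in entries}) > 1
    }


def scan_environment():
    """Run the check against every distribution in the current environment."""
    names_by_dist = {
        (dist.metadata["Name"] or "unknown"): top_level_names(dist)
        for dist in metadata.distributions()
    }
    return find_case_collisions(names_by_dist)
```

Calling `scan_environment()` inside the virtual environment returns a dictionary keyed by the lower-cased name. With both mecab-python3 and python-mecab-ko installed, a `mecab` entry listing both spellings would be expected, though the exact names depend on what each wheel ships.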

On Mac, create a new case-sensitive volume and set up your virtual environment there. In Disk Utility: + Volume, Name: Playground, Format: APFS (Case-sensitive). Then create a symbolic link in your home folder: ln -s /Volumes/Playground ~/playground
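
Before creating the virtual environment, it may be worth verifying that the target directory really is case-sensitive. This is a small sketch under the assumption that the directory is writable; the function name `is_case_insensitive` is made up for illustration. It creates a scratch directory named `MeCab` and checks whether the spelling `mecab` resolves to it.

```python
import os
import tempfile


def is_case_insensitive(directory):
    """Return True if `directory` is on a case-insensitive file system."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        os.mkdir(os.path.join(scratch, "MeCab"))
        # On a case-insensitive file system the lowercase spelling resolves
        # to the directory that was just created.
        return os.path.isdir(os.path.join(scratch, "mecab"))
```

For the setup described above, `is_case_insensitive(os.path.expanduser("~/playground"))` should return `False` once the symbolic link points at the case-sensitive volume. The scratch directory is removed again when the function returns.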

Once the Python project is set up on a case-sensitive file system, everything works (for European languages, Japanese, Korean, and Chinese). I also had mecab-ko installed: brew install mecab-ko

It would be nice if these tips (the case-sensitive volume, how to create one on Mac, and whether to install mecab or mecab-ko) were added to the README / docs, for those who don't want to rely on Docker.

yunseung-dable commented 1 month ago

Same issue. Is there any replacement for MeCab?

nimo1996 commented 3 weeks ago

Me too.

So I tried:

  1. docker build

  2. Edit requirements.txt: botocore==1.34.88 cached_path==1.6.2

and everything was resolved.