[Bug] Server does not support fairseq models

DenysVuika commented 6 months ago

Describe the bug

It is not possible to run server.py with the fairseq models.

To Reproduce

docker run --rm -it -p 5002:5002 --platform linux/amd64 --entrypoint /bin/bash ghcr.io/idiap/coqui-tts
python3 TTS/server/server.py --model_name "tts_models/crh/fairseq/vits"

Expected behavior

No response

Logs

File "/root/TTS/server/server.py", line 111, in <module>
    synthesizer = Synthesizer(
  File "/root/TTS/utils/synthesizer.py", line 96, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/root/TTS/utils/synthesizer.py", line 186, in _load_tts
    self.tts_config = load_config(tts_config_path)
  File "/root/TTS/config/__init__.py", line 85, in load_config
    ext = os.path.splitext(config_path)[1]
  File "/usr/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
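
The last frames of the traceback suggest that no config path was resolved for the fairseq model, so `None` reached `os.path.splitext`. A minimal sketch reproducing just that final step, assuming nothing about the server beyond what the traceback shows (`config_extension` is a hypothetical helper name, not part of the TTS code):

```python
import os


def config_extension(config_path):
    """Mirror the failing line from the traceback: take the config file's extension."""
    return os.path.splitext(config_path)[1]


# A real path works as expected.
print(config_extension("config.json"))

# A missing path (None) raises the same kind of error as in the log.
try:
    config_extension(None)
except TypeError as err:
    print(f"TypeError: {err}")
```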

Environment

Docker container

Additional context

No response

eginhard commented 6 months ago

Yes, the server doesn't support the fairseq models yet (see also https://github.com/coqui-ai/TTS/discussions/3211). The same happens without Docker: tts-server --model_name "tts_models/crh/fairseq/vits"
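
Until the server supports these models, a wrapper script could reject fairseq model names up front instead of failing with the TypeError. This is only a sketch: it assumes fairseq models always follow the `tts_models/<lang>/fairseq/<model>` pattern seen in this thread, and both function names are hypothetical, not part of the TTS package.

```python
def is_fairseq_model(model_name: str) -> bool:
    """Return True for names shaped like 'tts_models/<lang>/fairseq/<model>'."""
    parts = model_name.split("/")
    return len(parts) == 4 and parts[0] == "tts_models" and parts[2] == "fairseq"


def check_server_model(model_name: str) -> None:
    """Hypothetical guard: fail early with a readable message."""
    if is_fairseq_model(model_name):
        raise ValueError(
            f"{model_name!r} is a fairseq model, which the server does not support yet"
        )


print(is_fairseq_model("tts_models/crh/fairseq/vits"))  # name from this issue
print(is_fairseq_model("tts_models/en/ljspeech/vits"))  # a non-fairseq model name
```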

mantrakp04 commented 6 months ago

similar behavior on python API, fairseq models don't work


Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from TTS.api import TTS
>>> api = TTS("tts_models/deu/fairseq/vits")
100%|██████████| 135M/135M [00:40<00:00, 2.02MiB/s]
>>> api.tts_with_vc_to_file(
...     "Wie sage ich auf Italienisch, dass ich dich liebe?",
...     speaker_wav="nb/audios/OpenVoice/resources/example_reference.mp3",
...     file_path="out.wav"
... )
wie sage ich auf italienisch, dass ich dich liebe?
Character ',' not found in the vocabulary. Discarding it.
wie sage ich auf italienisch, dass ich dich liebe?
Character '?' not found in the vocabulary. Discarding it.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/api.py", line 455, in tts_with_vc_to_file
    wav = self.tts_with_vc(
          ^^^^^^^^^^^^^^^^^
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/api.py", line 419, in tts_with_vc
    self.load_vc_model_by_name("voice_conversion_models/multilingual/vctk/freevc24")
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/api.py", line 159, in load_vc_model_by_name
    self.voice_converter = Synthesizer(vc_checkpoint=model_path, vc_config=config_path, use_cuda=gpu)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/utils/synthesizer.py", line 104, in __init__
    self._load_vc(vc_checkpoint, vc_config, use_cuda)
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/utils/synthesizer.py", line 142, in _load_vc
    self.vc_model = setup_vc_model(config=self.vc_config)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/vc/models/__init__.py", line 19, in setup_model
    model = MyModel.init_from_config(config, samples)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/vc/models/freevc.py", line 554, in init_from_config
    model = FreeVC(config)
            ^^^^^^^^^^^^^^
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/vc/models/freevc.py", line 375, in __init__
    self.wavlm = get_wavlm()
                 ^^^^^^^^^^^
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/TTS/vc/modules/freevc/wavlm/__init__.py", line 29, in get_wavlm
    checkpoint = torch.load(output_path, map_location=torch.device(device))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/torch/serialization.py", line 1004, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/barrel/miniconda3/envs/cotts/lib/python3.11/site-packages/torch/serialization.py", line 456, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

eginhard commented 6 months ago

> similar behavior on python API, fairseq models don't work

@mantrakp04 No, fairseq models do work fine with the Python API. Your error message is different and indicates that some of the downloaded model files might be corrupted. Try redownloading both the Fairseq and the FreeVC model (by deleting the corresponding folders). You also don't have to use the VC model, i.e. just api.tts_to_file("Wie sage ich auf Italienisch, dass ich dich liebe?", file_path="output.wav") works as well.
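
To find out which downloaded file is damaged before deleting anything, one option is to look for cut-off zip archives, since recent PyTorch versions save checkpoints in zip format and "failed finding central directory" usually points to a truncated one. The following is a rough sketch using only the standard library. The helper names are hypothetical, the model folder location is left as a parameter because it depends on the installation, and checkpoints in the older non-zip format are simply ignored.

```python
import tempfile
import zipfile
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"  # local file header at the start of a zip archive


def looks_truncated(path: Path) -> bool:
    """True if the file starts like a zip but has no end-of-central-directory record."""
    with open(path, "rb") as f:
        starts_like_zip = f.read(4) == ZIP_MAGIC
    return starts_like_zip and not zipfile.is_zipfile(path)


def find_truncated(model_dir: Path) -> list[Path]:
    """List files under model_dir that look like cut-off zip archives."""
    return sorted(p for p in model_dir.rglob("*") if p.is_file() and looks_truncated(p))


# Demo with throwaway files; no real model data is involved.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.pth"
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("data.pkl", b"x" * 1000)
    bad = Path(tmp) / "bad.pth"
    bad.write_bytes(good.read_bytes()[:500])  # simulate an interrupted download
    print([p.name for p in find_truncated(Path(tmp))])
```

Any folder containing a file flagged this way is a candidate for deletion, so that the model is downloaded again on the next run.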

erew123 commented 6 months ago

FYI, Facebook's fairseq on Windows stopped at version 0.12.2 back in 2022, and it doesn't work with Python versions beyond 3.10, so it's kind of dead in the water for future updates.

For fairseq version 2, Facebook says they will not be supporting Windows platforms: https://github.com/facebookresearch/fairseq2?tab=readme-ov-file#installing-on-windows

I have some kludge versions of fairseq (version 0.12.3, if you will) that work on Windows, and someone has updated fairseq 0.12.x beyond that: https://github.com/VarunGumma/fairseq. However, I currently can't compile it on Windows due to a bug in PyTorch that should be resolved in later PyTorch versions (or so I read while hunting through the forums on the issue).

eginhard commented 6 months ago

> FYI, Facebook's fairseq on Windows stopped at version 0.12.2 back in 2022, and it doesn't work with Python versions beyond 3.10, so it's kind of dead in the water for future updates.

Coqui doesn't depend on fairseq at all, so this is not an issue. It just loads the trained fairseq models and makes them compatible with Coqui's format.
