jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

AttributeError: module 'torch' has no attribute 'compiler' and various other issues #29

Closed Sewlell closed 3 months ago

Sewlell commented 3 months ago

System

Windows 11 NVIDIA MX130 i5-10210U 12GB RAM

Error Code

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[11], [line 12](vscode-notebook-cell:?execution_count=11&line=12)
      [8](vscode-notebook-cell:?execution_count=11&line=8) prompt_end_frame = int(cut_off_sec * info.sample_rate)
     [11](vscode-notebook-cell:?execution_count=11&line=11) # # load model, tokenizer, and other necessary files
---> [12](vscode-notebook-cell:?execution_count=11&line=12) from models import voicecraft
     [13](vscode-notebook-cell:?execution_count=11&line=13) voicecraft_name="giga830M.pth"
     [14](vscode-notebook-cell:?execution_count=11&line=14) ckpt_fn =f"[./pretrained_models/](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/NAME/TTS/src/audiocraft/audiocraft/pretrained_models/){voicecraft_name}"

File [c:\Users\PEY3C\TTS\src\audiocraft\audiocraft\models\__init__.py:10](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:10)
      [6](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:6) """
      [7](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:7) Models for EnCodec, AudioGen, MusicGen, as well as the generic LMModel.
      [8](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:8) """
      [9](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:9) # flake8: noqa
---> [10](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:10) from . import builders, loaders
     [11](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:11) from .encodec import (
     [12](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:12)     CompressionModel, EncodecModel, DAC,
     [13](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:13)     HFEncodecModel, HFEncodecCompressionModel)
     [14](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/__init__.py:14) from .audiogen import AudioGen

File [c:\Users\NAME\TTS\src\audiocraft\audiocraft\models\builders.py:14](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:14)
      [7](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:7) """
      [8](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:8) All the functions to build the relevant models and modules
      [9](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:9) from the Hydra config.
     [10](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:10) """
     [12](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:12) import typing as tp
---> [14](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:14) import audiocraft
     [15](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:15) import omegaconf
     [16](file:///C:/Users/NAME/TTS/src/audiocraft/audiocraft/models/builders.py:16) import torch

File [c:\users\name\tts\src\audiocraft\audiocraft\__init__.py:24](file:///C:/users/pey3c/tts/src/audiocraft/audiocraft/__init__.py:24)
      [6](file:///C:/users/name/tts/src/audiocraft/audiocraft/__init__.py:6) """
      [7](file:///C:/users/name/tts/src/audiocraft/audiocraft/__init__.py:7) AudioCraft is a general framework for training audio generative models.
      [8](file:///C:/users/name/tts/src/audiocraft/audiocraft/__init__.py:8) At the moment we provide the training code for:
   (...)
     [20](file:///C:/users/name/tts/src/audiocraft/audiocraft/__init__.py:20)     improves the perceived quality and reduces the artifacts coming from adversarial decoders.
     [21](file:///C:/users/name/tts/src/audiocraft/audiocraft/__init__.py:21) """
     [23](file:///C:/users/name/tts/src/audiocraft/audiocraft/__init__.py:23) # flake8: noqa
---> [24](file:///C:/users/name/tts/src/audiocraft/audiocraft/__init__.py:24) from . import data, modules, models
     [26](file:///C:/users/name/tts/src/audiocraft/audiocraft/__init__.py:26) __version__ = '1.0.0'

File [c:\users\name\tts\src\audiocraft\audiocraft\data\__init__.py:10](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/__init__.py:10)
      [6](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/__init__.py:6) """Audio loading and writing support. Datasets for raw audio
      [7](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/__init__.py:7) or also including some metadata."""
      [9](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/__init__.py:9) # flake8: noqa
---> [10](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/__init__.py:10) from . import audio, audio_dataset, info_audio_dataset, music_dataset, sound_dataset

File [c:\users\name\tts\src\audiocraft\audiocraft\data\info_audio_dataset.py:19](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/info_audio_dataset.py:19)
     [17](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/info_audio_dataset.py:17) from .audio_dataset import AudioDataset, AudioMeta
     [18](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/info_audio_dataset.py:18) from ..environment import AudioCraftEnvironment
---> [19](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/info_audio_dataset.py:19) from ..modules.conditioners import SegmentWithAttributes, ConditioningAttributes
     [22](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/info_audio_dataset.py:22) logger = logging.getLogger(__name__)
     [25](file:///C:/users/name/tts/src/audiocraft/audiocraft/data/info_audio_dataset.py:25) def _clusterify_meta(meta: AudioMeta) -> AudioMeta:

File [c:\users\name\tts\src\audiocraft\audiocraft\modules\__init__.py:22](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/__init__.py:22)
     [20](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/__init__.py:20) from .lstm import StreamableLSTM
     [21](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/__init__.py:21) from .seanet import SEANetEncoder, SEANetDecoder
---> [22](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/__init__.py:22) from .transformer import StreamingTransformer

File [c:\users\name\tts\src\audiocraft\audiocraft\modules\transformer.py:23](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/transformer.py:23)
     [21](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/transformer.py:21) from torch.nn import functional as F
     [22](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/transformer.py:22) from torch.utils.checkpoint import checkpoint as torch_checkpoint
---> [23](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/transformer.py:23) from xformers import ops
     [25](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/transformer.py:25) from .rope import RotaryEmbedding
     [26](file:///C:/users/name/tts/src/audiocraft/audiocraft/modules/transformer.py:26) from .streaming import StreamingModule

File [c:\Users\NAME\miniconda3\envs\voicecraft\lib\site-packages\xformers\__init__.py:12](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:12)
      [9](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:9) import torch
     [11](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:11) from . import _cpp_lib
---> [12](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:12) from .checkpoint import (  # noqa: E402, F401
     [13](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:13)     checkpoint,
     [14](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:14)     get_optimal_checkpoint_policy,
     [15](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:15)     list_operators,
     [16](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:16)     selective_checkpoint_wrapper,
     [17](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:17) )
     [19](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:19) try:
     [20](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/__init__.py:20)     from .version import __version__  # noqa: F401

File [c:\Users\NAME\miniconda3\envs\voicecraft\lib\site-packages\xformers\checkpoint.py:464](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:464)
    [460](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:460)         self.counter += 1
    [461](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:461)         return self.optim_output[count] == 1
--> [464](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:464) class SelectiveCheckpointWrapper(ActivationWrapper):
    [465](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:465)     def __init__(self, mod, memory_budget=None, policy_fn=None):
    [466](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:466)         if torch.__version__ < (2, 1):

File [c:\Users\NAME\miniconda3\envs\voicecraft\lib\site-packages\xformers\checkpoint.py:481](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:481), in SelectiveCheckpointWrapper()
    [476](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:476)     # TODO: this should be enabled by default in PyTorch
    [477](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:477)     torch._dynamo.config._experimental_support_context_fn_in_torch_utils_checkpoint = (
    [478](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:478)         True
    [479](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:479)     )
--> [481](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:481) @torch.compiler.disable
    [482](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:482) def _get_policy_fn(self, *args, **kwargs):
    [483](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:483)     if not torch.is_grad_enabled():
    [484](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:484)         # no need to compute a policy as it won't be used
    [485](file:///C:/Users/NAME/miniconda3/envs/voicecraft/lib/site-packages/xformers/checkpoint.py:485)         return []

AttributeError: module 'torch' has no attribute 'compiler'

Description

Man, it drove me insane that almost every stage of inference_tts.ipynb has its own errors. I have tried troubleshooting with what I know about Python packages and their compatibility issues. Here is what I have counted so far:

1.

from data.tokenizer import (
    AudioTokenizer,
    TextTokenizer,
)

The instructions don't make clear where inference_tts.ipynb should go. I assumed it was supposed to be src/audiocraft/audiocraft/inference_tts.ipynb. I also get ImportError: attempted relative import beyond top-level package

Adding an absolute import path like this avoids that error, but it raises another one: ModuleNotFoundError: No module named 'AudioTokenizer'

import sys
sys.path.append('C:\\Users\\NAME\\TTS\\src\\audiocraft\\audiocraft\\data')

2.

# # load model, tokenizer, and other necessary files
from models import voicecraft
voicecraft_name="giga830M.pth"
ckpt_fn =f"./pretrained_models/{voicecraft_name}"
encodec_fn = "./pretrained_models/encodec_4cb2048_giga.th"
if not os.path.exists(ckpt_fn):
    os.system(f"wget https://huggingface.co/pyp1/VoiceCraft/resolve/main/{voicecraft_name}\?download\=true")
    os.system(f"mv {voicecraft_name}\?download\=true ./pretrained_models/{voicecraft_name}")
if not os.path.exists(encodec_fn):
    os.system(f"wget https://huggingface.co/pyp1/VoiceCraft/resolve/main/encodec_4cb2048_giga.th")
    os.system(f"mv encodec_4cb2048_giga.th ./pretrained_models/encodec_4cb2048_giga.th")

from models import voicecraft doesn't seem to work like it should. It is probably the same package issue as Stage 1's AudioTokenizer and TextTokenizer.
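The download half of that cell shells out to wget and mv, which a stock Windows command prompt may not provide. A minimal, portable sketch using only the standard library could look like the following. Note that `ensure_file` and its `fetch` parameter are hypothetical names introduced here, not part of VoiceCraft, and the URL in the usage comment is the one from the cell above.

```python
from pathlib import Path
from urllib.request import urlretrieve


def ensure_file(dest, url, fetch=urlretrieve):
    """Download url to dest unless dest already exists; return dest as a Path.

    ensure_file and the fetch parameter are hypothetical helpers for this
    sketch, not VoiceCraft functions. fetch is injectable so the logic can
    be exercised without touching the network.
    """
    dest = Path(dest)
    if dest.exists():
        return dest
    # Create the parent folder (e.g. ./pretrained_models) if it is missing.
    dest.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, str(dest))
    return dest


# Usage (left commented out because it needs network access):
# name = "giga830M.pth"
# ensure_file(f"./pretrained_models/{name}",
#             f"https://huggingface.co/pyp1/VoiceCraft/resolve/main/{name}")
```

Because the file lands directly at its final path, no separate move step is needed.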

3. This is the one you see under Error Code. It is... ridiculous. AttributeError: module 'torch' has no attribute 'compiler' is usually caused by a torch version too old to have torch.compiler, which needs something newer than PyTorch 2.0.

But hell, my transformers is 4.38.1, xformers is 0.0.25.post1, and torch is 2.2.2+cu121, which supposedly should have compiler. There may be other causes and, well, I don't have any ideas.
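One way to sanity-check version claims like these is to ask the running interpreter what it actually has installed and compare that against a minimum. The sketch below is only an illustration: `version_tuple` and `installed_version` are hypothetical helpers, and the (2, 1) threshold rests on the assumption that the torch.compiler namespace first shipped in PyTorch 2.1.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '2.2.2+cu121' into (2, 2, 2); stops at the first non-numeric part."""
    public = version.split("+", 1)[0]  # drop a local suffix such as +cu121
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string for package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "xformers", "transformers"):
    print(name, installed_version(name))

torch_version = installed_version("torch")
# Assumption: torch.compiler is available from PyTorch 2.1 onwards.
if torch_version is not None and version_tuple(torch_version) < (2, 1):
    print("torch is older than 2.1, so torch.compiler is likely missing")
```

Running this inside the notebook, rather than in a separate shell, reports the versions the kernel really imports.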

4. A minor one: what is the deal with apt-get install ffmpeg and apt-get install espeak-ng? Neither is recognized on my system. I think these are supposed to be Linux commands?

Post-script

You may find that this entire issue looks like a rant, but no, I didn't mean it like that. Sure, it was a little bit hideous when all of this happened. But all things considered, it is an amazing project that could probably stand alongside Coqui and Tortoise, especially the zero-shot part. It will be much more popular than it is now if someone eventually hooks it up to their webui, like rsxdalv's TTS Generation Webui.

Of course, I would say it still needs fixing. You can ask me for more context or information if you need it to fix this.

Update 1: fixed my wording

ajayarora1235 commented 3 months ago

Relax, bro. Make sure to set up your environment properly according to what's in the README (torch 2.0.1, that particular version of audiocraft, etc.). This is because of how you installed audiocraft. Make sure to install the correct version of it, and especially the correct version of xformers. Install xformers 0.0.22 and this fixes it.

jasonppy commented 3 months ago

Check out the quick start with Docker; it should work for Windows: https://github.com/jasonppy/VoiceCraft?tab=readme-ov-file#quickstart

Sewlell commented 3 months ago

An update here after trying the Docker setup @jasonppy provided.

Yeah, I think I exaggerated that one after hours of troubleshooting that just kept failing.

Let's talk about Docker first. The Docker setup looks pretty complete, except that Issue 1 and Issue 2 still remained during the process. So I switched back to the conventional setup, the one without Docker.

I managed to fix Issue 1 (wrong folder layout), Issue 2 (wrong order of environment setup) and Issue 4 (basically not supported on Windows) in my conventional setup.

But now there are new issues, 2 actually.

  1. The source command isn't meant for Windows. You can pretty much get away with wget, and I believe it can be replicated with other commands that Windows supports.

  2. Issue 3 in depth. I rechecked the package list with conda list after @ajayarora1235's comment. Since I reinstalled everything in the correct order, my torch is now 2.0.1 and transformers is 4.38.2.

The issue is that xformers remained at 0.0.25.post1, not the 0.0.22 specified in environment.yml.

I have already tried wiping xformers 0.0.25.post1 with conda remove and pip uninstall, but both always target the 0.0.22 one and I have no idea why. I think Python is somehow still using xformers 0.0.25.post1 for what is now Cell 7, which shows the error AttributeError: module 'torch' has no attribute 'compiler'.

Possibly conda list is combining the lists for both the base and voicecraft environments? So far I am still trying to pull out xformers 0.0.25.post1 to make Cell 7 work.
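When a package manager and a notebook disagree about what is installed, it can help to ask the notebook kernel itself which interpreter it runs and where it would import a package from. A small sketch under that assumption (`locate_package` is a hypothetical helper name):

```python
import sys
from importlib import util


def locate_package(module_name):
    """Return the file the current interpreter would import, or None."""
    spec = util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# Run this inside the notebook so it reports the kernel's environment,
# which may differ from the shell where conda list was run.
print("interpreter:", sys.executable)
print("environment:", sys.prefix)
print("xformers from:", locate_package("xformers"))
```

If the printed interpreter is not the one inside the voicecraft environment, the kernel and the shell are looking at different site-packages folders.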

FOR THOSE WHO WANT TO KNOW THE SOLUTIONS FOR ISSUE 1, ISSUE 2 AND ISSUE 4

Issue 1

Your folder must be laid out like this so that inference_tts.ipynb can find the data folder when you run Cell 5 and the rest.

whatever folder you keep your stuff in
     |
     |- VoiceCraft
             |- data
             |- demo
             |- pretrained_models
             |- src
             |     |- audiocraft (things like audiocraft.egg-info)
             |- z_scripts
             |- inference_tts.ipynb

The first VoiceCraft folder is the one you get from git clone https://github.com/jasonppy/VoiceCraft.git
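The layout matters because the notebook imports data as a top-level package, so the data folder has to sit next to the notebook (or its parent has to be on sys.path). A hedged sketch for checking this from a notebook cell follows; `missing_entries` is a hypothetical helper, and the expected names are taken from the layout described here.

```python
import sys
from pathlib import Path

# Names expected next to the notebook, per the layout described above.
EXPECTED = ["data", "inference_tts.ipynb"]


def missing_entries(root, expected=EXPECTED):
    """Return the expected names that do not exist directly under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


repo_root = Path.cwd()
missing = missing_entries(repo_root)
if missing:
    print(f"{repo_root} does not look like the VoiceCraft folder; missing: {missing}")
elif str(repo_root) not in sys.path:
    # Put the repo root (the parent of data/) on sys.path, not data/ itself.
    sys.path.insert(0, str(repo_root))
```

Adding the data folder itself to sys.path does not help, since Python then looks for a data package inside data.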

Issue 2

That one, along with Cell 5's imports (below), is caused by an installation issue. Once you fix your installation, VSCode should recognize names like data and models.

from data.tokenizer import (
    AudioTokenizer,
    TextTokenizer,
)

My investigation found that pip install -e git+https://github.com/facebookresearch/audiocraft.git@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft pulls in torch 2.2.2 and the latest version of pretty much everything else that depends on torch. So what you do is run that command first, then run pip install torch==2.0.1, then winget install ffmpeg, and so on.
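That order can be written down as a short script. This is a sketch, not the project's official installer: `pip_command` is a hypothetical helper, and routing through sys.executable -m pip is a choice made here so that packages land in the same environment that runs the script. The commands are only printed; uncomment the subprocess line to actually run them.

```python
import subprocess
import sys

# The pinned audiocraft commit quoted in the comment above.
AUDIOCRAFT = (
    "git+https://github.com/facebookresearch/audiocraft.git"
    "@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft"
)


def pip_command(*args):
    """Build a pip invocation bound to the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


# audiocraft first (it pulls in a newer torch), then pin torch back.
STEPS = [
    pip_command("install", "-e", AUDIOCRAFT),
    pip_command("install", "torch==2.0.1"),
]

for step in STEPS:
    print(" ".join(step))
    # subprocess.run(step, check=True)  # uncomment to execute
```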

Issue 4

apt-get does not exist on Windows, so you need to find an alternative. Fortunately for me and y'all reading this, you can install ffmpeg with winget install ffmpeg, which is a Windows alternative. For espeak-ng, download their latest release from https://github.com/espeak-ng/espeak-ng and add espeak-ng to PATH manually (https://github.com/bootphon/phonemizer/issues/44#issuecomment-1196564549).

Man, this thing is giving me nightmares.
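Whichever installer is used, a quick way to confirm that both tools are reachable is to look them up on PATH from Python. A minimal sketch, assuming the executables are named ffmpeg and espeak-ng (`missing_tools` is a hypothetical helper):

```python
import shutil


def missing_tools(tools):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(["ffmpeg", "espeak-ng"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg and espeak-ng are both on PATH")
```

A terminal or notebook kernel that was already open before PATH was changed may need a restart to see the new value.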

Sewlell commented 3 months ago

The hell... I had just casually moved on to another fork that was released recently.

When I casually ran pip install xformers==0.0.20 (NOT the 0.0.22 I mentioned earlier) in Anaconda Prompt itself, not from VSCode, it was able to uninstall xformers 0.0.25.post1 and get past AttributeError: module 'torch' has no attribute 'compiler' in Cell 7.

Another error shows up instead: AttributeError: module 'os' has no attribute 'uname'

I looked around the web and in cluster.py and found that os.uname isn't supported on Windows. Thanks to some source, I imported platform in cluster.py and changed uname = os.uname() to uname = platform.uname().

And this error shows up after the modification:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], [line 25](vscode-notebook-cell:?execution_count=2&line=25)
     [22](vscode-notebook-cell:?execution_count=2&line=22) phn2num = ckpt['phn2num']
     [24](vscode-notebook-cell:?execution_count=2&line=24) text_tokenizer = TextTokenizer(backend="espeak")
---> [25](vscode-notebook-cell:?execution_count=2&line=25) audio_tokenizer = AudioTokenizer(signature=encodec_fn, device=device)

File [c:\Users\PEY3C\TTS\VoiceCraft\data\tokenizer.py:110](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:110), in AudioTokenizer.__init__(self, device, signature)
    [104](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:104) def __init__(
    [105](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:105)     self,
    [106](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:106)     device: Any = None,
    [107](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:107)     signature = None
    [108](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:108) ) -> None:
    [109](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:109)     from audiocraft.solvers import CompressionSolver
--> [110](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:110)     model = CompressionSolver.model_from_checkpoint(signature)
    [111](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:111)     self.sample_rate = model.sample_rate
    [112](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:112)     self.channels = model.channels

File [c:\users\pey3c\tts\voicecraft\src\audiocraft\audiocraft\solvers\compression.py:287](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/solvers/compression.py:287), in CompressionSolver.model_from_checkpoint(checkpoint_path, device)
    [285](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/solvers/compression.py:285) logger = logging.getLogger(__name__)
    [286](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/solvers/compression.py:286) logger.info(f"Loading compression model from checkpoint: {checkpoint_path}")
--> [287](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/solvers/compression.py:287) _checkpoint_path = checkpoint.resolve_checkpoint_path(checkpoint_path, use_fsdp=False)
    [288](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/solvers/compression.py:288) assert _checkpoint_path is not None, f"Could not resolve compression model checkpoint path: {checkpoint_path}"
    [289](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/solvers/compression.py:289) state = checkpoint.load_checkpoint(_checkpoint_path)

File [c:\users\pey3c\tts\voicecraft\src\audiocraft\audiocraft\utils\checkpoint.py:68](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:68), in resolve_checkpoint_path(sig_or_path, name, use_fsdp)
     [56](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:56) def resolve_checkpoint_path(sig_or_path: tp.Union[Path, str], name: tp.Optional[str] = None,
     [57](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:57)                             use_fsdp: bool = False) -> tp.Optional[Path]:
     [58](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:58)     """Resolve a given checkpoint path for a provided dora sig or path.
     [59](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:59) 
     [60](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:60)     Args:
   (...)
     [66](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:66)         Path, optional: Resolved checkpoint path, if it exists.
     [67](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:67)     """
---> [68](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:68)     from audiocraft import train
     [69](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:69)     xps_root = train.main.dora.dir / 'xps'
     [70](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/checkpoint.py:70)     sig_or_path = str(sig_or_path)

File [c:\users\pey3c\tts\voicecraft\src\audiocraft\audiocraft\train.py:149](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/train.py:149)
    [144](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/train.py:144)         return
    [146](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/train.py:146)     return solver.run()
--> [149](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/train.py:149) main.dora.dir = AudioCraftEnvironment.get_dora_dir()
    [150](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/train.py:150) main._base_cfg.slurm = get_slurm_parameters(main._base_cfg.slurm)
    [152](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/train.py:152) if main.dora.shared is not None and not os.access(main.dora.shared, os.R_OK):

File [c:\users\pey3c\tts\voicecraft\src\audiocraft\audiocraft\environment.py:108](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:108), in AudioCraftEnvironment.get_dora_dir(cls)
    [103](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:103) @classmethod
    [104](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:104) def get_dora_dir(cls) -> Path:
    [105](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:105)     """Gets the path to the dora directory for the current team and cluster.
    [106](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:106)     Value is overridden by the AUDIOCRAFT_DORA_DIR env var.
    [107](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:107)     """
--> [108](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:108)     cluster_config = cls.instance()._get_cluster_config()
    [109](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:109)     dora_dir = os.getenv("AUDIOCRAFT_DORA_DIR", cluster_config["dora_dir"])
    [110](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:110)     logger.warning(f"Dora directory: {dora_dir}")

File [c:\users\pey3c\tts\voicecraft\src\audiocraft\audiocraft\environment.py:81](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:81), in AudioCraftEnvironment.instance(cls)
     [78](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:78) @classmethod
     [79](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:79) def instance(cls):
     [80](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:80)     if cls._instance is None:
---> [81](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:81)         cls._instance = cls()
     [82](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:82)     return cls._instance

File [c:\users\pey3c\tts\voicecraft\src\audiocraft\audiocraft\environment.py:52](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:52), in AudioCraftEnvironment.__init__(self)
     [50](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:50) """Loads configuration."""
     [51](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:51) self.team: str = os.getenv("AUDIOCRAFT_TEAM", self.DEFAULT_TEAM)
---> [52](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:52) cluster_type = _guess_cluster_type()
     [53](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:53) cluster = os.getenv(
     [54](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:54)     "AUDIOCRAFT_CLUSTER", cluster_type.value
     [55](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:55) )
     [56](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/environment.py:56) logger.info("Detecting cluster type %s", cluster_type)

File [c:\users\pey3c\tts\voicecraft\src\audiocraft\audiocraft\utils\cluster.py:31](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/cluster.py:31), in _guess_cluster_type()
     [29](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/cluster.py:29) uname = platform.uname()
     [30](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/cluster.py:30) fqdn = socket.getfqdn()
---> [31](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/cluster.py:31) if uname.sysname == "Linux" and (uname.release.endswith("-aws") or ".ec2" in fqdn):
     [32](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/cluster.py:32)     return ClusterType.AWS
     [34](file:///C:/users/pey3c/tts/voicecraft/src/audiocraft/audiocraft/utils/cluster.py:34) if fqdn.endswith(".fair"):

AttributeError: 'uname_result' object has no attribute 'sysname'

So, OK.... I think the lore thickens. This makes me feel more certain that this is designed for Linux rather than Windows.
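The last frame fails because platform.uname() names its fields differently from os.uname(): the operating system name is in .system rather than .sysname, while .release exists on both. A hedged sketch of a portable version of that check follows; `looks_like_aws` is a hypothetical stand-in for the condition shown in the traceback, not audiocraft's actual code.

```python
import platform
import socket


def looks_like_aws(system, release, fqdn):
    """Same condition as in the traceback, with the fields passed in explicitly."""
    return system == "Linux" and (release.endswith("-aws") or ".ec2" in fqdn)


uname = platform.uname()  # available on Windows, Linux and macOS
print(uname.system, uname.release)
print(looks_like_aws(uname.system, uname.release, socket.getfqdn()))
```

So after switching to platform.uname(), the attribute access would also need to change from uname.sysname to uname.system.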

jasonppy commented 3 months ago

Thanks for your efforts. I'm unable to test issues regarding Windows, but the Docker solution seems to work for some people. Thanks for the feedback on the audiocraft installation; I have made changes in https://github.com/jasonppy/VoiceCraft/commit/991b1fe3bb622698b15223df5d91eea33d79d2b9

Sewlell commented 3 months ago

That makes sense. I managed to solve AttributeError: module 'os' has no attribute 'uname' with the new .ipynb and my folder layout.

But I got this error in the same Cell 7.

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[4], [line 25](vscode-notebook-cell:?execution_count=4&line=25)
     [22](vscode-notebook-cell:?execution_count=4&line=22) phn2num = ckpt['phn2num']
     [24](vscode-notebook-cell:?execution_count=4&line=24) text_tokenizer = TextTokenizer(backend="espeak")
---> [25](vscode-notebook-cell:?execution_count=4&line=25) audio_tokenizer = AudioTokenizer(signature=encodec_fn, device=device) # will also put the neural codec model on gpu

File [c:\Users\PEY3C\TTS\VoiceCraft\data\tokenizer.py:109](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:109), in AudioTokenizer.__init__(self, device, signature)
    [104](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:104) def __init__(
    [105](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:105)     self,
    [106](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:106)     device: Any = None,
    [107](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:107)     signature = None
    [108](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:108) ) -> None:
--> [109](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:109)     from audiocraft.solvers import CompressionSolver
    [110](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:110)     model = CompressionSolver.model_from_checkpoint(signature)
    [111](file:///C:/Users/PEY3C/TTS/VoiceCraft/data/tokenizer.py:111)     self.sample_rate = model.sample_rate

ModuleNotFoundError: No module named 'audiocraft'

And this is in the final cell, when you go to generate:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], [line 29](vscode-notebook-cell:?execution_count=11&line=29)
     [27](vscode-notebook-cell:?execution_count=11&line=27) decode_config = {'top_k': top_k, 'top_p': top_p, 'temperature': temperature, 'stop_repetition': stop_repetition, 'kvcache': kvcache, "codec_audio_sr": codec_audio_sr, "codec_sr": codec_sr, "silence_tokens": silence_tokens, "sample_batch_size": sample_batch_size}
     [28](vscode-notebook-cell:?execution_count=11&line=28) from inference_tts_scale import inference_one_sample
---> [29](vscode-notebook-cell:?execution_count=11&line=29) concated_audio, gen_audio = inference_one_sample(model, ckpt["config"], phn2num, text_tokenizer, audio_tokenizer, audio_fn, target_transcript, device, decode_config, prompt_end_frame)
     [31](vscode-notebook-cell:?execution_count=11&line=31) # save segments for comparison
     [32](vscode-notebook-cell:?execution_count=11&line=32) concated_audio, gen_audio = concated_audio[0].cpu(), gen_audio[0].cpu()

NameError: name 'audio_tokenizer' is not defined

The rest is fine so far with the sample .wav.

I don't have time to test more in depth right now because of school. But yeah, I will still assist.

Sewlell commented 3 months ago

ModuleNotFoundError: No module named 'audiocraft' happened because I hadn't installed audiocraft lol. It now works all the way down to generation.

I am still looking into generation speed and other issues. But somehow I realized that inference_tts.ipynb doesn't do TTS; it does speech editing.

I will elaborate in a new issue once I'm done playing with this and other forks and collecting more information/issues.