lucidrains / spear-tts-pytorch

Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch
MIT License

Use CUDA to generate audio-text dataset #11

Closed: KonstantineGoudz closed this 10 months ago

KonstantineGoudz commented 10 months ago

Resolves an issue that occurred when using CUDA to generate audio-text datasets.

With this fix, the following example runs without issues:

from audiolm_pytorch import HubertWithKmeans, data
from spear_tts_pytorch import (
    TextToSemantic,
    SemanticToTextDatasetGenerator,
)

ds = data.SoundDataset(
    folder = "path-to-audio-files",
    target_sample_hz = 16000,  # HuBERT base was trained on 16 kHz audio
)

output_folder = "generated_audio_text_dataset"

# HuBERT features quantized with k-means provide the semantic token ids
wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert_base_ls960.pt',
    kmeans_path = './hubert_base_ls960_L9_km500.bin'
)

model = TextToSemantic(
    wav2vec = wav2vec,
    dim = 512,
    heads = 8,
    use_openai_tokenizer = True,
    target_kv_heads = 2, # grouped query attention, for memory efficient decoding
    source_depth = 1,
    target_depth = 1,
)
model.cuda()  # move the model weights to the GPU

ds_generator = SemanticToTextDatasetGenerator(
    model = model,
    dataset = ds,
    folder = output_folder,
)

# generate text for the audio in ds and write the audio-text pairs to output_folder
ds_generator.forward()

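The thread does not say what the underlying fix changed. A frequent cause of CUDA errors in pipelines like this one is an input batch that stays on the CPU while the model weights sit on the GPU. The following is a minimal, generic PyTorch sketch of the usual pattern, assuming only `torch` is installed; `TinyModel` and `run_batch` are hypothetical illustrations, not part of spear-tts-pytorch or audiolm-pytorch.

```python
import torch
from torch import nn

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyModel(nn.Module):
    """Hypothetical stand-in for a larger model such as TextToSemantic."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 4)

    @property
    def device(self) -> torch.device:
        # Report the device that holds the first parameter
        return next(self.parameters()).device

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


def run_batch(model: TinyModel, batch: torch.Tensor) -> torch.Tensor:
    # Move the batch to wherever the model weights live before the forward pass;
    # mixing CPU inputs with CUDA weights raises a RuntimeError
    batch = batch.to(model.device)
    with torch.no_grad():
        return model(batch)


model = TinyModel().to(device)
cpu_batch = torch.randn(2, 16)  # e.g. a batch yielded by a DataLoader, still on the CPU
out = run_batch(model, cpu_batch)
print(out.shape, out.device)
```

Reading the target device from the model, rather than hard-coding `"cuda"`, keeps the same code working on CPU-only machines.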
lucidrains commented 10 months ago

@KonstantineGoudz thank you!