stevenhillis opened this issue 1 year ago
are you using hubert?
Yes, hubert.
```python
from audiolm_pytorch import HubertWithKmeans, SemanticTransformer, SemanticTransformerTrainer

wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert/hubert_base_ls960.pt',
    kmeans_path = './hubert/hubert_base_ls960_L9_km500.bin'
)

semantic_transformer = SemanticTransformer(
    num_semantic_tokens = wav2vec.codebook_size,
    dim = 1024,
    depth = 6,
    # has_condition = True,            # this will have to be set to True
    # cond_as_self_attn_prefix = True  # whether to condition as prefix to self attention, instead of cross attention, as was done in 'VALL-E' paper
).cuda()

trainer = SemanticTransformerTrainer(
    transformer = semantic_transformer,
    wav2vec = wav2vec,
    paths_list_path = '/path/to/training/manifest.txt',
    batch_size = 256,
    grad_accum_every = 1,
    dl_num_workers = 8,
    data_max_length_seconds = 2,
    num_train_steps = 1_000_000,
    force_clear_prev_results = True,
    accelerate_kwargs = {'log_with': 'wandb', 'project_dir': "./runs"},
    results_folder = './results/semantic/'
)

trainer.train()
```
@stevenhillis ok, i'm not sure if these fairseq models are compatible with accelerate
i'll try it out this weekend, and i'll also make sure the transformers can accept pre-encoded semantic token ids, if this is an issue
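For reference, pre-extracting the semantic token ids looks roughly like the sketch below. The exact HubertWithKmeans call signature, the 16 kHz resampling, and the file paths are assumptions on my part, not something confirmed in this thread.

```python
import torch
import torchaudio
from audiolm_pytorch import HubertWithKmeans

# hedged sketch: cache semantic token ids to disk so the transformer can train
# without needing the hubert checkpoint loaded at all
wav2vec = HubertWithKmeans(
    checkpoint_path = './hubert/hubert_base_ls960.pt',
    kmeans_path = './hubert/hubert_base_ls960_L9_km500.bin'
).cuda().eval()

wav, sr = torchaudio.load('/path/to/clip.wav')               # hypothetical example path
wav = torchaudio.functional.resample(wav, sr, 16000).cuda()  # hubert expects 16 kHz audio (assumed)

with torch.no_grad():
    semantic_ids = wav2vec(wav)  # assumed to return one kmeans cluster id per frame

torch.save(semantic_ids.cpu(), '/path/to/clip.semantic.pt')
```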
@stevenhillis deepgram is doing generative models now? i thought they were just speech to text?
I chased down the fairseq model idea a little, and I don't think that's it. The pretrained huberts are on huggingface too, and I can get an embed of the right size from theirs with:

```python
from transformers import AutoModel

# the hidden_states return object contains the embedding output, followed by the outputs of each layer
AutoModel.from_pretrained("facebook/hubert-base-ls960").eval()(inputs, output_hidden_states=True).hidden_states[output_layer + 1]
```
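Spelled out with the undefined pieces filled in, that snippet looks roughly like this. The dummy waveform and output_layer = 9 are my assumptions for the L9/km500 setup, and the +1 just mirrors the indexing convention in the comment above.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("facebook/hubert-base-ls960").eval()

# one second of dummy 16 kHz audio, shape (batch, samples)
inputs = torch.randn(1, 16000)

with torch.no_grad():
    # hidden_states: embedding output first, then the output of each layer
    hidden_states = model(inputs, output_hidden_states=True).hidden_states

output_layer = 9                          # assumed, to match the L9 kmeans checkpoint naming
embed = hidden_states[output_layer + 1]   # same indexing as the one-liner above
print(embed.shape)                        # roughly (1, 49, 768) for hubert-base
```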
It doesn't match the output of the fairseq implementation (although those outputs are themselves quite nondeterministic), but more importantly, I get the same error accelerating train_semantic.py with the huggingface hubert as with the fairseq.
Moreover, I also get the same error when trying to launch a train_fine.py script with accelerate.
Transcription is still the core product offering, but the market is ready for more! We'll be doing a bunch of work on generative modeling for speech and text this year. Always hiring researchers!
I'm good thanks. ok will look into it later this weekend
Oh, a pity! I extracted the tokens in advance, so I don't need to import the hubert model, but this problem still occurs when using accelerate launch for multi-GPU training of the SemanticTransformer.
Do you have a plan to fix this, @lucidrains @stevenhillis?
I think I solved the problem. In the class Attention(nn.Module), add this code:

```python
if num_null_kv > 0:
    self.null_kv = nn.Parameter(torch.randn(2, num_null_kv, dim_head))
```
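In context, the guard would sit roughly as in the minimal sketch below. This is a simplified reconstruction of the relevant pieces (single-head key/values, projections reduced), not the library's exact code.

```python
import torch
from torch import nn
from einops import repeat

class Attention(nn.Module):
    def __init__(self, dim, dim_head = 64, heads = 8, num_null_kv = 0):
        super().__init__()
        self.heads = heads
        self.num_null_kv = num_null_kv

        # register the null key/value parameter only when it is actually used in forward(),
        # so DDP never tracks a parameter that receives no gradient
        if num_null_kv > 0:
            self.null_kv = nn.Parameter(torch.randn(2, num_null_kv, dim_head))

        self.to_q = nn.Linear(dim, dim_head * heads, bias = False)
        self.to_kv = nn.Linear(dim, dim_head * 2, bias = False)  # single-head k/v, as a simplification

    def forward(self, x):
        b = x.shape[0]
        q = self.to_q(x)
        k, v = self.to_kv(x).chunk(2, dim = -1)

        if self.num_null_kv > 0:
            # prepend the learned "null" keys/values so attention always has something to attend to
            nk, nv = repeat(self.null_kv, 'kv n d -> kv b n d', b = b)
            k = torch.cat((nk, k), dim = -2)
            v = torch.cat((nv, v), dim = -2)

        # ... the actual multi-head attention over q, k, v would follow here ...
        return q, k, v
```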
When I try to train the semantic transformer with accelerate (accelerate launch train_semantic.py, where train_semantic.py is lifted directly from the readme), I get an error. My accelerate config is okay, since it works for training the soundstream model. I also get no errors when I run python train_semantic.py for single-GPU training. There's a specific problem with accelerate preparing the SemanticTransformer model.
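A hedged note for anyone else who lands here: if the failure during prepare/training is DDP complaining about parameters that never received gradients (which is exactly what an unused null_kv would cause), the standard accelerate-level knob is sketched below. Whether this can be threaded through the trainer's accelerate_kwargs is an assumption on my part, not something verified in this thread.

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# let DDP tolerate parameters that receive no gradient in a given forward/backward pass
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters = True)
accelerator = Accelerator(kwargs_handlers = [ddp_kwargs])
```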