lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License
2.39k stars 255 forks

Question about the generate #192

Open asr-pub opened 1 year ago

asr-pub commented 1 year ago

Hello, I trained a semantic tokens -> acoustic tokens (3 codes) model, and I want to use argmax so that every inference run produces the same output.

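if argmax:
    # Greedy decoding: always pick the highest-scoring token (deterministic)
    print("use argmax")
    sampled = torch.argmax(last_coarse_logits, dim = -1)
else:
    # Default path: top-k filtering followed by Gumbel sampling (stochastic)
    print("not use argmax")
    filtered_logits = top_k(last_coarse_logits, thres = filter_thres)
    sampled = gumbel_sample(filtered_logits, temperature = temperature, dim = -1)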

However, in argmax mode, the pipeline semantic tokens -> acoustic tokens (3 codes) -> wav produces a wav with no speech, only long silence. Do you know why?
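
For comparison, here is a minimal standalone sketch of the two decoding paths in the snippet above, written in plain Python so it runs without the trained model. The helpers `top_k_filter`, `gumbel_sample`, `argmax`, and `sample_run`, as well as the toy `logits`, are hypothetical stand-ins and not the library's implementation. It only illustrates that greedy argmax repeats the same token for the same logits, and that the sampling path can also be made repeatable by fixing the random seed, which may be worth trying if the goal is just identical output across runs.

```python
import math
import random

def top_k_filter(logits, thres=0.9):
    # Assumed behavior: keep roughly the top (1 - thres) fraction of logits,
    # mask everything else to -inf so it can never be selected.
    k = max(1, math.ceil((1 - thres) * len(logits)))
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else float("-inf") for x in logits]

def gumbel_sample(logits, temperature=1.0, rng=random):
    # Gumbel-max trick: argmax(logits / T + Gumbel noise) is a draw
    # from softmax(logits / T).
    temperature = max(temperature, 1e-10)
    best_index, best_value = -1, float("-inf")
    for i, x in enumerate(logits):
        if x == float("-inf"):
            continue  # masked out by top-k filtering
        u = max(rng.random(), 1e-20)  # guard against log(0)
        noisy = x / temperature - math.log(-math.log(u))
        if noisy > best_value:
            best_index, best_value = i, noisy
    return best_index

def argmax(logits):
    # Greedy choice: index of the largest logit.
    return max(range(len(logits)), key=logits.__getitem__)

# Toy logits for a single decoding step (illustrative values only).
logits = [0.1, 2.0, 0.3, 1.5, -1.0]

# Greedy decoding returns the same index every time for the same logits.
greedy = [argmax(logits) for _ in range(3)]

def sample_run(seed, steps=5):
    # Sampling with a dedicated, seeded generator is repeatable as well.
    rng = random.Random(seed)
    filtered = top_k_filter(logits, thres=0.5)
    return [gumbel_sample(filtered, temperature=1.0, rng=rng) for _ in range(steps)]

print(greedy)
print(sample_run(0) == sample_run(0))
```

In the actual PyTorch code, the analogous step would be calling `torch.manual_seed` with a fixed value before generation; whether that fully removes run-to-run variation on a given GPU setup is an assumption to verify.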