Open asr-pub opened 1 year ago
Hello, I trained a semantic tokens -> acoustic tokens (3 codes) model, and I want to use argmax so that every inference run produces the same output.

    if argmax:
        print("use argmax")
        sampled = torch.argmax(last_coarse_logits, dim = -1)
    else:
        print("not use argmax")
        filtered_logits = top_k(last_coarse_logits, thres = filter_thres)
        sampled = gumbel_sample(filtered_logits, temperature = temperature, dim = -1)

However, in argmax mode, when I run semantic tokens -> acoustic tokens (3 codes) -> wav, the resulting wav contains no speech, only a long silence. Do you know why?
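
For reference, another way to get repeatable output is to keep the sampling branch and fix the random seed before each run. The sketch below only illustrates that idea and is not a confirmed fix for the silence. `gumbel_noise` and `gumbel_sample` here are stand-ins written for illustration (they may differ from the library's own helpers), `sample_with_seed` is a hypothetical name, and the logits are random placeholders rather than real model output.

```python
import torch

def gumbel_noise(t):
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    # Clamp to avoid log(0) when the uniform draw is exactly 0.
    u = torch.rand_like(t).clamp(min=1e-20)
    return -torch.log(-torch.log(u))

def gumbel_sample(logits, temperature=1.0, dim=-1):
    # Gumbel-max trick: argmax over (scaled logits + Gumbel noise)
    # draws a sample from softmax(logits / temperature).
    scaled = logits / max(temperature, 1e-10)
    return (scaled + gumbel_noise(logits)).argmax(dim=dim)

def sample_with_seed(logits, seed, temperature=1.0):
    # Hypothetical helper: reseed the global RNG so the noise,
    # and therefore the sampled tokens, repeat across calls.
    torch.manual_seed(seed)
    return gumbel_sample(logits, temperature=temperature)

# Placeholder logits: batch of 4, codebook size 1024 (both assumed).
logits = torch.randn(4, 1024)

first = sample_with_seed(logits, seed=0)
second = sample_with_seed(logits, seed=0)

# Same seed and same logits give identical token ids.
print(torch.equal(first, second))
```

In a full generation loop the seed would be set once before decoding starts, so that every step draws the same noise sequence on each run; this assumes the same device and the same PyTorch version between runs.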