Hi, I saw that generate.py uses the code below to do Gumbel sampling:
    def multinomial_sample_one_no_sync(probs_sort):  # Does multinomial sampling without a cuda synchronization
        q = torch.empty_like(probs_sort).exponential_(1)
        return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
Are you sure that empty_like behaves the same as a uniform random variable? It seems like it just takes whatever values happen to be in memory at the moment, which could very likely follow some pattern and not behave pseudorandomly. Would this affect the sampling and cause bias?
Edit: never mind, I was stupid.
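For anyone landing here with the same question: as far as I understand it, the trailing underscore in `exponential_` marks an in-place op, so every element of the uninitialized buffer is overwritten with a fresh Exponential(1) draw before anything reads it. Below is a minimal sketch of why `argmax(probs / q)` then samples index `i` with probability `probs[i]`. It uses only the standard library rather than torch, and `sample_one` and `empirical_freqs` are hypothetical helper names made up for this illustration, not anything from generate.py.

```python
import random
from collections import Counter


def sample_one(probs, rng):
    # Fresh q_i ~ Exponential(1) for every entry. This stands in for the
    # in-place fill of the empty buffer; the max() only guards against an
    # exact 0.0 draw, which would raise ZeroDivisionError in pure Python.
    q = [max(rng.expovariate(1.0), 1e-300) for _ in probs]
    # q_i / p_i is Exponential with rate p_i, so the smallest q_i / p_i
    # (equivalently the largest p_i / q_i) lands on index i with
    # probability p_i / sum(p).
    ratios = [p / qi for p, qi in zip(probs, q)]
    return max(range(len(probs)), key=ratios.__getitem__)


def empirical_freqs(probs, n, seed=0):
    # Draw n samples and return the observed frequency of each index.
    rng = random.Random(seed)
    counts = Counter(sample_one(probs, rng) for _ in range(n))
    return [counts[i] / n for i in range(len(probs))]


probs = [0.5, 0.3, 0.2]
freqs = empirical_freqs(probs, 50_000)
```

The observed frequencies in `freqs` should sit close to `probs`, which is what you would expect from an unbiased multinomial sample.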