nkoumchatzky opened 8 years ago
Hey, I have a patch that doublifies everything cumulative-dist-related. Let me know if that works or if we'd rather take another path.
@nkoumchatzky I did not think this could be an issue.
Changing cum_dist to a DoubleTensor would fix the issue for sure. Do you have an idea of the impact on performance, though?
Also, I am really puzzled by this condition (not really important since it's always true right now).
Np! That one was not easy to guess. No idea about the perf though. In my setup, I use multinomial as part of a bigger system and it counts for nothing in terms of computation time. And yeah, this condition is crazy! I guess this was back then when maybe the probs had to sum to 1.
@albanD @nkoumchatzky is there an easy quick fix to this? looks like peeps downstream in char-rnn see the issue bubbling up.
@soumith Looking at it. In the meantime it would be great to have a self-contained example to test on from the char-rnn guys, I'll ask them.
So to make this issue more clear, here is the sequence of events: 1) @andresy noticed a bad sampling of rare events in some configurations where probs are floats in multinomial, fixed by #434. 2) The above patch introduced a regression referenced in that issue, which made the binary search crash in some configurations. This is maybe what causes https://github.com/karpathy/char-rnn/issues/28 (waiting for a self-contained example).
There are three ways to fix it: 1) switch to all doubles (i.e. the cumulative distribution + intermediate states in the multinomial function will be DoubleTensors or doubles) 2) make sure the last bucket is set to 1 in any case 3) both
1) gives double precision but does not guarantee it won't fail in very specific cases, 2) ensures it's gonna work but won't give double precision, 3) gives both in any case, whether probs are a FloatTensor or a DoubleTensor.
I would choose 3) since I didn't see any noticeable performance degradation, and it should hopefully close this issue once and for all (if that fixes the char-rnn problem). I'll add a PR, let me know if that behavior makes sense.
(3) sounds good to me.
Cool @soumith! Added the patch at #593.
Hey gents! I believe #434 is buggy. @andresy @albanD @soumith If the sum of probs is a double but the cumulative distribution array is a float array, then you can end up with the last element of the cumulative distribution being < 1 after dividing each of these elements by the sum (double to float conversion followed by float by double division). Here's a simple C main that simulates why this patch can break torch.multinomial in some cases:
So we can either change cum_dist to a double tensor, or revert it. The former seems to suit everyone and might avoid further problems. Another possibility is to force the last element to 1.