torch / torch7

http://torch.ch

#434 breaks multinomial. #453

Open nkoumchatzky opened 8 years ago

nkoumchatzky commented 8 years ago

Hey gents! I believe #434 is buggy. @andresy @albanD @soumith If the sum of probs is a double but the cumulative distribution array is a float array, the last element of the cumulative distribution can end up < 1 after each element is divided by the sum (a double-to-float conversion followed by a float-by-double division). Here's a simple C main that simulates why this patch can break torch.multinomial in some cases:

#include <stdio.h>

int main()
{
    printf("TEST MULTINOMIAL\n");
    double maxUniform32 = 4294967295L * (1.0/4294967296.0);
    // Take a number bigger than the largest float32 consecutive integer and
    // assume it's the multinomial accreal sum of unnormalized probabilities.
    double sum = 50777218; 
    // Cast it to float as is the case when copying sum to the last value cum_dist (last value of the Tensor)
    float fsample = 50777218; 
    // Print rounded float (the new last element) -- should be 50777216.000000000000000
    printf("fsample: %15.15f\n",fsample);
    // Divide rounded float by exact sum
    fsample /= sum;
    printf("accreal sum of probs: %15.15f\n",sum);
    printf("Last element of the cumulative distribution array: %15.15f\n",fsample);
    printf("Maximum uniform sample in Torch (32 bits): %15.15f\n",maxUniform32);
    printf("Test if last element of cumulative dist array is lower than a possible uniform sample...\n");
    if(maxUniform32 > fsample)
    {
        printf("ERROR: The binary search will yield an out of bounds index!!!!!!!!!");
    }
    return 0;
}
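
For what it's worth, on a typical IEEE-754 platform this prints roughly what the comments predict: 50777218 lies exactly halfway between the representable floats 50777216 and 50777220 and rounds to 50777216; the division then gives 50777216/50777218 ≈ 0.99999996, stored as the float 0.99999994..., while maxUniform32 = 4294967295/4294967296 ≈ 0.9999999998, so the ERROR branch fires.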

So we can either change cum_dist to a double tensor, or revert the patch. The former seems to suit everyone and might avoid further problems. Another possibility is to force the last element to 1, as sketched below.
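
To illustrate that last idea, here is a minimal sketch (hypothetical names, not the actual Torch code) of building the cumulative distribution by accumulating in double and then clamping the final bucket to exactly 1:

#include <stddef.h>

/* Hypothetical sketch: build a normalized cumulative distribution from
   unnormalized float probabilities, accumulating in double and forcing
   the last bucket to exactly 1 so a uniform sample can never land past it. */
static void build_cum_dist(const float *probs, double *cum_dist, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)probs[i];

    double running = 0.0;
    for (size_t i = 0; i < n; i++) {
        running += (double)probs[i];
        cum_dist[i] = running / sum;
    }
    cum_dist[n - 1] = 1.0;  /* guard against rounding leaving it below 1 */
}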

nkoumchatzky commented 8 years ago

Hey, I have a patch that doublifies everything related to the cumulative distribution. Let me know if that works or if we should choose another path.

albanD commented 8 years ago

@nkoumchatzky I did not think this could be an issue. Changing cum_dist to a DoubleTensor would fix the issue for sure. Do you have an idea of the impact on performance, though? Also, I am really puzzled by this condition (not really important since it's always true right now).

nkoumchatzky commented 8 years ago

Np! That one was not easy to guess. No idea about the perf though. In my setup, I use multinomial as part of a bigger system and it accounts for essentially none of the computation time. And yeah, this condition is crazy! I guess it dates back to when the probs maybe had to sum to 1.

soumith commented 8 years ago

@albanD @nkoumchatzky is there a quick fix for this? Looks like peeps downstream in char-rnn are seeing the issue bubble up.

nkoumchatzky commented 8 years ago

@soumith Looking at it. In the meantime, it would be great to have a self-contained example to test on from the char-rnn guys; I'll ask them.

nkoumchatzky commented 8 years ago

To make this issue clearer, here is the sequence of events:

1) @andresy noticed bad sampling of rare events in some configurations where probs are floats in multinomial; this was fixed by #434.
2) That patch introduced the regression described in this issue, which makes the binary search crash in some configurations. This may be what causes https://github.com/karpathy/char-rnn/issues/28 (waiting for a self-contained example).

There are three ways to fix it:

1) switch to all doubles (i.e. the cumulative distribution and intermediate states in the multinomial function become DoubleTensors or doubles)
2) make sure the last bucket is set to 1 in any case
3) both

1) gives double precision but does not guarantee it won't fail in very specific cases, 2) ensures it will work but won't give double precision, and 3) gives both in any case, whether probs are a FloatTensor or a DoubleTensor. A sketch of the lookup step follows below.
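
For context, here is a rough sketch (hypothetical names, not the Torch source) of the lookup step, showing why a sample above the last bucket runs the search off the end and why option 2) prevents it:

#include <stddef.h>

/* Hypothetical sketch: find the first bucket whose cumulative value is >=
   the uniform sample (lower-bound binary search). If rounding left
   cum_dist[n-1] below 1 and the sample lands above it, the result is n,
   i.e. an out-of-bounds index; forcing cum_dist[n-1] = 1.0 rules that out. */
static size_t sample_bucket(const double *cum_dist, size_t n, double uniform)
{
    size_t lo = 0, hi = n;              /* search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cum_dist[mid] < uniform)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;  /* equals n only when uniform > cum_dist[n-1] */
}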

I would choose 3), since I didn't see any noticeable performance degradation and it should hopefully close this issue once and for all (if it fixes the char-rnn problem). I'll add a PR; let me know if that behavior makes sense.

soumith commented 8 years ago

(3) sounds good to me.

nkoumchatzky commented 8 years ago

Cool @soumith! Added the patch at #593.