torch / torch7

http://torch.ch

Discovered an issue with randperm in THTensorMath.c, line 1930 #953

Open allskyee opened 7 years ago

allskyee commented 7 years ago

Discovered when running OpenFace.

The line in question is the following:

    rdata[i*rstride_0] = (real)(i);

Given the explanation in https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchrandpermres-gen-n, isn't this supposed to be the following?

    rdata[i*rstride_0] = (real)(i + 1);

This fixed the problem for me. Perhaps others don't see it because they're running on GPU.

I am running on CPU (at least on the machine I'm writing this post from).
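
For reference, the documentation linked above says randperm(n) should return a random permutation of the integers from 1 to n. A minimal standalone sketch of just the fill step, contrasting the current line with the proposed +1 (plain C, with double standing in for real; the setup is mine, not the library code):

    #include <stdio.h>

    int main(void) {
        const int n = 5;
        double current[5], proposed[5];

        for (int i = 0; i < n; i++) {
            current[i]  = (double)(i);      /* line as it is today: values 0 .. n-1 */
            proposed[i] = (double)(i + 1);  /* proposed change: values 1 .. n */
        }

        for (int i = 0; i < n; i++)
            printf("current=%g  proposed=%g\n", current[i], proposed[i]);
        return 0;
    }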

soumith commented 7 years ago

What is the weird behavior you are seeing? I just checked randperm and it seems to be working fine:

th> torch.randperm(10)
  5
  2
  6
 10
  8
  3
  7
  1
  9
  4
[torch.DoubleTensor of size 10]
allskyee commented 7 years ago

Odd... when I invoke it from a script, the range seems to be [0, n).

I compiled it to run on CPU rather than GPU. That shouldn't make a difference, should it?

Looking at the code itself, though (https://github.com/torch/torch7/blob/master/lib/TH/generic/THTensorMath.c, lines 1936~1945), it makes sense that I'm getting the range [0, n) and not (0, n]:

    1936    for(i = 0; i < n; i++)
    1937        rdata[i*rstride_0] = (real)(i);
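
For what it's worth, here is a standalone sketch of the fill-then-shuffle pattern those lines belong to, with plain C and rand() standing in for Torch's generator (an illustration of the pattern, not the library code). Since the shuffle only swaps positions, whatever the fill loop writes is exactly the set of values that comes back:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        const long n = 10;
        double rdata[10];
        const long rstride_0 = 1;   /* contiguous 1-D tensor */

        /* fill: without the +1 this writes 0 .. n-1 */
        for (long i = 0; i < n; i++)
            rdata[i * rstride_0] = (double)(i);

        /* Fisher-Yates style shuffle: swaps positions, never changes the set of values */
        srand((unsigned)time(NULL));
        for (long i = 0; i < n - 1; i++) {
            long z = rand() % (n - i);
            double sav = rdata[i * rstride_0];
            rdata[i * rstride_0] = rdata[(z + i) * rstride_0];
            rdata[(z + i) * rstride_0] = sav;
        }

        for (long i = 0; i < n; i++)
            printf("%g ", rdata[i]);   /* a permutation of 0 .. n-1, never of 1 .. n */
        printf("\n");
        return 0;
    }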

soumith commented 7 years ago

Ok, with Python this is likely a bug where TH_INDEX_BASE is not added. I'll fix it, thanks.
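
As I understand it, TH_INDEX_BASE is a compile-time constant that selects the user-visible index base (1-based for Lua, 0-based for Python); here is a hedged sketch of the idea, where the macro value and its placement in the fill loop are my assumption rather than the actual fix:

    #include <stdio.h>

    /* assumption: 1 when built for LuaTorch, 0 when built for the Python bindings */
    #ifndef TH_INDEX_BASE
    #define TH_INDEX_BASE 1
    #endif

    int main(void) {
        const int n = 5;
        double rdata[5];

        /* fill with the index base added, so Lua sees 1 .. n and Python sees 0 .. n-1 */
        for (int i = 0; i < n; i++)
            rdata[i] = (double)(i + TH_INDEX_BASE);

        for (int i = 0; i < n; i++)
            printf("%g ", rdata[i]);
        printf("\n");
        return 0;
    }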

soumith commented 7 years ago

wait, if you use OpenFace, you are using LuaTorch, though you might be calling it from python. Can you give me a sample invocation with the output? I'm finding it hard to see what's happening on your side.

allskyee commented 7 years ago

Oh my bad, it was the Lua script.

To be precise, line 257 of https://github.com/cmusatyalab/openface/blob/master/batch-represent/dataset.lua:

     local perm = torch.randperm(count)

Printing the local variable perm gives me indices in [0, count). Since Lua/Torch indexing is 1-based, an index of 0 then causes an invalid array index error on line 269:

           self.classListTest[i][idx] = list[perm[j]]
allskyee commented 7 years ago

I found that this only happens when randperm is preceded by torch.setdefaulttensortype('torch.FloatTensor')

So the following is my output:

th> torch.setdefaulttensortype('torch.FloatTensor')
                                                                  [0.0001s]
th> print (torch.randperm(5))
 3
 0
 1
 4
 2
[torch.FloatTensor of size 5]
                                                                  [0.0003s]
pavanky commented 7 years ago

Also reported here: https://github.com/torch/torch7/issues/966

allskyee commented 7 years ago

I now see, looking at pkg/torch/build/TensorMath.c and the generated torch_FloatTensor_randperm wrapper, that you add 1 at the end. So not adding 1 in randperm's C function is indeed correct.

Thanks pavanky for the link. Yeah it seems to be the same issue.
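
In other words, the offset lives in the generated Lua wrapper rather than in the C kernel. A rough sketch of that split under this reading (function names hypothetical, not the actual generated code):

    #include <stdio.h>

    /* C kernel side: always 0-based, as in THTensorMath.c (a sketch, not the real signature) */
    static void kernel_randperm_fill(double *rdata, long n) {
        for (long i = 0; i < n; i++)
            rdata[i] = (double)(i);
    }

    /* generated Lua-facing wrapper: shifts to the 1-based convention before returning */
    static void lua_wrapper_randperm(double *rdata, long n) {
        kernel_randperm_fill(rdata, n);
        for (long i = 0; i < n; i++)
            rdata[i] += 1.0;            /* the "add 1 at the end" seen in TensorMath.c */
    }

    int main(void) {
        double out[5];
        lua_wrapper_randperm(out, 5);
        for (int i = 0; i < 5; i++)
            printf("%g ", out[i]);      /* 1 2 3 4 5 (shuffle omitted for brevity) */
        printf("\n");
        return 0;
    }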

adamlerer commented 7 years ago

Probably fixed by https://github.com/torch/torch7/pull/963.