josipd / torch-two-sample

A PyTorch library for two-sample tests

Computing p-value in non-differentiable statistics #3

Closed dkoutsou closed 6 years ago

dkoutsou commented 6 years ago

Hi,

I am trying to use the non-differentiable statistics to compute the p-values. My code is as follows:

sample_1 = Variable(torch.FloatTensor(sample_1))
sample_2 = Variable(torch.FloatTensor(sample_2))
statistic = FRStatistic(sample_1, sample_2)
_, mst = statistic.__call__(sample_1, sample_2, ret_matrix=True)
pvalue = statistic.pval(mst)

sample_1 and sample_2 are PyTorch Variables of size [100x22]. However, I get the following two errors (they are exactly the same regardless of which statistic I use):

Traceback (most recent call last):
  File "code/compare.py", line 147, in <module>
    compare.KnnStatistic(original_data, produced_data, args.cardinality, 100)
  File "code/compare.py", line 102, in KnnStatistic
    _, mst = statistic.__call__(sample_1, sample_2, ret_matrix=True)
  File "/Users/dimkou/Documents/deep_learning/deep/lib/python3.6/site-packages/torch_two_sample-0.1-py3.6-macosx-10.13-x86_64.egg/torch_two_sample/statistics_nondiff.py", line 186, in __call__
    assert n_1 == self.n_1 and n_2 == self.n_2
  File "/Users/dimkou/Documents/deep_learning/deep/lib/python3.6/site-packages/torch/autograd/variable.py", line 125, in __bool__
    torch.typename(self.data) + " is ambiguous")
RuntimeError: bool value of Variable objects containing non-empty torch.ByteTensor is ambiguous

Commenting out that line in the source code and recompiling the package gets rid of that error.

The second one is:

Traceback (most recent call last):
  File "code/compare.py", line 133, in <module>
    compare.FRStatistic(original_data, produced_data, args.cardinality, 100)
  File "code/compare.py", line 85, in FRStatistic
    pvalue = statistic.pval(mst)
  File "/Users/dimkou/Documents/deep_learning/deep/lib/python3.6/site-packages/torch_two_sample-0.1-py3.6-macosx-10.13-x86_64.egg/torch_two_sample/statistics_nondiff.py", line 141, in pval
    self.n_1, self.n_2, n_permutations)
  File "torch_two_sample/permutation_test.pyx", line 57, in torch_two_sample.permutation_test.permutation_test_mat
  File "/Users/dimkou/Documents/deep_learning/deep/lib/python3.6/site-packages/torch/autograd/variable.py", line 130, in __int__
    return int(self.data)
  File "/Users/dimkou/Documents/deep_learning/deep/lib/python3.6/site-packages/torch/tensor.py", line 389, in __int__
    raise TypeError("only 1-element tensors can be converted "
TypeError: only 1-element tensors can be converted to Python scalars

mst is returned as a FloatTensor of size [200x200]. Am I doing something wrong in how I use the package? How can I fix the second error?

Edit: OK, my fix for the first error is what led to the second one. Setting:

self.n_1 = sample_1.size(0)
self.n_2 = sample_2.size(0)

in both statistics seems to fix the errors without messing up the functionality of the code. Shall I submit a PR?
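For reference, here is a minimal sketch (shapes chosen to match my case, random stand-in data) of why storing the sample Variable instead of its size breaks both the assert and the int() conversion reached from pval:

```python
import torch
from torch.autograd import Variable  # old-style API, as in the tracebacks above

n_1 = 100
# What self.n_1 ends up holding when the constructor is given the sample itself
stored = Variable(torch.rand(100, 22))

# 1) The comparison in the assert is elementwise, so it produces a [100x22]
#    ByteTensor whose truth value is ambiguous -> the first RuntimeError.
try:
    assert n_1 == stored
except RuntimeError as exc:
    print("assert fails:", exc)

# 2) pval() hands self.n_1 to the Cython permutation test, which converts it
#    with int(); a [100x22] Variable cannot become a single Python scalar
#    -> the second error. (The exact exception type varies across versions.)
try:
    int(stored)
except Exception as exc:
    print("int() fails:", exc)
```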

calincru commented 6 years ago

But the code is OK as it is as long as you pass the sizes of the two samples on construction, instead of the samples themselves, right? And I think this is well documented in the constructors.
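In other words, something along these lines (a minimal sketch with random stand-in data; the samples themselves only go to __call__):

```python
import torch
from torch.autograd import Variable
from torch_two_sample.statistics_nondiff import FRStatistic

sample_1 = Variable(torch.rand(100, 22))
sample_2 = Variable(torch.rand(100, 22))

# The constructor takes the two sample sizes, not the samples themselves
statistic = FRStatistic(sample_1.size(0), sample_2.size(0))

# The samples are passed to __call__; ret_matrix=True also returns the
# matrix needed by pval()
t_val, mst = statistic(sample_1, sample_2, ret_matrix=True)
p_value = statistic.pval(mst)
print(p_value)
```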

dkoutsou commented 6 years ago

My bad, I didn't pay attention to that and was misled by the error message.