sandialabs / pyttb

Python Tensor Toolbox
https://pyttb.readthedocs.io
BSD 2-Clause "Simplified" License
25 stars 13 forks source link

`sptensor.logical_and` does not have expected behavior #256

Closed dmdunla closed 1 year ago

dmdunla commented 1 year ago

sptensor.logical_and behavior is different than in MATLAB, and I am not sure that what the MATLAB version is doing is useful/correct.

Questions: Do we want to match the MATLAB behavior? Do we want to follow numpy/scipy.sparse? If so, what are those behaviors.

Here is the current sptensor.logical_and:

Create a sparse tensor:

>>> S = ttb.sptensor()
>>> S[0,0] = 1; S[1,1] = 2
>>> S
sparse tensor of shape (2, 2) with 2 nonzeros
[0, 0] = 1.0
[1, 1] = 2.0

Compute logical AND with dense tensor that has the same non-zero
pattern but different values:

>>> T = S.to_tensor()
>>> T[0,0] = T[0,0] + 1
>>> S.logical_and(T)
sparse tensor of shape (2, 2) with 2 nonzeros
[0, 0] = 1.0
[1, 1] = 1.0

Compute logical AND with scalar value:

>>> S.logical_and(1.0)
sparse tensor of shape (2, 2) with 2 nonzeros
[0, 0] = True
[1, 1] = False

The output when computing with a tensor is an indicator sparse tensor, whereas it is a boolean sparse tensor for the scalar case. Also, the scalar case is computing the AND as an equality comparison it appears, not just a check for non-zero.

Here is the output from MATLAB:

>> S = sptensor()
S is an all-zero sparse tensor of size [empty tensor]
>> S(1,1) = 1
S is a sparse tensor of size 1 x 1 with 1 nonzeros
    (1,1)     1
>> S(2,2) = 2
S is a sparse tensor of size 2 x 2 with 2 nonzeros
    (1,1)     1
    (2,2)     2
>> T = tensor(S)
T is a tensor of size 2 x 2
    T(:,:) = 
         1     0
         0     2
>> T(1,1) = 2
T is a tensor of size 2 x 2
    T(:,:) = 
         2     0
         0     2
>> and(S,tensor(S))
ans is a sparse tensor of size 2 x 2 with 2 nonzeros
    (1,1)   1
    (2,2)   1
>> and(S,1.0)
ans is a sparse tensor of size 2 x 2 with 2 nonzeros
    (1,1)     1
    (2,2)     1

The scalar AND is odd in MATLAB, as it checks for either zero or non-zero depending on the scalar input:

>> and(S,0)
ans is an all-zero sparse tensor of size 2 x 2
>> and(S,1)
ans is a sparse tensor of size 2 x 2 with 2 nonzeros
    (1,1)     1
    (2,2)     1
>> and(S,2)
ans is a sparse tensor of size 2 x 2 with 2 nonzeros
    (1,1)     1
    (2,2)     1

The documentation in pyttb should document this behavior (once it matches with MATLAB).

Here are some experiments with numpy and scipy, where I am unable to compute __and__ at all, so I am unsure what the default behavior for those should be:

>>> import numpy as np
>>> A = np.zeros((2,2))
>>> A[0,0] = 1
>>> A & 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
>>> 0 & A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

>>> import scipy.sparse as sp 
>>> B = sp.coo_matrix(A)
>>> B
<2x2 sparse matrix of type '<class 'numpy.float64'>'
        with 1 stored elements in COOrdinate format>
>>> B & 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'coo_matrix' and 'int'
>>> B & A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'coo_matrix' and 'float'
>>> B.toarray()
array([[1., 0.],
       [0., 0.]])
>>> B.toarray() & A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
ntjohnson1 commented 1 year ago

We named our arguments to mirror numpy's naming convention https://numpy.org/doc/stable/reference/generated/numpy.logical_and.html I don't know offhand if we match their behavior

dmdunla commented 1 year ago

OK, now I see the numpy examples. The & operator is not the same, as it is bitwise AND, and it should be used with boolean (and maybe int) arrays. Here's is how the numpy.logical_and works (note that it is a function of two (array-like) inputs, not a class method (e.g., for sptensor).

>>> A = np.zeros((2,2))
>>> A[0,0] = 1
>>> np.logical_and(A,A)
array([[ True, False],
       [False, False]])
>>> np.logical_and(A,1)
array([[ True, False],
       [False, False]])
>>> np.logical_and(A,2)
array([[ True, False],
       [False, False]])

I do not see anything equivalent in scipy but other math libraries (e.g., tensorflow, cupy) do have logical_and that behave in this way.

Unfortunately, we do not support boolean tensors in pyttb, only float values at this point.

ntjohnson1 commented 1 year ago

Ok. So for consistency we want the result to be a sptensor of the same dtype but with explicit values only equal to 1.0 (for logical_or same dtype argument but might end up a dense tensor). Or did you have something else in mind?

dmdunla commented 1 year ago

I was trying to say that we do not currently support dytpe throughout our classes. Some of the support comes from the fact that we use numpy for a lot of the data members in our classes. So, we can change some of the data to boolean values after construction (as shown in the example above).

So, my suggestion is that we follow MATLAB TTB for now and make the outputs of the logical_* methods be indicator tensors (with 1.0 for True and 0 or nothing [i.e., for sptensor] for False).

ntjohnson1 commented 1 year ago

Great. We're saying the same thing in different ways so looks like we're on the same page.