tensorly / tensorly

TensorLy: Tensor Learning in Python.
http://tensorly.org

Normalization in CP algorithms #264

Open rusillini opened 3 years ago

rusillini commented 3 years ago

Describe the bug

In tensorly 0.5.1 installed from the Anaconda channel, non-negative PARAFAC with normalization returns NaNs when run on GPU using PyTorch 1.8.1 as the backend. Non-negative PARAFAC without the normalization flag works fine on GPU.

Steps or Code to Reproduce

import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

tl.set_backend('pytorch')

p1_g = np.array([1,0,2,0], dtype=float)
p1_c = np.array([1,1,0,0], dtype=float)
p2_g = np.array([0,1,2,0], dtype=float)
p2_c = np.array([0,1,1,0], dtype=float)
p3_g = np.array([1,0,0,2], dtype=float)
p3_c = np.array([0,0,1,1], dtype=float)

t = np.outer(p1_c,p1_g)
t = np.append(t,np.outer(p2_c,p2_g),axis=0)
t = np.append(t,np.outer(p3_c,p3_g),axis=0)
t = t.reshape(3,4,4)

t = tl.tensor(t, device='cuda:0')
factors = non_negative_parafac(t, rank=3, normalize_factors=True)
print(factors[1])
factors = non_negative_parafac(t, rank=3)
print(factors[1]) 

Expected behavior

PARAFAC should give meaningful results with and without the normalization flag when run on GPU with PyTorch as the backend.

Actual result

With normalize_factors=True:

[tensor([[nan, nan, nan], [nan, nan, nan], [nan, nan, nan]], device='cuda:0'), tensor([[nan, nan, nan], [nan, nan, nan], [nan, nan, nan], [nan, nan, nan]], device='cuda:0'), tensor([[nan, nan, nan], [nan, nan, nan], [nan, nan, nan], [nan, nan, nan]], device='cuda:0')]

Without normalize_factors:

[tensor([[1.9754e-02, 0.0000e+00, 2.0000e+00], [2.9770e+00, 0.0000e+00, 4.4415e-04], [0.0000e+00, 2.1693e+00, 0.0000e+00]], device='cuda:0'), tensor([[3.7141e-05, 2.0000e-12, 1.1050e+00], [7.6445e-01, 2.0000e-12, 1.0991e+00], [7.6456e-01, 1.0000e+00, 0.0000e+00], [4.0791e-13, 1.0000e+00, 2.4172e-12]], device='cuda:0'), tensor([[1.4972e-05, 4.6098e-01, 4.5371e-01], [4.3936e-01, 1.0625e-12, 0.0000e+00], [8.7864e-01, 1.0625e-12, 9.0441e-01], [2.2771e-14, 9.2195e-01, 1.0292e-12]], device='cuda:0')]

Versions

Linux-5.4.0-72-generic-x86_64-with-debian-bullseye-sid
Python 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 21:15:04)
NumPy 1.19.2
SciPy 1.3.1
TensorLy 0.5.1
CUDA 11.0
PyTorch 1.8.1

merajhashemi commented 3 years ago

In NumPy, float means double precision (float64). When using PyTorch, a floating-point overflow happens somewhere in the calculations. To work around this, just replace t = tl.tensor(t, device='cuda:0') with t = tl.tensor(t, dtype=torch.float64, device='cuda:0') (this requires import torch).
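Both points can be checked with NumPy alone; a minimal sketch (the 1e20 magnitude is just illustrative):

```python
import numpy as np

# NumPy's builtin `float` dtype is double precision
assert np.array([1.0, 2.0], dtype=float).dtype == np.float64

big = 1e20
x64 = np.float64(big)
x32 = np.float32(big)

squared64 = x64 * x64          # ~1e40, well within float64 range (~1.8e308)
with np.errstate(over='ignore'):
    squared32 = x32 * x32      # exceeds float32 max (~3.4e38) -> inf

assert np.isfinite(squared64)
assert np.isinf(squared32)
```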

Still, this is a bug: tensorly should infer the dtype correctly from the NumPy dtype. @JeanKossaifi this line should be changed to return torch.from_numpy(data.copy()).

The overflow happens after a few iterations when computing the factor norms. @MarieRoald @JeanKossaifi Why can't we move this line outside the main for loop, just before returning the results?
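For illustration, that restructuring (normalize once, after the loop, instead of at every iteration) can be sketched in plain NumPy; normalize_cp here is a hypothetical helper, not tensorly API:

```python
import numpy as np

def normalize_cp(factors):
    # Hypothetical post-loop normalization: pull each factor's column
    # norms out into a single weights vector, once, before returning.
    weights = np.ones(factors[0].shape[1])
    unit_factors = []
    for f in factors:
        norms = np.linalg.norm(f, axis=0)
        weights *= norms
        safe = np.where(norms == 0, 1.0, norms)  # guard zero columns
        unit_factors.append(f / safe)
    return weights, unit_factors

# The CP model is unchanged: weights[r] * prod_k unit_factors[k][:, r]
# reproduces the original rank-one terms.
rng = np.random.default_rng(0)
A, B, C = rng.random((3, 2)), rng.random((4, 2)), rng.random((5, 2))
w, (An, Bn, Cn) = normalize_cp([A, B, C])
original = np.einsum('ir,jr,kr->ijk', A, B, C)
rebuilt = np.einsum('r,ir,jr,kr->ijk', w, An, Bn, Cn)
```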

cohenjer commented 3 years ago

I don't think the problem comes from PyTorch or NumPy; it is related to the normalization @merajhashemi mentioned. The non_negative_parafac function currently has a bug, something we already discussed a little in #222, but only for the HALS version. The normalization is not done correctly: we normalize at each iteration, but the weights are never accounted for in the updates, so the norms keep accumulating from one normalization to the next until divergence.

I tried to reproduce your error @rusillini using the NumPy backend, and it still blows up:

import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac, non_negative_parafac_hals
import matplotlib.pyplot as plt

p1_g = np.array([1,0,2,0], dtype=float)
p1_c = np.array([1,1,0,0], dtype=float)
p2_g = np.array([0,1,2,0], dtype=float)
p2_c = np.array([0,1,1,0], dtype=float)
p3_g = np.array([1,0,0,2], dtype=float)
p3_c = np.array([0,0,1,1], dtype=float)

t = np.outer(p1_c,p1_g)
t = np.append(t,np.outer(p2_c,p2_g),axis=0)
t = np.append(t,np.outer(p3_c,p3_g),axis=0)
t = t.reshape(3,4,4)

t = tl.tensor(t)
factors, err = non_negative_parafac(t, rank=3, normalize_factors=True, return_errors=True)

factors2, err2 = non_negative_parafac(t, rank=3, return_errors=True)

out, err3 = non_negative_parafac_hals(t, rank=3, return_errors=True)
out.normalize()

plt.semilogy(err)
plt.semilogy(err2)
plt.semilogy(err3)
plt.legend(['with normalization','without normalization','HALS'])
plt.show()

At least on my computer, the first error explodes.

Note that in #222 we suggested that users who want normalized factors can normalize the output tensor after computing the non-negative PARAFAC, and therefore we removed the normalization option from the HALS version. We could go the same way here. Or we could properly modify the code to have mathematically correct MTTKRPs and pseudo-inverses that account for the weights.

JeanKossaifi commented 3 years ago

I agree @cohenjer - I think we may as well update the MTTKRP and pseudo-inverse to make it correct.

cohenjer commented 3 years ago

OK, we can add it to our to-do list with @caglayantuna, but we need to make sure it does not slow down the whole ALS by introducing extra computations.

JeanKossaifi commented 3 years ago

Actually, I don't think it's incorrect @cohenjer: the weights simply get absorbed into the last factor, which can be written as diag(weights) * factor. We do need an investigation of the normalization schemes and their effect on stability, though.
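That absorption is easy to verify numerically; a plain-NumPy sketch (illustrative shapes and rank) showing that a weighted CP reconstruction equals the unweighted one with diag(weights) folded into the last factor:

```python
import numpy as np

rng = np.random.default_rng(42)
rank = 3
A = rng.random((4, rank))
B = rng.random((5, rank))
C = rng.random((6, rank))
weights = rng.random(rank)

# weighted CP reconstruction: sum_r weights[r] * a_r outer b_r outer c_r
weighted = np.einsum('r,ir,jr,kr->ijk', weights, A, B, C)

# same model with the weights absorbed into the last factor,
# i.e. C replaced by C @ diag(weights) (columnwise scaling)
absorbed = np.einsum('ir,jr,kr->ijk', A, B, C * weights)
```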

cohenjer commented 3 years ago

I guess I am not even sure what the option should really mean. Does normalize_factors=True mean the user wants normalized outputs? In that case, normalizing at the last iteration would be enough. Or does it mean the factors are normalized throughout the iterations? In that case your solution is not exactly doing that.

Right now we are working on the version where normalize_factors=True means that the factors are normalized after each update, and the weights are accounted for in the update rules.

By the way, the current code for non_negative_parafac is somewhat incorrect with respect to normalization, and the parafac code is also a little odd, since it applies normalization after each update but then immediately discards the weights.

JeanKossaifi commented 3 years ago

I agree @cohenjer, currently normalize is kind of a dual thing, as you mentioned: it covers both normalization during optimization and post-decomposition normalization. We need to simplify and unify the API and code. In Kolda's seminal paper(s) (and in the tensor-toolbox), I believe they do something similar to what we already have.

To me, the main questions are: i) if we want normalized factors in the end, is it helpful in any way to normalize during optimization? ii) regardless, should we normalize during the alternating optimization for numerical stability, as advocated by Kolda et al.?

It would be helpful to benchmark both approaches and see how they affect convergence, performance and numerical stability (e.g. whether the factors end up in a better range).

It's probably worth documenting our process and findings too as others may find it interesting! :)

cohenjer commented 2 years ago

I am renaming the issue to match the current content.