musco-ai / musco-pytorch

MUSCO: MUlti-Stage COmpression of neural networks
BSD 3-Clause "New" or "Revised" License

Running demo code results in "LinAlgError: SVD did not converge" or "ValueError: array must not contain infs or NaNs" #14

Open styler00dollar opened 3 years ago

styler00dollar commented 3 years ago

As I already mentioned in issue #13, the demo code crashes with an error.

from torchvision.models import resnet50
from flopco import FlopCo
from musco.pytorch import CompressorVBMF, CompressorPR, CompressorManual

model = resnet50(pretrained=True)
model.cuda()
model_stats = FlopCo(model, device='cuda')

compressor = CompressorVBMF(model,
                            model_stats,
                            ft_every=5,
                            nglobal_compress_iters=2)
while not compressor.done:
    compressor.compression_step()
compressed_model = compressor.compressed_model

Running it fails with one of these two errors:

~/anaconda3/lib/python3.8/site-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
    104 
    105 def _raise_linalgerror_svd_nonconvergence(err, flag):
--> 106     raise LinAlgError("SVD did not converge")
    107 
    108 def _raise_linalgerror_lstsq(err, flag):

LinAlgError: SVD did not converge

or

~/anaconda3/lib/python3.8/site-packages/numpy/lib/function_base.py in asarray_chkfinite(a, dtype, order)
    495     a = asarray(a, dtype=dtype, order=order)
    496     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
--> 497         raise ValueError(
    498             "array must not contain infs or NaNs")
    499     return a

ValueError: array must not contain infs or NaNs

When the code is run several times, which of the two errors appears seems to be random.
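One plausible reading, not confirmed in this thread, is that both messages share a root cause: NaN or Inf values reaching an SVD. NumPy's `svd` can report "SVD did not converge" on such input, while code that goes through `asarray_chkfinite` (the function in the second traceback, used for example by SciPy's linear-algebra routines by default) rejects it with "array must not contain infs or NaNs". The sketch below checks weights before decomposition so the failure names the offending layer. `nonfinite_layers` and `checked` are hypothetical helpers, not part of MUSCO.

```python
import numpy as np


def nonfinite_layers(weights):
    """Return the names of arrays that contain NaN or Inf values.

    `weights` maps a layer name to its weight array. Hypothetical helper;
    with PyTorch the mapping could be built as
    {n: p.detach().cpu().numpy() for n, p in model.named_parameters()}.
    """
    return [name for name, w in weights.items() if not np.isfinite(w).all()]


def checked(name, w):
    """Raise a descriptive error before handing `w` to an SVD routine."""
    w = np.asarray(w)
    finite = np.isfinite(w)
    if not finite.all():
        bad = int(finite.size - np.count_nonzero(finite))
        raise ValueError(f"{name}: {bad} non-finite entries before decomposition")
    return w


if __name__ == "__main__":
    layers = {
        "conv1.weight": np.ones((4, 3, 3, 3)),
        "layer1.0.conv2.weight": np.array([[1.0, np.nan], [np.inf, 2.0]]),
    }
    print(nonfinite_layers(layers))  # ['layer1.0.conv2.weight']
```

If a layer shows up here, the problem lies upstream of the decomposition (for example in a fine-tuning step), rather than in the SVD itself.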

engharat commented 2 years ago

I managed to fix it by replacing the scikit-tensor-py3 calls with TensorLy calls. The example works fine now, and I also avoided an ugly numpy & scipy downgrade, which scikit-tensor-py3 required. For anyone interested, here is what I changed in musco/pytorch/compressor/decompositions/tucker2.py:

1. Remove every import of scikit-tensor-py3 functions.
2. Add:
       import tensorly
       tensorly.set_backend("pytorch")
3. In get_tucker_factors, the weight line becomes:
       weights = tensorly.tensor(self.weight.cpu())
4. Change the Tucker call so that it uses tensorly.decomposition.tucker:
       core, (U_cout, U_cin, U_dd) = tensorly.decomposition.tucker(weights, [self.ranks[0], self.ranks[1], weights.shape[-1]], init='nvecs')
5. A few lines further down in the same function, change
       core = core.dot(U_dd.T)
   into
       core = core.matmul(U_dd.T)
   to use PyTorch matrix multiplication (PyTorch's .dot works only on 1-D vectors).
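To see what steps 4 and 5 compute, here is a NumPy sketch. It assumes, going by `weights.shape[-1]` and the factor names, that the weight is a 3-way tensor `(cout, cin, dd)`. The Tucker call keeps the last mode at full rank, and `core @ U_dd.T` folds that factor back into the core, leaving a Tucker-2 model with only `U_cout` and `U_cin`. Truncated HOSVD stands in for TensorLy's algorithm, which also runs alternating (HOOI) iterations, so its factors may differ. `unfold`, `mode_dot`, `tucker2_hosvd` and `reconstruct` are hypothetical names.

```python
import numpy as np


def unfold(t, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)


def mode_dot(t, m, mode):
    """Multiply tensor `t` by matrix `m` (new_dim x t.shape[mode]) along `mode`."""
    out = np.tensordot(m, t, axes=(1, mode))  # new axis comes out first
    return np.moveaxis(out, 0, mode)


def tucker2_hosvd(w, r_out, r_in):
    """Truncated-HOSVD Tucker decomposition of a (cout, cin, dd) tensor.

    Uses ranks [r_out, r_in, w.shape[-1]] as in the comment above, so the
    last mode stays full rank; its factor is then folded into the core.
    """
    ranks = [r_out, r_in, w.shape[-1]]
    # Leading left singular vectors of each unfolding give the factors.
    factors = [np.linalg.svd(unfold(w, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = w
    for m, u in enumerate(factors):
        core = mode_dot(core, u.T, m)  # project onto each factor's columns
    u_cout, u_cin, u_dd = factors
    # Fold the full-rank last factor back in. For a 3-D NumPy array, `@` and
    # `.dot` agree here; torch.Tensor.dot would fail because it is 1-D only.
    core = core @ u_dd.T
    return core, u_cout, u_cin


def reconstruct(core, u_cout, u_cin):
    """Rebuild the (cout, cin, dd) tensor from the Tucker-2 factors."""
    return mode_dot(mode_dot(core, u_cout, 0), u_cin, 1)


if __name__ == "__main__":
    w = np.random.default_rng(0).standard_normal((8, 6, 9))
    core, u_cout, u_cin = tucker2_hosvd(w, 4, 3)
    print(core.shape, u_cout.shape, u_cin.shape)  # (4, 3, 9) (8, 4) (6, 3)
```

With full ranks on the first two modes, `reconstruct` returns the original tensor up to floating-point error; lower ranks give the compressed approximation.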