microsoft / otdd

Optimal Transport Dataset Distance
MIT License

RuntimeError: symeig_cpu: the algorithm failed to converge; 643 off-diagonal elements of an intermediate tridiagonal form did not converge to zero. #8

Closed huaxinru closed 3 years ago

huaxinru commented 3 years ago

Hi, I tried to run your gradient flow code. Here is my code:

import os
import matplotlib
%matplotlib inline
# Comment out the line above if not running in a notebook
import torch
from torchvision.models import resnet18

from otdd.pytorch.datasets import load_torchvision_data
from otdd.pytorch.distance import DatasetDistance, FeatureCost
from otdd.pytorch.flows import OTDD_Gradient_Flow
from otdd.pytorch.flows import CallbackList, ImageGridCallback, TrajectoryDump

# Load datasets
loaders_src = load_torchvision_data('MNIST', valid_size=0, resize = 28, maxsize=1000)[0]
loaders_tgt = load_torchvision_data('USPS',  valid_size=0, resize = 28, maxsize=1000)[0]

outdir =  os.path.join('out', 'flows')
callbacks = CallbackList([
  ImageGridCallback(display_freq=2, animate=False, save_path = outdir + '/grid'),
])

flow = OTDD_Gradient_Flow(loaders_src['train'], loaders_tgt['train'],
                          ### Gradient Flow Args
                          method = 'xonly-attached',                          
                          use_torchoptim=True,
                          optim='adam',
                          steps=10,
                          step_size=1,
                          callback=callbacks,              
                          clustering_method='kmeans',                                      
                          ### OTDD Args                          
                          online_stats=True,
                          diagonal_cov = False,
                          device='cuda'
                          )
d,out = flow.flow()

Then I received this error:

RuntimeError: symeig_cpu: the algorithm failed to converge; 643 off-diagonal elements of an intermediate tridiagonal form did not converge to zero.

Do you know what is wrong? Thank you so much!

peterdarkdarkgogo commented 2 years ago

Could you please help with this? I ran into the same error.

chenmzh commented 2 years ago

I ran into the same problem. Did you figure out why? Thank you very much!

ChenChengKuan commented 2 years ago

For anyone hitting this issue, I found a workaround: change symsqrt_v2(func='symeig') to symsqrt_v2(func='svd') in pytorch/sqrtm.py. Judging by the comment left there, the author seems to favor this approach.
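To illustrate what switching from an eigendecomposition to an SVD means for a matrix square root, here is a minimal sketch. It is not the code in sqrtm.py: the function names sqrtm_eigh and sqrtm_svd are hypothetical, and it uses torch.linalg.eigh (the current replacement for the deprecated torch.symeig) and torch.linalg.svd. It assumes the input is symmetric positive semi-definite, in which case both routes should agree.

```python
import torch

def sqrtm_eigh(A):
    """Square root of a symmetric PSD matrix via eigendecomposition (hypothetical helper)."""
    w, V = torch.linalg.eigh(A)          # A = V diag(w) V^T
    w = w.clamp(min=0)                   # clip tiny negative eigenvalues caused by round-off
    return (V * w.sqrt()) @ V.T

def sqrtm_svd(A):
    """Square root of a symmetric PSD matrix via SVD (hypothetical helper)."""
    U, s, Vh = torch.linalg.svd(A)       # A = U diag(s) Vh; singular values are always >= 0
    return (U * s.sqrt()) @ Vh

# Small well-conditioned example: A = X^T X is symmetric PSD by construction.
torch.manual_seed(0)
X = torch.randn(50, 4, dtype=torch.float64)
A = X.T @ X

for name, fn in [("eigh", sqrtm_eigh), ("svd", sqrtm_svd)]:
    R = fn(A)
    print(name, "max reconstruction error:", (R @ R - A).abs().max().item())
```

On a well-conditioned matrix like this one the two results match closely; the difference between the routes only shows up on harder inputs, such as the one that triggered the convergence error above.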

dmelis commented 2 years ago

Hi all, apologies for the delay - busy times. These errors typically happen when the data is ill-conditioned. I favored the .symeig method because of speed, but indeed .svd tends to be more stable, because it works regardless of whether the matrix is PSD or not. So I would suggest what @ChenChengKuan proposes. If even this doesn't fix it for any of you, please let me know.
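One way to check whether ill-conditioning is the likely cause is to look at the condition number of the feature covariance before running the flow. The snippet below is a rough diagnostic sketch, not part of otdd: covariance_condition_number is a hypothetical helper, and the synthetic data only stands in for real features. It assumes samples are rows of an (n, d) tensor.

```python
import torch

def covariance_condition_number(X):
    """2-norm condition number of the sample covariance of X, shape (n, d). Hypothetical helper."""
    Xc = X - X.mean(dim=0, keepdim=True)      # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # (d, d) sample covariance
    return torch.linalg.cond(cov)             # ratio of largest to smallest singular value

torch.manual_seed(0)
well = torch.randn(500, 5, dtype=torch.float64)

# Duplicating a feature makes the covariance singular (up to round-off),
# so its condition number becomes extremely large.
degenerate = torch.cat([well, well[:, :1]], dim=1)

print("well-conditioned:", covariance_condition_number(well).item())
print("degenerate:      ", covariance_condition_number(degenerate).item())
```

A very large value here suggests the covariance is close to singular, which is the situation where the eigendecomposition-based route is more likely to fail.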