xinychen / transdim

Machine learning for transportation data imputation and prediction.
https://transdim.github.io
MIT License
1.22k stars · 303 forks

LinAlgError: SVD did not converge using LRTC-TNN #23

Open lk1983823 opened 1 year ago

lk1983823 commented 1 year ago

I have non-random missing values in about 50% of the original values across 5 features. I tried to use LRTC-TNN to restore the missing values, but it raises `LinAlgError: SVD did not converge`. What can I do? Or is there another method I can use to impute my data? Thanks. The original data is shown below (please ignore the last figure, the bottom-right one, which shows nothing):

[figure: the five original time series with missing values]
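For reference, `LinAlgError: SVD did not converge` is often triggered by NaN or inf entries reaching `np.linalg.svd`. A minimal pre-check, assuming the missing entries are stored as NaN in a NumPy array (the shapes below are made up for illustration):

```python
import numpy as np

def prepare_for_lrtc(sparse_tensor):
    """Replace NaN/inf entries with zeros before low-rank completion.
    transdim's completion routines mark missing values by zeros, and
    NaNs propagated into np.linalg.svd are a common cause of
    'SVD did not converge'."""
    return np.nan_to_num(sparse_tensor, nan=0.0, posinf=0.0, neginf=0.0)

# toy example: 5 features, 4 samples, 60 time steps with ~50% missing
rng = np.random.default_rng(0)
tensor = rng.standard_normal((5, 4, 60))
tensor[rng.random(tensor.shape) < 0.5] = np.nan

clean = prepare_for_lrtc(tensor)
```

This does not change the model itself; it only guarantees the array handed to the solver is finite.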

xinychen commented 1 year ago

If I understand correctly, you have five time series. But the time series data do not involve a day dimension, so they do not form a tensor. Could you try one of the most basic matrix factorization models instead? For example, BTMF, which is available in this repository.

lk1983823 commented 1 year ago

My data do involve a day dimension; it is sampled at 1-second intervals. The data range from 2023-02-08 05:02:02 to 2023-02-08 07:00:00. In addition, in order to use LRTC-TNN, I reshape the data to (num_feature, num_sample, time_interval), with time_interval set to 60.
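As a sanity check on the reshaping described above, here is a minimal sketch (with made-up shapes and values; `num_sample` follows from the data length) of turning a (total_steps, n_feature) array into a (num_feature, num_sample, time_interval) tensor:

```python
import numpy as np

n_feature, time_interval = 5, 60
total_steps = 6 * time_interval  # must be divisible by time_interval

# placeholder data: a (total_steps, n_feature) table of readings
mat = np.arange(total_steps * n_feature, dtype=float).reshape(total_steps, n_feature)

# slice into 60-step windows: (num_sample, time_interval, n_feature)
tensor = mat.reshape(-1, time_interval, n_feature)
# move the feature axis to the front: (n_feature, num_sample, time_interval)
tensor = np.moveaxis(tensor, -1, 0)
```

Here `tensor[f, s, t]` corresponds to `mat[s * time_interval + t, f]`, i.e., each frontal slice is one feature's series folded into 60-step segments.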

> If I understand correctly, you have five time series. But the time series data do not involve the day dimension and thus there is not a tensor. So can you try some most basic matrix factorization models? Just like BTMF available at this repository.

xinychen commented 1 year ago

I am not sure what happened in your experiment. Would you mind trying another model and checking out the imputation performance first?

lk1983823 commented 1 year ago

Unfortunately, BTMF doesn't perform well. Here is my toy code:

```python
import numpy as np

# fold the data into a (n_feature, num_sample, time_interval) tensor,
# then flatten it back to a matrix for BTMF
sparse_tensor = x_values_wnan.reshape(-1, time_interval, n_feature)
sparse_tensor = np.moveaxis(sparse_tensor, -1, 0)
dim = sparse_tensor.shape
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])

dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 60])
init = {"W": 0.01 * np.random.randn(dim1, rank), "X": 0.01 * np.random.randn(dim2, rank)}
burn_iter = 1000
gibbs_iter = 200
mat_hat, W, X, A = BTMF(_, sparse_mat, init, rank, time_lags, burn_iter, gibbs_iter)
```

The features above didn't include time features such as timestamps. The imputation result for one feature is as follows:
[figure: imputation result for one feature]

xinychen commented 1 year ago

Have you tried the model with denser time_lags, e.g., time_lags = np.arange(1, 60)? I know your data has rather high resolution in the time dimension.

xinychen commented 1 year ago

Another comment: if you only have 5 time series, then please make sure the rank is not greater than 5.

lk1983823 commented 1 year ago

> Another comment is that if you only have 5 time series, then please make sure that the rank be not greater than 5.

Thank you. I tried your suggestion, but it doesn't seem to work. What do you think about compressed sensing?
[figure: imputation result after the suggested changes]

xinychen commented 1 year ago

Perhaps I would recommend Hankel tensor completion methods for your case. Would you mind taking a look? I don't have any code for that, but it should not be hard to implement.
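To make the suggestion concrete, here is a minimal, hypothetical sketch of the Hankelization (delay embedding) step that such methods start from; the `hankelize` helper and the window size are illustrative, not code from this repository:

```python
import numpy as np

def hankelize(series, window):
    """Delay-embed a 1-D series into a Hankel matrix whose
    anti-diagonals are constant: H[i, j] = series[i + j].
    Missing entries (NaN) then appear in multiple Hankel cells,
    which is what structured completion methods exploit."""
    n = len(series)
    cols = n - window + 1
    return np.array([series[i:i + cols] for i in range(window)])

s = np.arange(10.0)
H = hankelize(s, window=4)
```

A Hankel completion method would impute the missing entries of `H` under a low-rank constraint and then average each anti-diagonal back into a single series value.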