tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License
2.91k stars 342 forks source link

SBD distance function #276

Open NimaSarajpoor opened 4 years ago

NimaSarajpoor commented 4 years ago

Hi,

I was wondering if the SBD distance function (used in KShape) can be easily accessed?

For instance, in spite of the existence of cdist_dtw or k-mean clustering (with dtw metric) in tslearn package, the DTW distance itself function is available.

I was wondering if the same thing can be applied for the KShape method and its distance (SBD) can be easily used? I understand that DTW is more popular, but having access to new tools such as SBD function can help researchers on the application side to better investigate new methods for their own problems.

Best, Nima

xnwder commented 4 years ago

It seems that you can add these two functions in KShape source code: `

def _get_norms(self,X):
    norms = []
    for x in X:
        result = numpy.linalg.norm(x)
        norms.append(result)
    norms = numpy.array(norms)
    return norms

def _sbd_cross_dist(self,X):
    norms = self._get_norms(X)
    return 1. - cdist_normalized_cc(X, X, norms1=norms, norms2=norms, self_similarity=False)`

so you can get SBD cross distances through this: `

#KShape class
ks = KShape(n_clusters= num_cluster ,n_init= 5 ,verbose= True ,random_state=seed)
X = TimeSeriesScalerMeanVariance(mu= 0.0 ,std= 1.0 ).fit_transform(X)
y_pred = ks.fit_predict(X)
dists = ks._sbd_cross_dist(X)
np.fill_diagonal(dists,0)
score = silhouette_score(dists,y_pred,metric="precomputed")
print("silhouette_score: " + str(score))`

That's what i did, and it works. For more details you can see https://blog.csdn.net/qq_37960007/article/details/107937212

swifmaneum commented 2 years ago

I definitely support this feature request!

In the meantime, you can use the SBD function from this repo:

def _sbd(x, y):
    """
    >>> _sbd([1,1,1], [1,1,1])
    (-2.2204460492503131e-16, array([1, 1, 1]))
    >>> _sbd([0,1,2], [1,2,3])
    (0.043817112532485103, array([1, 2, 3]))
    >>> _sbd([1,2,3], [0,1,2])
    (0.043817112532485103, array([0, 1, 2]))
    """
    ncc = _ncc_c(x, y)
    idx = ncc.argmax()
    dist = 1 - ncc[idx]
    yshift = roll_zeropad(y, (idx + 1) - max(len(x), len(y)))

    return dist, yshift

using the following helper functions:

def roll_zeropad(a, shift, axis=None):
    a = np.asanyarray(a)
    if shift == 0:
        return a
    if axis is None:
        n = a.size
        reshape = True
    else:
        n = a.shape[axis]
        reshape = False
    if np.abs(shift) > n:
        res = np.zeros_like(a)
    elif shift < 0:
        shift += n
        zeros = np.zeros_like(a.take(np.arange(n-shift), axis))
        res = np.concatenate((a.take(np.arange(n-shift, n), axis), zeros), axis)
    else:
        zeros = np.zeros_like(a.take(np.arange(n-shift, n), axis))
        res = np.concatenate((zeros, a.take(np.arange(n-shift), axis)), axis)
    if reshape:
        return res.reshape(a.shape)
    else:
        return res

def _ncc_c(x, y):
    """
    >>> _ncc_c([1,2,3,4], [1,2,3,4])
    array([ 0.13333333,  0.36666667,  0.66666667,  1.        ,  0.66666667,
            0.36666667,  0.13333333])
    >>> _ncc_c([1,1,1], [1,1,1])
    array([ 0.33333333,  0.66666667,  1.        ,  0.66666667,  0.33333333])
    >>> _ncc_c([1,2,3], [-1,-1,-1])
    array([-0.15430335, -0.46291005, -0.9258201 , -0.77151675, -0.46291005])
    """
    den = np.array(norm(x) * norm(y))
    den[den == 0] = np.Inf

    x_len = len(x)
    fft_size = 1 << (2*x_len-1).bit_length()
    cc = ifft(fft(x, fft_size) * np.conj(fft(y, fft_size)))
    cc = np.concatenate((cc[-(x_len-1):], cc[:x_len]))
    return np.real(cc) / den
GillesVandewiele commented 2 years ago

Hi @swifmaneum ,

This seems like a great addition to tslearn. Do you reckon this could be built into a PR?

Best regards, Gilles

swifmaneum commented 2 years ago

Hi @GillesVandewiele, sure, I added a WIP merge request. Tests and documentation are still lacking...