Open NimaSarajpoor opened 4 years ago
It seems that you can add these two functions in KShape source code: `
def _get_norms(self,X):
norms = []
for x in X:
result = numpy.linalg.norm(x)
norms.append(result)
norms = numpy.array(norms)
return norms
def _sbd_cross_dist(self,X):
norms = self._get_norms(X)
return 1. - cdist_normalized_cc(X, X, norms1=norms, norms2=norms, self_similarity=False)`
so you can get SBD cross distances through this: `
#KShape class
ks = KShape(n_clusters= num_cluster ,n_init= 5 ,verbose= True ,random_state=seed)
X = TimeSeriesScalerMeanVariance(mu= 0.0 ,std= 1.0 ).fit_transform(X)
y_pred = ks.fit_predict(X)
dists = ks._sbd_cross_dist(X)
np.fill_diagonal(dists,0)
score = silhouette_score(dists,y_pred,metric="precomputed")
print("silhouette_score: " + str(score))`
That's what i did, and it works. For more details you can see https://blog.csdn.net/qq_37960007/article/details/107937212
I definitely support this feature request!
In the meantime, you can use the SBD function from this repo:
def _sbd(x, y):
"""
>>> _sbd([1,1,1], [1,1,1])
(-2.2204460492503131e-16, array([1, 1, 1]))
>>> _sbd([0,1,2], [1,2,3])
(0.043817112532485103, array([1, 2, 3]))
>>> _sbd([1,2,3], [0,1,2])
(0.043817112532485103, array([0, 1, 2]))
"""
ncc = _ncc_c(x, y)
idx = ncc.argmax()
dist = 1 - ncc[idx]
yshift = roll_zeropad(y, (idx + 1) - max(len(x), len(y)))
return dist, yshift
using the following helper functions:
def roll_zeropad(a, shift, axis=None):
a = np.asanyarray(a)
if shift == 0:
return a
if axis is None:
n = a.size
reshape = True
else:
n = a.shape[axis]
reshape = False
if np.abs(shift) > n:
res = np.zeros_like(a)
elif shift < 0:
shift += n
zeros = np.zeros_like(a.take(np.arange(n-shift), axis))
res = np.concatenate((a.take(np.arange(n-shift, n), axis), zeros), axis)
else:
zeros = np.zeros_like(a.take(np.arange(n-shift, n), axis))
res = np.concatenate((zeros, a.take(np.arange(n-shift), axis)), axis)
if reshape:
return res.reshape(a.shape)
else:
return res
def _ncc_c(x, y):
"""
>>> _ncc_c([1,2,3,4], [1,2,3,4])
array([ 0.13333333, 0.36666667, 0.66666667, 1. , 0.66666667,
0.36666667, 0.13333333])
>>> _ncc_c([1,1,1], [1,1,1])
array([ 0.33333333, 0.66666667, 1. , 0.66666667, 0.33333333])
>>> _ncc_c([1,2,3], [-1,-1,-1])
array([-0.15430335, -0.46291005, -0.9258201 , -0.77151675, -0.46291005])
"""
den = np.array(norm(x) * norm(y))
den[den == 0] = np.Inf
x_len = len(x)
fft_size = 1 << (2*x_len-1).bit_length()
cc = ifft(fft(x, fft_size) * np.conj(fft(y, fft_size)))
cc = np.concatenate((cc[-(x_len-1):], cc[:x_len]))
return np.real(cc) / den
Hi @swifmaneum ,
This seems like a great addition to tslearn. Do you reckon this could be built into a PR?
Best regards, Gilles
Hi @GillesVandewiele, sure, I added a WIP merge request. Tests and documentation are still lacking...
Hi,
I was wondering if the SBD distance function (used in KShape) can be easily accessed?
For instance, in spite of the existence of cdist_dtw or k-mean clustering (with dtw metric) in tslearn package, the DTW distance itself function is available.
I was wondering if the same thing can be applied for the KShape method and its distance (SBD) can be easily used? I understand that DTW is more popular, but having access to new tools such as SBD function can help researchers on the application side to better investigate new methods for their own problems.
Best, Nima