Open NimaSarajpoor opened 4 years ago
Not sure if I understand the question completely, but there is a `sample_weight` argument in the `fit` function of `TimeSeriesKMeans`. Can't you precompute all weights with `my_function` before calling that and pass it to `fit`?
@GillesVandewiele I think we do have such a parameter for `KernelKMeans` but not for `TimeSeriesKMeans`. And I agree that this would be the correct way to implement it.
This should not be too difficult to implement since `dtw_barycenter_averaging` already accepts weights as input, so I guess it would be a matter of:

* adding a `sample_weights` argument to `fit`
* having a look at how `KernelKMeans` pre-processes this argument and doing the same

Hence I tag this one as a good first issue: anyone willing to work on this should feel free to open a PR.
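For illustration, here is a minimal sketch (not tslearn's actual internals; `weighted_cluster_barycenters` is a hypothetical helper) of how a fixed per-series weight vector passed to `fit` could be forwarded to the barycenter update. Only `dtw_barycenter_averaging` and its `weights` parameter come from tslearn:

```python
import numpy as np
from tslearn.barycenters import dtw_barycenter_averaging

def weighted_cluster_barycenters(X, labels, sample_weights, n_clusters):
    # X: array of shape (n_ts, sz, d); labels: cluster assignment per series;
    # sample_weights: fixed weight per series (what `fit` would receive).
    barycenters = []
    for k in range(n_clusters):
        members = labels == k
        # DBA already supports per-series weights, so the k-means update
        # only needs to slice out the weights of the current cluster members.
        barycenters.append(
            dtw_barycenter_averaging(X[members],
                                     weights=sample_weights[members])
        )
    return np.stack(barycenters)
```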
@GillesVandewiele @rtavenar
Thanks for the response. I took a look at the `sample_weights` argument of the `KernelKMeans` fit method. As far as I understand, it only accepts a predefined weight vector.
However, in my case, the weights change during clustering. In other words, the weights of the points in a cluster (used to calculate its DBA) are computed as a function of the points in that cluster, and the result is an array whose length equals the cluster size.
So, it would be nice if it could accept a function as well.
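For instance, the kind of callable being asked for might look like the following sketch; the weighting scheme itself is made up for illustration, the only contract being that it returns one weight per series in the cluster:

```python
import numpy as np

def my_weight_function(cluster_members):
    # cluster_members: array of shape (n_members, sz, d) holding the series
    # currently assigned to one cluster.
    # Example scheme (purely illustrative): series closer to the plain
    # per-timestamp mean of the cluster get larger weights.
    mean_series = cluster_members.mean(axis=0)
    dists = np.linalg.norm(cluster_members - mean_series, axis=(1, 2))
    return 1.0 / (1.0 + dists)
```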
@Ninimama
I understand your point, yet:
1. depending on the form of your weight computation function, I am not sure that the algorithm at stake in `TimeSeriesKMeans` would be guaranteed to converge
2. We will definitely stick to `scikit-learn` API in this case, and in `scikit-learn`, `sample_weights` is assumed to be a vector of fixed weights.

@rtavenar
I thought about the convergence problem. However, I think that is something the user should be worried about. If someone wants to employ a weight function, they should show, either mathematically or by experiment, that the results are good and that the algorithm converges. So, wouldn't it be a good idea to give users the ability to play with weights? The tslearn package could warn the user that the algorithm might not converge or that the maximum number of iterations was exceeded. Any opinion?
Yes, I agree that using fixed `sample_weights` is a stable approach: there is no need to worry about non-convergence, and the results are reliable.
In the end, you are the expert here, so you definitely know better than me. My field is electrical engineering (power systems) and I am a newbie in this area.
Thanks again for your responses.
> @Ninimama
> I understand your point, yet:
> 1. depending on the form of your weight computation function, I am not sure that the algorithm at stake in `TimeSeriesKMeans` would be guaranteed to converge
> 2. We will definitely stick to `scikit-learn` API in this case, and in `scikit-learn`, `sample_weights` is assumed to be a vector of fixed weights.
I agree! Although it should be noted that there are some exceptions to this, e.g. KNN can accept a string for its `weights` parameter (uniform or based on the distances), and it can be a callable as well, while `sample_weight` is indeed a vector of weights passed to the `fit` method.
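To make the comparison concrete, this is the scikit-learn behaviour being referred to: `weights` in KNN accepts a string or a callable applied to neighbor distances, whereas `sample_weight` is a plain vector passed to `fit`:

```python
from sklearn.neighbors import KNeighborsClassifier

# string form: 'uniform' or 'distance'
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")

# callable form: receives an array of neighbor distances, returns weights
knn_custom = KNeighborsClassifier(n_neighbors=5,
                                  weights=lambda d: 1.0 / (1.0 + d))
```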
But for KNN, weights are only used at predict time; they are not involved in any fit-time optimization.
Once again, I feel that this could definitely break convergence, which is not desirable behavior.
Hi,
I opened an issue about this before but closed it later and decided to post it here as a feature request.
I was wondering if you could modify `TimeSeriesKMeans` so that it accepts a weight function (a callable) in its `metric_params` for calculating the DTW barycenter averaging (`dtw_barycenter_averaging`).
So: `metric_params = {'weights': my_function}`.
Here, `my_function` gets a set of data points (observations), calculates a weight vector based on them, and returns it. This gives users the flexibility to define a weight function and apply it throughout the clustering process.
(In my problem, for instance, I modified the centroids of the FINAL RESULT and found that it works better for me. However, if such a modification could be applied throughout the whole clustering process, and not just to the final result, it might further improve the final clusters; see the sketch below.)
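As a sketch of that workaround (with `my_function` standing in for the user-defined weighting scheme), the final centroids can already be re-estimated as weighted DBA barycenters after clustering:

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.barycenters import dtw_barycenter_averaging
from tslearn.generators import random_walks

def my_function(cluster_members):
    # placeholder: return one weight per series in the cluster
    return np.ones(len(cluster_members))

X = random_walks(n_ts=50, sz=32, d=1)
km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0).fit(X)

# re-estimate each final centroid as a weighted barycenter of its members
weighted_centroids = [
    dtw_barycenter_averaging(X[km.labels_ == k],
                             weights=my_function(X[km.labels_ == k]))
    for k in range(3)
]
```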
Best, Nima