Questions about LearningShapelets implementation

tcrasset commented 4 years ago

Hello!

Using: tslearn==0.3.1, keras==2.2.4 and tensorflow==1.10.

I'm doing my Master's thesis on time series classification and I successfully used your implementation of the Learning Shapelets Classifier to classify multi-dimensional time series.

In my report, I'm trying to explain the different layers of the architecture; however, the documentation is not very helpful and the original paper by Grabocka et al. does not go into detail.

Looking through the code and using the keras.utils.plot_model() method (see graph below), I was able to gather the following information:

Each dimension gets one input, one Conv1D layer and one LocalSquaredDistanceLayer.
The LocalSquaredDistanceLayer is initially responsible for extracting the 'average' shapelet from the input time series using KMeansShapeletInitializer, and then computes the distance between the shapelet and the input time series during the training iterations.
The Conv1D layer is not trainable and extracts subsequences from the input time series and feeds them to LocalSquaredDistanceLayer for the distance computation.

Are my conclusions correct so far?

The questions I have are the following:

What part of the architecture represents the shapelets? Where are the shapelets 'stored' in each iteration?
Grabocka et al. use a differentiable soft-minimum function to take the minimum between the subsequences of the input time series and the shapelet. You mention here that you do not, and instead use a 'hard' minimum with GlobalMinPooling1D. That is all fine by me, however I do not understand why you add the outputs of every LocalSquaredDistanceLayer using an Add layer? Why isn't there a GlobalMinPooling1D after every LocalSquaredDistanceLayer?
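
For readers comparing the two pooling choices, here is a minimal sketch of a soft-minimum next to a hard minimum. It assumes the soft-minimum takes the exponentially weighted form recalled from Grabocka et al. (weights `exp(alpha * d)` with `alpha < 0`). The function names and the `alpha` values are illustrative only and are not taken from tslearn.

```python
import numpy as np


def soft_min(distances, alpha=-30.0):
    """Soft-minimum of a 1-D array of distances (assumed form, alpha <= 0).

    Each distance is weighted by exp(alpha * d), so small distances dominate
    as alpha becomes more negative.
    """
    d = np.asarray(distances, dtype=float)
    # Shift by the minimum before exponentiating for numerical stability;
    # the shift cancels out in the ratio below.
    w = np.exp(alpha * (d - d.min()))
    return float(np.sum(d * w) / np.sum(w))


def hard_min(distances):
    """Hard minimum, which is what a global min-pooling layer returns."""
    return float(np.min(distances))


# Hypothetical local distances between one shapelet and four subsequences.
distances = np.array([4.0, 0.5, 2.5, 0.7])
soft_value = soft_min(distances, alpha=-100.0)
hard_value = hard_min(distances)
```

With a strongly negative `alpha` the two values nearly coincide; with `alpha = 0` the soft-minimum degenerates to the plain mean. The soft version is differentiable with respect to every distance, whereas the hard minimum only passes gradient through the single closest subsequence.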

I apologize in advance if my questions do not make sense; I don't quite grasp all the details of deep learning. Thank you very much for your time and your library,

Cheers, Tom

I have 4 shapelets of length 5 each, and a time series of length 59. (I have 19 dimensions, but I edited the graph to make it clearer.)

rtavenar commented 4 years ago

Hi @tcrasset

There are several (good) subquestions in your question, I think.

Regarding the model itself, it is a simple model that computes local distances between subseries and shapelets and then aggregates these local distances to retain, for each shapelet, the minimum distance. This representation then feeds a fully connected layer.
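
The pipeline described above can be sketched in NumPy for a single-channel series. This is a hedged illustration and not the tslearn code: the function names, the toy shapelets and the dense-layer weights are all made up, and the local distance is assumed to be the squared difference averaged over the shapelet length.

```python
import numpy as np


def shapelet_features(series, shapelets):
    """Minimum local distance between a series and each shapelet.

    series: (length,), shapelets: (n_shapelets, shapelet_length).
    Returns an array of shape (n_shapelets,).
    """
    series = np.asarray(series, dtype=float)
    shapelets = np.asarray(shapelets, dtype=float)
    length = shapelets.shape[1]
    n_windows = series.shape[0] - length + 1
    # Index matrix whose row i selects the subseries starting at position i.
    idx = np.arange(n_windows)[:, None] + np.arange(length)[None, :]
    windows = series[idx]                               # (n_windows, length)
    diff = windows[:, None, :] - shapelets[None, :, :]
    local = np.mean(diff ** 2, axis=2)                  # (n_windows, n_shapelets)
    # Keep, for each shapelet, the distance to its best-matching subseries.
    return local.min(axis=0)


def dense_softmax(features, weights, bias):
    """Fully connected layer followed by a softmax over classes."""
    logits = features @ weights + bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
shapelets = [[1.0, 2.0, 1.0], [0.0, 0.0, 0.0]]    # toy shapelets (assumption)
features = shapelet_features(series, shapelets)
weights = np.array([[-1.0, 1.0], [1.0, -1.0]])    # hypothetical dense weights
bias = np.zeros(2)
probabilities = dense_softmax(features, weights, bias)
```

Here the first shapelet occurs exactly in the series, so its feature is 0, while the flat shapelet is at best 1/3 away from any subseries.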

Concerning our implementation, it is far from optimal: I handled the different channels through separate (parallel) layers just because I was lazy at the time. A better approach would be to handle all channels at once, and it should not be too difficult to implement. Another thing is that we have a fake convolutional layer (with fixed weights) that just extracts subseries from the input so that the subsequent layer can compute distances between these subseries and the shapelets.
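
To illustrate how a convolution with fixed weights can act as a pure subseries extractor, here is a small NumPy sketch. It assumes the fixed kernel is an identity matrix laid out as `(kernel_size, in_channels, filters)`, so that filter `f` simply copies the sample at offset `f`; this layout and the helper `conv1d_valid` are assumptions made for the illustration, not the library's code.

```python
import numpy as np


def conv1d_valid(x, kernel):
    """Plain 1-D 'valid' convolution (cross-correlation, as in deep learning).

    x: (length, in_channels)
    kernel: (kernel_size, in_channels, filters)
    Returns an array of shape (length - kernel_size + 1, filters).
    """
    k = kernel.shape[0]
    n_windows = x.shape[0] - k + 1
    out = np.empty((n_windows, kernel.shape[2]))
    for t in range(n_windows):
        # Contract over kernel position and input channel.
        out[t] = np.tensordot(x[t:t + k], kernel, axes=([0, 1], [0, 1]))
    return out


shapelet_length = 3
x = np.arange(6.0).reshape(-1, 1)        # one channel: 0, 1, 2, 3, 4, 5
# Identity kernel: filter f has a single 1 at offset f (assumed fixed weights).
identity_kernel = np.eye(shapelet_length).reshape(shapelet_length, 1, shapelet_length)
subseries = conv1d_valid(x, identity_kernel)
```

Row `t` of `subseries` is exactly `x[t:t + 3]`, so the layer learns nothing and only reshapes the input into overlapping windows for the distance layer that follows.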

Anyway, if you want to see where shapelet coefficients are, you should look at those lines:

https://github.com/tslearn-team/tslearn/blob/75cd661faaeef62d899d26a26d027defc1ffae04/tslearn/shapelets.py#L365-L380

Finally, the reason why we aggregate through Add layers is that the squared distance between multidimensional subseries is the sum of distances along each channel. But once again, this implementation is far from great.
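
The decomposition mentioned above can be checked numerically. The sketch below uses random data and plain sums of squared differences (any constant normalization would not change the conclusion); all sizes and names are illustrative. It also shows why the minimum is taken after the addition rather than per channel: summing per-channel minima would let each channel pick a different position in the series.

```python
import numpy as np

rng = np.random.default_rng(0)
n_windows, shapelet_length, n_channels = 6, 5, 3
# Hypothetical subseries of a multichannel series and one multichannel shapelet.
windows = rng.normal(size=(n_windows, shapelet_length, n_channels))
shapelet = rng.normal(size=(shapelet_length, n_channels))

# One squared distance per window and per channel (one parallel branch each).
per_channel = np.sum((windows - shapelet) ** 2, axis=1)   # (n_windows, n_channels)

# Adding the branches gives the full multichannel squared distance.
added = per_channel.sum(axis=1)                            # (n_windows,)
full = np.sum((windows - shapelet) ** 2, axis=(1, 2))      # (n_windows,)

# Min pooling after the addition: one common best position for all channels.
min_after_add = added.min()
# Min pooling per channel, then addition: channels may disagree on the position.
sum_of_mins = per_channel.min(axis=0).sum()
```

`added` and `full` agree, and `sum_of_mins` can never exceed `min_after_add`, so pooling per channel would generally underestimate the true multichannel distance.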

Hope this helps, Romain

PS: by the way, if anyone wants to refactor the shapelet code, that would be great (would probably make the models faster) and I'd be glad to help by reviewing the code.

tcrasset commented 4 years ago

(Sorry for closing, misclicked)

Thank you very much for your fast reply!

This line made everything click for me:

> Finally, the reason why we aggregate through Add layers is that the squared distance between multidimensional subseries is the sum of distances along each channel. But once again, this implementation is far from great.

> Anyway, if you want to see where shapelet coefficients are, you should look at those lines:

What you are saying is that the weights of the shapelets_%d_%d layer represent the shapelets?

With regard to refactoring the shapelet code, I am not up for the challenge. However, I implemented SCRIMP++ (matrix profile), time series snippets and MPdist in Java, so if that is something you are interested in, I'll be glad to help port my implementation to this library.

Have a good day, Tom

rtavenar commented 4 years ago

> What you are saying is that the weights of the shapelets_%d_%d layer represent the shapelets?

Yep, and the first index is the shapelet id, the second index is the channel id.

> With regard to refactoring the shapelet code, I am not up for the challenge. However, I implemented SCRIMP++ (matrix profile), time series snippets and MPdist in Java, so if that is something you are interested in, I'll be glad to help port my implementation to this library.

That would be great! Could you maybe open a new Issue to detail what you could offer, and what these algorithms bring to the matrix profile ecosystem (sorry, I'm not an expert of MPs)?

rtavenar commented 4 years ago

> With regard to refactoring the shapelet code, I am not up for the challenge. However, I implemented SCRIMP++ (matrix profile), time series snippets and MPdist in Java, so if that is something you are interested in, I'll be glad to help port my implementation to this library.

> That would be great! Could you maybe open a new Issue to detail what you could offer, and what these algorithms bring to the matrix profile ecosystem (sorry, I'm not an expert of MPs)?

@tcrasset Feel free to join the discussion in #260 if you will

GillesVandewiele commented 4 years ago

Closing this issue as questions seem to be answered. Please re-open if you have any more questions!

rtavenar commented 4 years ago

Let me reopen this one as a reminder that the LS implementation should be made simpler by treating all modalities at once instead of using different parallel blocks.

tslearn-team / tslearn

Questions about LearningShapelets implementation #258