Closed agrija9 closed 4 years ago
Hi @agrija9
Does your implementation support multivariate data-sets?

Yes, it does.
Let's break your problem statement down:

number_of_features: in the case of multivariate time series, pass the number of "variates" here; in the univariate case, pass 1. Refer to this commit - https://github.com/tejaslodaya/timeseries-clustering-vae/commit/e7b57a6748ef18efbd9f026907e85c31817e2b42
Here, the first dimension of the LSTM (in the Encoder) is set to num_features.
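A minimal sketch of how a multivariate batch could be shaped before being fed in. This is plain NumPy, and the (samples, timesteps, features) layout and array names are my assumptions for illustration, not the repo's exact API:

```python
import numpy as np

# Hypothetical multivariate dataset: 100 series, 140 timesteps, 3 variates.
n_samples, sequence_length, number_of_features = 100, 140, 3
X = np.random.randn(n_samples, sequence_length, number_of_features)

# For univariate data, keep an explicit trailing axis of size 1
# so the same (samples, timesteps, features) layout still works.
X_uni = np.random.randn(n_samples, sequence_length)[:, :, np.newaxis]

print(X.shape)      # (100, 140, 3)  -> pass number_of_features=3
print(X_uni.shape)  # (100, 140, 1)  -> pass number_of_features=1
```

The last axis is what you would pass as number_of_features.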
Let me know if this makes sense.
Hi @tejaslodaya ,
Thanks for your reply,
One more remark about the unlabelled data. As you say, labelled data can help us determine how well the VRAE is learning these dense vectors, by running k-means on top of them and comparing our true labels with the k-means cluster assignments.
For now, I just want to get the dense vectors of my data using the VRAE, say, compress them to 20 dimensions, and then project them down to 3 or 2 dimensions (with either PCA or t-SNE). As far as I understand, I don't need labels to do this, right?
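That projection step is indeed fully unsupervised. A sketch of the pipeline with scikit-learn, using random vectors as a stand-in for the 20-dimensional VRAE latents (array names are placeholders, not the repo's):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for the 20-dimensional latent vectors produced by the VRAE.
z = rng.normal(size=(200, 20))

# Linear projection down to 3 dimensions.
z_pca = PCA(n_components=3).fit_transform(z)

# Non-linear projection to 2 dimensions; perplexity must be < n_samples.
z_tsne = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(z)

print(z_pca.shape, z_tsne.shape)  # (200, 3) (200, 2)
```

No labels are used anywhere; labels would only come in afterwards, for colouring the scatter plot or scoring a k-means run.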
Best
Hi @agrija9 ,
You're correct. You don't need labels if you have a way of visualizing the clusters. If you look closely at the plots in this project's README.md, you will see a clear distinction between the two clusters.
That, again, totally depends on the data and the hyperparameters you've used to train the model.
Let me know if you want to know anything else.
Hi @tejaslodaya ,
Thanks for your feedback. Another couple of doubts:
In your case, you show the compression of your time series from 140 to 20 elements. In my case, the length of my time series is 9601. Do you have any idea of what an appropriate compression ratio would be for my case? Or do you know of any paper that analyzes such compression ratios?
Autoencoders, and more generally neural networks, are used to reduce data dimensionality in many domains. However, there are other more straightforward methods, like kernel-PCA, which can perform non-linear dimensionality reduction through the use of kernels. I'm thinking about testing this on my data and checking performance. What are your thoughts on this?
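For reference, kernel-PCA is available off the shelf in scikit-learn. A minimal sketch on synthetic data (the dimensions are scaled down from 9601 for speed, and the RBF gamma is an arbitrary choice, not a tuned value):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
# Stand-in for long raw time series (shorter here than 9601 for speed).
X = rng.normal(size=(150, 500))

# An RBF kernel lets kernel-PCA capture non-linear structure;
# gamma controls the kernel width and would need tuning in practice.
kpca = KernelPCA(n_components=20, kernel="rbf", gamma=1e-3)
X_low = kpca.fit_transform(X)

print(X_low.shape)  # (150, 20)
```

One practical caveat: kernel-PCA builds an n_samples × n_samples kernel matrix, so it scales with the number of series rather than their length, which may or may not suit your dataset.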
All the best!
Hi @agrija9 ,
To answer your 1st question, have a look at another issue where I commented on possible steps if you have a longer time series. Link: https://github.com/tejaslodaya/timeseries-clustering-vae/issues/2#issuecomment-548517460
If you still want to go ahead with raw clustering and feed in 9k dimensions, I would prefer a much stronger neural network with "gradually" descending layers in the encoder and their mirrored counterparts in the decoder. Note: this network will have a lot of parameters to train and will need a larger machine and more time to train.
For example, you can go with 9k - 2048 - 512 - 128 - 32 (encoder) and 32 - 128 - 512 - 2048 - 9k (decoder). This way you'll end up with a 32-dimensional latent space. I haven't tried such a large network; please let me know how your embeddings shape up if you give this a try.
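The layer widths above can be sketched as a plain dense forward pass in NumPy, just to check the shapes and the rough parameter count. This is my own non-recurrent toy stand-in with random weights, not the repo's LSTM-based VRAE:

```python
import numpy as np

rng = np.random.default_rng(0)
# Layer widths from the example above; 9601 is the raw series length.
encoder_dims = [9601, 2048, 512, 128, 32]
decoder_dims = [32, 128, 512, 2048, 9601]

def forward(x, dims):
    """Push x through randomly initialised dense layers with tanh."""
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_out))
        x = np.tanh(x @ W)
    return x

x = rng.normal(size=(4, 9601))        # a batch of 4 raw series
z = forward(x, encoder_dims)          # 32-dimensional embeddings
x_hat = forward(z, decoder_dims)      # reconstruction back to 9601

# Weight + bias count for the encoder alone (dense approximation).
n_params = sum(i * o + o for i, o in zip(encoder_dims[:-1], encoder_dims[1:]))
print(z.shape, x_hat.shape, n_params)
```

Even in this dense approximation the encoder alone carries roughly 21 million parameters, most of them in the first 9601 → 2048 layer, which is why the note above about machine size and training time matters.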
I don't know much about kernel-PCA. I had tried DTW for my clustering use case, but it produced much poorer results compared to VRAE.
Thanks for the issue. I've included it in the FAQs section to increase visibility.
Hi @tejaslodaya,
I will give the ideas you mentioned above a try. Thanks a lot for your feedback. If anything comes up, I'll reach out to you with more questions (:
Sure, not a problem.
Hi tejaslodaya,
I want to run the VRAE on a single-class, unlabelled, multivariate time-series data-set.
Does your implementation also support multivariate data-sets?
I saw a comment "add support for multivariate" in one of your commits to utils.py, but I'm not able to see this reflected in the code.
Any insight is highly appreciated!