Closed: LemonPi closed this issue 4 years ago
This is currently being worked on at https://github.com/LemonPi/DCC/tree/visual_example; you can scroll down in the readme to see the steps I've done so far. Afterwards I'd like to merge the steps into one script so parameters don't have to be given multiple times. My understanding of why the steps are separated is that the pretraining stage (everything before running DCC) only has to be done once on the data, regardless of how the pretrained representation is used afterwards (pretraining and training are independent).
I have 2 questions: 1) The end result of running DCC is relatively low loss, low accuracy, and a high number of components (232 instead of the expected 3 for 600 data points; results are attached, you can just run tensorboard on them). Why does this happen?
2) The feature transform/embedding is currently done by the autoencoder net. Is there an easy way to specify a different representation of the transform, for example if we have a prior on what kind of space the transform lives in? Or, what if we didn't care about reconstructing the original data, so we didn't need an autoencoder (although for this we'd also have to change the objective function)? Also, is there an easy way to plug in a fixed transform? (I want to use the identity transform to visualize DCC in progress in the original state space.)
1.zip
Updated results with plotting of U (see the image tab of tensorboard):
1.zip
@LemonPi It is a good idea to have a visual example of the steps to follow. Thank you for making one happen.
Answers to your questions:
There can be multiple reasons. a) Did you normalise the data before applying mkNN? From the example code in the link you provided, I note that you haven't applied any normalisation procedure (which is part of the edgeConstruction.py code). b) What value of k is set for mkNN? Try increasing it. c) Try decreasing the value of sigma beyond the limits in the program.
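For readers following along, the mutual-kNN idea behind point b) can be sketched like this: an edge (i, j) survives only if each point is among the other's k nearest neighbours, so increasing k keeps more edges. This is a simplified sketch, not the repo's actual `edgeConstruction.py` (which uses a different metric and edge weighting):

```python
import numpy as np

def mutual_knn_edges(X, k):
    """Return edges (i, j) where i is among j's k nearest neighbours
    AND j is among i's. Sketch only; edgeConstruction.py differs in
    metric (cosine by default) and in how edges are weighted."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-edges
    nn = np.argsort(d, axis=1)[:, :k]    # each point's k nearest
    nn_sets = [set(row) for row in nn]
    return [(i, j) for i in range(len(X)) for j in sorted(nn_sets[i])
            if i < j and i in nn_sets[j]]
```

With three collinear points at 0, 1 and 3 and k=1, only the pair (0, 1) is mutual, so the graph has a single edge; a larger k reconnects the outlying point.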
The mkNN graph encodes the topology of the underlying data; basically, it acts as a prior on the data space. You can change the weights of the different components of the objective (currently they are set to be numerically equal). You can easily replace the AE net with an identity net: just modify the network to be the identity function, with the latent representation the same as the input. This will be equivalent to solving RCC using an SGD optimiser instead of a least-squares solver.
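The identity-net suggestion can be sketched as below. This is an illustrative plain-Python stand-in (the repo's networks are PyTorch modules, and the names here are hypothetical), just to show the shape of the idea:

```python
import numpy as np

class IdentityNet:
    """Illustrative stand-in for the autoencoder: the latent code is
    the input itself and reconstruction is exact, so the
    reconstruction term of the objective vanishes and DCC reduces to
    RCC solved with an SGD optimiser."""
    def encode(self, x):
        return x          # latent representation == input

    def decode(self, z):
        return z          # reconstruction == latent representation

    def forward(self, x):
        z = self.encode(x)
        return self.decode(z), z
```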
Hey thanks for the response!
a) The testing was without normalization; my latest test with normalization errored out in copyGraph, where data0 and data1 were different. I haven't figured out why yet; do you have any ideas? This was just from changing preprocess='normalization'.
b) With a higher value this tends to work better, and kNN works better than mkNN in this particular case (with essentially an identity embedding), because by default kNN uses the Euclidean measure and mkNN uses the cosine measure.
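The metric difference matters because the two measures can rank neighbours differently. A tiny illustration with hypothetical points (not from the actual dataset): Euclidean favours a nearby short vector, while cosine favours a far but aligned one:

```python
import numpy as np

def nearest(query, points, metric):
    """Index of the point nearest to `query` under the given metric
    ('euclidean' or 'cosine'). Illustration only."""
    if metric == "euclidean":
        scores = [np.linalg.norm(query - p) for p in points]
    else:  # cosine distance = 1 - cosine similarity
        scores = [1 - query @ p / (np.linalg.norm(query) * np.linalg.norm(p))
                  for p in points]
    return int(np.argmin(scores))

pts = [np.array([3.0, 0.0]),   # far away, but aligned with the query
       np.array([0.1, 0.1])]   # close by, but pointing elsewhere
q = np.array([1.0, 0.0])
# Euclidean picks pts[1] (smaller distance); cosine picks pts[0]
# (perfectly aligned), so the two graphs can disagree on edges.
```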
c) Will try.
@LemonPi Sorry, my mistake. You should apply normalisation for both the pretraining and the graph construction stages. For all my DCC experiments, normalisation was performed inside the make_data.py file. Try normalising to [-1,1].
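A per-feature min-max scaling to [-1, 1] can be sketched as below. This is only a sketch of the single shared normalisation step being discussed; the exact scheme inside make_data.py may differ:

```python
import numpy as np

def to_minus1_1(X):
    """Min-max scale each feature (column) of X to [-1, 1].
    Sketch only; the normalisation in make_data.py may differ."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0
```

Doing this once, before both pretraining and graph construction, keeps the two stages consistent.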
I see; I think it makes sense to define a single point of normalization. Since pretraining and graph construction are done independently, the normalization should be taken out of the pretraining done beforehand. What do you think of moving normalization to a separate step after make_data?
So with normalization (inside make_data like the others), an identity network, and a kNN graph, it works as expected: 6.zip
Many epochs were run without much change after epoch 3; this was due to sigma2 needing to decrease to delta2 (the scheduled decrease). Do you know why sigma2 was initialized so high?
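The scheduled decrease being discussed can be sketched as a graduated step that shrinks sigma2 toward its floor delta2. The decay factor and the epochs-per-step here are placeholders, not the repo's actual values:

```python
def anneal(sigma2, delta2, factor=2.0):
    """One step of the scheduled decrease: shrink sigma2 toward its
    floor delta2. Sketch only; the actual decay factor and epoch
    schedule in the repo may differ. A large initial sigma2 is
    harmless but means more steps before sigma2 reaches delta2."""
    return max(sigma2 / factor, delta2)
```

This makes the observation above concrete: starting sigma2 far above delta2 just means many epochs pass while sigma2 is annealed down, during which the solution changes little.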
But with the mkNN edge graph it doesn't work at all xD, due to the default cosine measure. What do you typically use for the NN graph? mkNN with the cosine measure? How do you know it will work on the embedding outputs?
@LemonPi Great. The initial value of sigma2 follows the convention of the RCC work. I remember the large initial value was set based on fine-tuning on the MNIST validation set. However, overall it works well across all datasets, so there was no need to fine-tune it for every other dataset. There is no harm in setting it to a high value; the only problem I see is that it might take many epochs to converge even on simple data.
Regarding the graph: I typically use mkNN with the cosine measure. There is no theory behind why it should or should not work for embedding data. I believe that for high-dimensional data the cosine measure makes more sense due to the normalised [-1,1] output.
It would be good to have a really simple and small data set that could be easily visualized (2D clusters like below) to train end-to-end. This would be helpful to me because it would clarify where pretraining and creating the mkNN graph come in. I'm planning to create and work with such a data set and then submit a pull request; are there any gotchas I should be aware of?
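A small visualizable data set like the one described could be generated along these lines. The cluster count, centres, and spread below are hypothetical choices (600 points in 3 clusters, matching the counts mentioned earlier in the thread), not anything already in the repo:

```python
import numpy as np

def make_blobs_2d(n_per_cluster=200, seed=0):
    """Generate three well-separated 2D Gaussian clusters
    (3 * n_per_cluster points total) plus integer labels.
    Hypothetical example data, not part of the DCC repo."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
    X = np.vstack([rng.normal(c, 0.5, size=(n_per_cluster, 2))
                   for c in centers])
    y = np.repeat(np.arange(3), n_per_cluster)
    return X, y
```

With clusters this well separated, a correctly configured pipeline should recover 3 components, making it easy to spot when a step (normalisation, graph metric, k) is misconfigured.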