vsavram / AM205-Project


Understand and Experiment with Cosine Similarity Finite Differences in LUNA #2

Open msbutler opened 3 years ago

msbutler commented 3 years ago

@jscuds @vsavram I didn't make this clear in #1, but there are 2 places in the LUNA algo where it will be useful to experiment with numerical methods: 1) The overall optimization algo, the focus of #1; 2) measuring cosine similarity between auxiliary functions within the actual LUNA algorithm. I'm opening this issue to focus on cosine similarity.
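To make the idea concrete, here is a minimal sketch (not the project's code; all function and variable names here are illustrative) of what "cosine similarity via finite differences" means in practice: estimate the gradient of each auxiliary function with a finite-difference scheme, then measure how aligned two functions are by the cosine similarity of those gradient estimates.

```python
import numpy as np

def forward_diff_grad(f, x, h=1e-6):
    """Estimate the gradient of f: R^D -> R at x with forward differences."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    grad = np.zeros_like(x)
    for d in range(x.size):
        x_step = x.copy()
        x_step[d] += h
        grad[d] = (f(x_step) - fx) / h
    return grad

def cosine_similarity(u, v, eps=1e-12):
    """Cosine similarity between two gradient vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

# Two toy "auxiliary functions" standing in for LUNA's aux outputs.
f1 = lambda x: np.sin(x).sum()
f2 = lambda x: (x ** 2).sum()

x0 = np.array([0.5, -1.0, 2.0])
g1 = forward_diff_grad(f1, x0)
g2 = forward_diff_grad(f2, x0)
print(cosine_similarity(g1, g2))  # near 0 means the two functions behave "diversely" at x0
```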

@mecunha Correct me if I'm wrong, but the 207 group is aiming to push a first draft of the LUNA algo code this weekend. Once that's ready, @jscuds @vsavram will be able to play around with the cosine similarity finite differences. Then on Monday, we can answer any questions you have about the LUNA algo and the code. Happy to answer any conceptual questions before Monday as well.

msbutler commented 3 years ago

@jscuds @vsavram (@mecunha) I just updated the codebase to include the LUNA model. To get a high-level grasp of the code, I'd suggest the following: 1) Go over the NLMDemo and broadly understand how NLM.train() works. No need to understand the details of the bayes helper functions. You already understand how the ff.fit function works! 2) Go over the LUNA demo in LUNA2.ipynb. Note that LUNA is essentially an NLM with a fancy objective function. Again, no need to understand the details of all the functions called in make_objective (I don't...). Notice that you can plug in any finite difference numerical method every time you instantiate a LUNA object.
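For anyone experimenting with step 2, here is a rough sketch of two interchangeable finite-difference schemes you could swap in; the commented-out `LUNA(...)` call and its `finite_diff_method` keyword are assumptions about the interface, not the repo's actual constructor signature.

```python
import numpy as np

def forward_diff(f, x, h=1e-6):
    """O(h) forward-difference gradient of f: R^D -> R at x."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    e = np.eye(x.size)
    return np.array([(f(x + h * e[d]) - fx) / h for d in range(x.size)])

def central_diff(f, x, h=1e-5):
    """O(h^2) central-difference gradient of f: R^D -> R at x."""
    x = np.asarray(x, dtype=float)
    e = np.eye(x.size)
    return np.array([(f(x + h * e[d]) - f(x - h * e[d])) / (2 * h) for d in range(x.size)])

# Hypothetical usage -- the real constructor in the repo may expose this hook
# under a different name, so treat this line as illustration only:
# luna_model = LUNA(architecture, finite_diff_method=central_diff)
```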

@mecunha and my next steps

Your potential next steps:

let me know if you have questions!

jscuds commented 3 years ago

@mecunha, @msbutler (@vsavram let me know if I missed anything or worded something incorrectly):

We reviewed the code and I had a question about "aux functions":

mecunha commented 3 years ago

Good question! Here's my attempt at explaining the intuition:

To recap, with a Neural Linear Model (NLM), we train a neural net to produce one output per data point, and we want that output to fit the training data well. If it fits the training data well, then when we chop off the last layer of weights and replace it with Bayesian linear regression, our predictions will match the training data well.
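As a rough sketch of the "chop off the last layer" step (the names and the fixed `sigma2`/`tau2` hyperparameters are illustrative assumptions, not the repo's interface): given the last-hidden-layer features `Phi` of the trained net, the Bayesian linear regression part amounts to a conjugate Gaussian update.

```python
import numpy as np

def bayes_linear_regression(Phi, y, sigma2=0.1, tau2=1.0):
    """Conjugate Bayesian linear regression on neural-net features Phi.

    Phi    : (N, M) last-hidden-layer outputs for the N training points
    y      : (N,) training targets
    sigma2 : assumed observation-noise variance
    tau2   : assumed prior variance on the regression weights
    """
    M = Phi.shape[1]
    # Posterior covariance and mean for w under y = Phi w + Gaussian noise.
    S = np.linalg.inv(Phi.T @ Phi / sigma2 + np.eye(M) / tau2)
    mu = S @ Phi.T @ y / sigma2
    return mu, S

def predictive(Phi_star, mu, S, sigma2=0.1):
    """Predictive mean and variance at new feature rows Phi_star."""
    mean = Phi_star @ mu
    var = np.sum((Phi_star @ S) * Phi_star, axis=1) + sigma2
    return mean, var
```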

With LUNA, our criteria for training the neural net are a little different. We still want the neural net output to fit the data well (before we chop off the last layer and replace it with Bayesian linear regression), but we also want diversity in the weights of the hidden layers, because diverse weights produce diverse predictions in data-poor regions later on under the Bayesian linear regression.

So instead of training the neural net to produce one output for each data point, we train it to produce multiple outputs for each data point, all of which must fit the training data well, but which must also use distinctly different ways (or weights) of producing their good predictions. By requiring that the last layer of weights be as different from each other as possible while still producing good predictions, we push the previous layers of weights to diverge from each other a bit as well: the output of all the previous layers feeds into that last layer, so if the previous layers produce a variety of inputs, it becomes a little easier for the last layer to find a variety of ways of reaching a good prediction.

So each auxiliary function represents a different way of predicting the correct output for the training data, thereby satisfying both the mean squared error and the cosine similarity components of our objective function. Of course, after we've fit the neural net, we chop off the last layer of weights (which constitutes the auxiliary functions) and replace it with Bayesian linear regression.
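My loose reading of that description as code (a sketch only; the repo's make_objective may weight or aggregate these terms differently, and `lam`, `aux_grads`, etc. are made-up names): fit every auxiliary output to the data with MSE, and penalize pairs of auxiliary functions whose finite-difference gradients point in similar directions.

```python
import numpy as np
from itertools import combinations

def luna_style_objective(aux_preds, y, aux_grads, lam=1.0, eps=1e-12):
    """Illustrative LUNA-style objective (not the repo's make_objective).

    aux_preds : (K, N) predictions of K auxiliary functions on N training points
    y         : (N,) training targets
    aux_grads : (K, N, D) finite-difference gradients of each aux function at each x
    lam       : weight on the diversity (cosine-similarity) penalty
    """
    # Fit term: every auxiliary function must match the data.
    mse = np.mean((aux_preds - y[None, :]) ** 2)

    # Diversity term: penalize aligned gradients between pairs of aux functions.
    K = aux_preds.shape[0]
    penalty = 0.0
    for i, j in combinations(range(K), 2):
        gi, gj = aux_grads[i], aux_grads[j]              # (N, D) each
        num = np.sum(gi * gj, axis=1)
        den = np.linalg.norm(gi, axis=1) * np.linalg.norm(gj, axis=1) + eps
        penalty += np.mean((num / den) ** 2)             # squared cosine similarity
    penalty /= max(K * (K - 1) / 2, 1)

    return mse + lam * penalty
```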

Let me know if that helped or if it created more confusion haha.

msbutler commented 3 years ago

@jscuds @vsavram On a separate note, we found a bug in Luna.default_grad_finit_diff(): if you feed it a single X input and the function maps R^D -> R, it currently returns a scalar, even though it should return a value in R^D (one gradient component per input dimension). Hoping to fix it in the next 24 hours.
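For reference, a minimal sketch (not the actual fix in the repo) of a forward-difference gradient that returns a vector in R^D even when it is given a single input point:

```python
import numpy as np

def grad_finite_diff(f, x, h=1e-6):
    """Forward-difference gradient of f: R^D -> R.

    Always returns an array of shape (D,), even for a single input point --
    the behavior the comment above says the current implementation gets wrong.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    fx = f(x)
    grad = np.empty(x.size)
    for d in range(x.size):
        x_step = x.copy()
        x_step[d] += h
        grad[d] = (f(x_step) - fx) / h
    return grad

# Example: f: R^3 -> R evaluated at one point gives a gradient in R^3.
f = lambda x: np.dot(x, x)
print(grad_finite_diff(f, np.array([1.0, 2.0, 3.0])).shape)  # (3,)
```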

msbutler commented 3 years ago

@jscuds @vsavram Just pushed an updated LUNAv3 that is significantly faster.