lucy3 / RRmemory

Psych 204 project

Noah / MH Meeting 11/16 #7

Open mhtess opened 7 years ago

mhtess commented 7 years ago

Hey team,

Noah and I chatted this morning and we laid out a 3-stage analysis that is very consistent with what we’ve been talking about, but perhaps clearer in my mind. Here it is:

(1) There is a noise/corrupt function between t0 and t1; it takes in a distribution (i.e., the agent's posterior after observing stuff at t0) and a parameter f. When f is 0, the function returns the distribution unchanged; when f is Infinity, it returns some random distribution (for the beta-binomial case, perhaps the prior: a uniform). You can interpolate between the prior and the true posterior using the setup I showed you on the board yesterday:

Infer({method: 'enumerate'}, function(){
    // with probability g(f), fall back to the prior; otherwise keep the posterior
    return flip( g(f) ) ? sample(prior) : sample(posterior)
})

where g is a function that maps f to the range [0, 1].
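
Here's a minimal WebPPL sketch of that interpolation, assuming a discrete coin-weight prior as a stand-in for the beta-binomial case; the particular choice of g (g(f) = 1 - exp(-f)) and the 7-heads-in-10-flips data at t0 are just placeholders:

// Discrete coin-weight hypotheses (placeholder for the beta-binomial case)
var weights = [0.1, 0.3, 0.5, 0.7, 0.9]
var prior = Infer({method: 'enumerate'}, function(){
    return uniformDraw(weights)
})

// Posterior after observing, say, 7 heads in 10 flips at t0
var posterior = Infer({method: 'enumerate'}, function(){
    var p = sample(prior)
    observe(Binomial({n: 10, p: p}), 7)
    return p
})

// Hypothetical g: maps f in [0, Infinity) to [0, 1); g(0) = 0, g(Infinity) -> 1
var g = function(f){ return 1 - Math.exp(-f) }

// Corrupt: with probability g(f), fall back to the prior; otherwise keep the posterior
var corrupt = function(posterior, f){
    return Infer({method: 'enumerate'}, function(){
        return flip(g(f)) ? sample(prior) : sample(posterior)
    })
}

At f = 0 this returns the t0 posterior exactly; as f grows it converges to the prior.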

The question is: What value of f should you tolerate? Concretely: if f is the cost (or inverse cost, I think) of retaining the memory perfectly, and you have some reward for getting the right answer, what should you do?

You will need to make a decision about the data given to the agent at t0, and the prediction task (and the reward for getting the right answer).
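
To make that concrete, here is one hedged way to score candidate values of f, reusing prior, posterior, and corrupt from the sketch above; the true weight, the cost function, and the probability-assigned-to-the-true-weight reward are placeholder choices, not commitments:

// Hypothetical utility: reward is the probability the remembered belief assigns
// to the true coin weight; cost is highest at f = 0 (perfect retention is expensive)
var trueWeight = 0.7
var costOfFidelity = function(f){ return 0.5 * Math.exp(-f) }

var netUtility = function(f){
    var belief = corrupt(posterior, f)
    var accuracy = Math.exp(belief.score(trueWeight))   // P_belief(trueWeight)
    return accuracy - costOfFidelity(f)
}

// Scan a few candidate noise levels to see how much corruption is worth tolerating
map(function(f){ display([f, netUtility(f)]) }, [0, 0.5, 1, 2, 5])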

You will also need to make a decision about the noise process. Note that the noisify function we talked about previously and the idea from yesterday of keeping only the mean and/or variance are not totally independent: you can have a noise process that retains the mean but corrupts the variance (e.g., by adding Gaussian noise). Start simple, make some decisions, and try to justify them in the paper.

We could also look at this with a simple Gaussian observation model.
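
For instance, here is a sketch of a mean-preserving, variance-corrupting noise process in a simple Gaussian observation model; the data, the prior over the latent mean, and the (1 + f) inflation factor are all assumptions for illustration:

// Gaussian observation model at t0: unknown latent mean, known observation noise
var obs = [1.1, 0.8, 1.3]   // hypothetical data observed at t0
var beliefAtT0 = Infer({method: 'MCMC', samples: 2000}, function(){
    var mu = sample(Gaussian({mu: 0, sigma: 2}))   // prior over the latent mean
    map(function(x){ observe(Gaussian({mu: mu, sigma: 1}), x) }, obs)
    return mu
})

var beliefMean = function(d){ return expectation(d) }
var beliefVariance = function(d){
    var m = beliefMean(d)
    return expectation(d, function(x){ return (x - m) * (x - m) })
}

// Noise process that keeps the mean but corrupts the variance: the memory handed
// to t1 is a Gaussian with the same mean and a sigma inflated by (1 + f)
var corruptKeepMean = function(belief, f){
    return Gaussian({mu: beliefMean(belief), sigma: Math.sqrt(beliefVariance(belief)) * (1 + f)})
}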

  2. If the agent knows about the corruption process and the value of f, what prior should she actually use at time t1? For example, if I know that the posterior being passed to me by my past self is only reliable for the mean, I shouldn't trust the variance. In this case, what variance should I adopt? (A sketch of one option follows this list.)

  3. (A little half-baked) If an agent knows at time t0 that, at time t1, she's going to use some modified version of the posterior passed to her as her prior, how does the analysis in (1) change? For example, if I know that my future self will only trust the mean of the distribution that I pass him, and will do something rational w.r.t. the variance, how does my allocation of resources for preserving the memory change?
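
For (2), one hedged way to cash out the "trust the mean, not the variance" case: the t1 agent keeps only the reported mean and rebuilds her prior by conditioning the original coin-weight prior on that mean through a noisy channel, where sigmaMemory (a made-up parameter) encodes how much she trusts it:

// The t1 agent re-infers a belief from the reported mean alone.
// sigmaMemory is hypothetical: small = trust the mean a lot (tight t1 prior),
// large = fall back toward the spread of the original prior.
var rebuildPrior = function(reportedMean, sigmaMemory){
    return Infer({method: 'enumerate'}, function(){
        var p = uniformDraw([0.1, 0.3, 0.5, 0.7, 0.9])   // original coin-weight prior
        observe(Gaussian({mu: p, sigma: sigmaMemory}), reportedMean)
        return p
    })
}

The variance she ends up adopting then falls out of this inference, rather than being copied from the untrustworthy reported variance.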

lucy3 commented 7 years ago

Thanks for writing this out!

We were playing around with a noise function that was pretty much flip(noise) ? uniformDraw([0.1, 0.3, 0.5, 0.7, 0.9]) : p, and it showed a slight increase in rewards as the payment increased. We also tried a version where flip(noise) == true results in observing the opposite of the next observation, i.e., an implanted/very wrong observation. That showed a bigger difference in rewards as the payment increased. We haven't implemented the versions of the noise process that retain only the mean or only the variance, both the mean and the variance, or the full posterior, but that's next on the to-do list.
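
Roughly, the two variants look like this (a reconstruction with placeholder names, not the exact code in the repo):

// (a) value noise: with probability noise, replace the remembered weight with a
//     uniform draw over the hypothesis space
var valueNoise = function(p, noise){
    return flip(noise) ? uniformDraw([0.1, 0.3, 0.5, 0.7, 0.9]) : p
}

// (b) implanted observation: with probability noise, the agent observes the
//     opposite of the next (boolean heads/tails) observation
var implantObs = function(nextObs, noise){
    return flip(noise) ? !nextObs : nextObs
}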

I'm confused as to why George's setup of the noise didn't result in any trends, and why so many of the noise functions we've played around with aren't strong enough to show differences across payment amounts. It seems that the learners are pretty robust at making good predictions even off of corrupted posteriors. That is, even if the posterior is a flatter bell curve than the un-noisified posterior, or has some extraneous spikes in it, the model still does really well (when predicting the fraction of heads in the next 300 flips).

Also, there just seem to be a lot of parameters that result in different effects. For example, the true weight of the coin is less worth learning when it's 0.5 than when it's 0.1, so sometimes paying more actually makes you do worse. And how many values you uniformDraw from for your prior also seems to impact performance in funky ways.