lucy3 / RRmemory

Psych 204 project

Update/Explanation for Steph #4

Open lucy3 opened 7 years ago

lucy3 commented 7 years ago

George's first model is essentially a sequence of bits: when you "remember" the bits, there is a fixed probability that each bit gets flipped (noise). That probability can be lowered by paying more (cost). The reward is how close your remembered sequence is to the original. One thing we noted with that model is that we weren't doing any inference over a belief, and so weren't quite following the model MH wrote for us at that first meeting.
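To make the bit-flip idea concrete, here is a minimal Python sketch (the real model is in WebPPL). The mapping from cost to flip probability is a made-up illustration, not George's actual parameterization:

```python
import random

def remember(bits, cost):
    """Recall a bit sequence: each bit flips with some probability (noise).
    Assumed (hypothetical) mapping: paying more cost lowers the noise."""
    flip_prob = 0.5 / (1.0 + cost)
    return [b ^ 1 if random.random() < flip_prob else b for b in bits]

def reward(original, recalled, cost):
    """Reward = fraction of bits recalled correctly, minus the cost paid."""
    matches = sum(o == r for o, r in zip(original, recalled))
    return matches / len(original) - cost

random.seed(0)
bits = [random.randint(0, 1) for _ in range(20)]
recalled = remember(bits, cost=0.5)
print(reward(bits, recalled, cost=0.5))
```

Note that nothing here is a belief: the "memory" is just a corrupted copy of the observation, which is exactly the limitation discussed above.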

MH's starter model adds noise not to an observation, but to a belief. In his example, which you can see in starter.wppl, that belief is the variable p, the weight of the coin. Both the ideal and non-ideal models observed two data points. The difference was that the non-ideal model observed one data point, had its posterior noisified, and then used that noisy posterior as the new prior when observing the second data point.
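A Python sketch of that two-step scheme, assuming a sample-based representation of the belief over p and Gaussian corruption as the noise model (both are my stand-ins, not necessarily what starter.wppl does):

```python
import random

def noisify(samples, sd=0.1):
    """Corrupt a belief (samples of coin weight p) with Gaussian noise,
    clipped to [0, 1] -- an assumed stand-in for the memory noise."""
    return [min(1.0, max(0.0, s + random.gauss(0, sd))) for s in samples]

def observe(samples, flip):
    """Importance-resample the belief after observing one flip (1 = heads)."""
    weights = [s if flip == 1 else 1 - s for s in samples]
    return random.choices(samples, weights=weights, k=len(samples))

random.seed(1)
prior = [random.random() for _ in range(5000)]  # uniform prior on p
post1 = observe(prior, 1)       # ideal posterior after the first flip
noisy = noisify(post1)          # remembered (noisy) posterior
post2 = observe(noisy, 1)       # ...used as the prior for the second flip
print(sum(post2) / len(post2))  # posterior mean estimate of p
```

The key move is the middle line: the thing being degraded is the posterior over p itself, not the data.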

I tried to integrate MH's model with George's, but it didn't quite work the way these things are classically done. Classically, you have a set of data and observe different-sized prefixes of it (e.g. only the first N data points); think of the learning curves we saw in class, where your beliefs change as you see more data. Those curves were generated more straightforwardly. For us, we want to inject noise into the posterior after seeing the (n - 1)th data point and use the result as the prior when we see the nth data point. This can be done manually (writing model 1a, remembered 1a, then 1b, etc.), as in MH's starter model. Once we have that down, we should abstract it using map or a recursive call (i.e. for loops that aren't for loops), so that we can see what happens over more than just two data points; nobody wants to write model Xa, remembered Xa, then Xb over and over as one long chunk of code. In other words, it will let us see the learning curve for, say, sequences of 20 or 30 observations.
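The abstracted observe-then-noisify loop can be sketched in Python as a single fold over the data (in WebPPL this would be map or a recursive call; the noise model and sample representation are the same assumptions as before):

```python
import random

def noisify(samples, sd=0.1):
    """Clipped Gaussian corruption of the belief samples (assumed noise model)."""
    return [min(1.0, max(0.0, s + random.gauss(0, sd))) for s in samples]

def observe(samples, flip):
    """Importance-resample the belief after one observed flip (1 = heads)."""
    weights = [s if flip == 1 else 1 - s for s in samples]
    return random.choices(samples, weights=weights, k=len(samples))

def learning_curve(data, n_samples=5000, sd=0.1):
    """Fold over the data: after each observation, noisify the posterior and
    use it as the prior for the next one -- no Xa/Xb copy-paste needed."""
    belief = [random.random() for _ in range(n_samples)]  # uniform prior
    means = []
    for flip in data:
        belief = observe(belief, flip)
        means.append(sum(belief) / len(belief))   # record the curve point
        belief = noisify(belief, sd)              # memory noise between steps
    return means

random.seed(2)
data = [1 if random.random() < 0.9 else 0 for _ in range(20)]  # biased coin
curve = learning_curve(data)
print(curve[0], curve[-1])
```

One curve point per observation falls out for free, so running 20 or 30 observations is just a longer data list.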

MH said that to start off, we should focus on this non-hierarchical model of just inferring coin weight from a sequence of observations. For scoring/rewarding whether we remembered things "right", we should try to predict the next observation. The score is binary for now, since the next actual observation either equals our prediction or it doesn't. Once we have that working, we can think about a hierarchical model of coin flips. MH suggested that in that case we would have two high-level groups of coins: heavy-weight coins and low-weight coins.
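The binary scoring rule might look like this in Python. The thresholded-mean prediction rule and the Beta-shaped belief are illustrative assumptions, not something MH specified:

```python
import random

def predict_next(belief_samples):
    """Predict the next flip: heads (1) iff the posterior mean weight > 0.5.
    (Assumed decision rule for illustration.)"""
    mean = sum(belief_samples) / len(belief_samples)
    return 1 if mean > 0.5 else 0

random.seed(3)
# A stand-in belief after seeing mostly-heads data.
belief = [random.betavariate(8, 2) for _ in range(1000)]
next_flip = 1 if random.random() < 0.9 else 0  # the actual next observation
score = 1 if predict_next(belief) == next_flip else 0  # binary right/wrong
print(score)
```

A noisier remembered belief pulls the posterior mean back toward 0.5, which is how memory noise would eventually show up in prediction accuracy.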

Also note that we were generating our sequence of observations using a fair flip. We should generate it using a biased flip, e.g. flip(0.9), so that our model actually has something to learn.

Note: some of the files in the week 2 folder may be confusing. learning_curve.wppl is MH's modification of my code, showing how learning curves should be generated (i.e. without noise). 11_4_notes.wppl contains some hastily written notes from the meeting, which I have reiterated and expanded here. Let me know if you have any questions; George can probably also clarify some points.