interpreting-rl-behavior / interpreting-rl-behavior.github.io

Code for the site https://interpreting-rl-behavior.github.io/

Gradient of hx at timestep 0 of sample 00000 highly correlated #29

Closed: leesharkey closed this issue 2 years ago

leesharkey commented 2 years ago

I'm not sure the problem is fixed by commit c23f4e7 or commit fde6f34.

Still seeing the following:

When the panel loads initially, changing the saliency type on the panel doesn't change the colours. But if you change either the sample OR the step, then changing the saliency type on the panel does change the colours.

There is an additional problem that might be unrelated:

The colours of hx_direction_1 and hx_direction_2 are the same but shouldn't be.

NixGD commented 2 years ago

I'm not convinced that either of these is a bug, at least not with the panel. This might just be a feature of the data in sample_00000 at timestep 0, and of all the samples & timesteps for hx_direction_1 and hx_direction_2. I'm not sure, though; I need to look at the data in more detail, which I'll do later.

NixGD commented 2 years ago

The panel is working properly, but the underlying data is strange. You can graph this with:

import numpy as np
import matplotlib.pyplot as plt

sample_path = "train-procgen-pytorch/generative/recorded_informinit_gen_samples/sample_00000"
hx = np.load(sample_path + '/agent_hxs.npy')
grad_hx_action = np.load(sample_path + '/grad_hx_action.npy')  # gradient of action w.r.t. hx
grad_hx_value = np.load(sample_path + '/grad_hx_value.npy')    # gradient of value w.r.t. hx
plt.plot(grad_hx_action[0, :], grad_hx_value[0, :], ".")       # timestep 0: one point per hx dimension
plt.show()

This produces a bunch of points on a straight line through the origin with a shallow slope -- so the two gradient vectors point in the same direction but differ in magnitude. The correlation isn't perfect, but r ≈ 0.95 or so.
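
The correlation can be quantified directly with np.corrcoef (a minimal sketch, reusing the arrays loaded above):

# Pearson correlation between the two timestep-0 gradient vectors.
r = np.corrcoef(grad_hx_action[0, :], grad_hx_value[0, :])[0, 1]
print(f"r = {r:.3f}")  # the plot above suggests roughly 0.95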

This explains the bug reported above: the colouring of the bar chart does not appear to change when the saliency type is switched (it does change, but imperceptibly).
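
One plausible mechanism (an assumption about how the panel maps saliency to colour, not a reading of its actual code): if each saliency vector is min-max normalised before colouring, then vectors that are exact scalar multiples of one another produce identical colourings, and near-multiples like ours (r ≈ 0.95) produce nearly identical ones:

import numpy as np

def to_unit_range(v):
    # Min-max normalisation, standing in for the panel's colour mapping (assumed).
    return (v - v.min()) / (v.max() - v.min())

v = np.random.randn(64)
w = 0.1 * v  # same direction, smaller magnitude (the shallow slope above)
print(np.allclose(to_unit_range(v), to_unit_range(w)))  # True: identical colours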

leesharkey commented 2 years ago

This is actually expected behaviour (though it took a little reflection to see why). Early in a sequence, the slightest change in the initial condition will lead to large changes many timesteps into the future. Gradients of e.g. the value at the 11th timestep are therefore highly sensitive to the initial condition. But so too are the action gradients! What's more, some deltas in the initial condition matter far less than deltas in other directions. Therefore gradients taken early in the sequence will be correlated.

If this is true, we'd expect less correlation at later timesteps, closer to where the gradient is taken (e.g. t=11).

We find this is indeed the case. Here is a modified version of your plot, using the 11th timestep instead of the 0th (output saved as nixplot.png):

import numpy as np
import matplotlib.pyplot as plt

sample_path = "../train-procgen-pytorch/generative/recorded_informinit_gen_samples/sample_00000"
hx = np.load(sample_path + '/agent_hxs.npy')
grad_hx_action = np.load(sample_path + '/grad_hx_action.npy')
grad_hx_value = np.load(sample_path + '/grad_hx_value.npy')
# Timestep 11 instead of 0; one point per hx dimension.
plt.scatter(x=grad_hx_action[11, :], y=grad_hx_value[11, :], s=0.9)
plt.savefig("../nixplot.png")
plt.close()

The correlation disappears. I've tried other timesteps, and there is a smooth transition from high correlation at t=0 to no correlation at t=12.
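
That transition can be checked with a quick sweep over timesteps -- a sketch reusing the arrays from the snippet above (the output filename is made up):

# Correlation between action and value gradients at every timestep.
rs = [np.corrcoef(grad_hx_action[t, :], grad_hx_value[t, :])[0, 1]
      for t in range(grad_hx_action.shape[0])]
plt.plot(rs, ".-")
plt.xlabel("timestep t")
plt.ylabel("corr(grad_hx_action[t], grad_hx_value[t])")
plt.savefig("../corr_by_timestep.png")  # hypothetical filename
plt.close()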

I'll close the issue. Feel free to re-open if you find reason to.