raghakot / keras-vis

Neural network visualization toolkit for keras
https://raghakot.github.io/keras-vis
MIT License

Grad-CAM extended for ConvNet-RNN structures (optionally) #45

Open evaldsurtans opened 7 years ago

raghakot commented 7 years ago

Can you explain the use-case for this? I can't glean much info from the commit.

evaldsurtans commented 7 years ago

For example, a ConvNet-RNN looks like this:

from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, Reshape, LSTM, Dense
from keras.initializers import glorot_uniform, orthogonal

model_target = Sequential()
model_target.add(TimeDistributed(Conv2D(32, (8, 8), strides=(4, 4), kernel_initializer=glorot_uniform(seed=init_seed), padding='same', activation='relu',
                                        input_shape=(high_dimensions_width, high_dimensions_height, high_dimensions_channels)), input_shape=(params['frames_back'], high_dimensions_width, high_dimensions_height, high_dimensions_channels)))
model_target.add(TimeDistributed(Conv2D(64, (4, 4), strides=(2, 2), kernel_initializer=glorot_uniform(seed=init_seed), activation='relu')))
model_target.add(TimeDistributed(Conv2D(64, (3, 3), kernel_initializer=glorot_uniform(seed=init_seed), padding='same', activation='relu')))
model_target.add(TimeDistributed(Reshape((-1,))))  # flatten each frame's conv features before the LSTMs
model_target.add(LSTM(512, kernel_initializer=glorot_uniform(seed=init_seed), recurrent_initializer=orthogonal(seed=init_seed), input_shape=(params['frames_back'], low_dimensions_state), return_sequences=True, dropout=params['dropout'], recurrent_dropout=params['dropout']))
model_target.add(LSTM(512))
model_target.add(Dense(dimensions_actions, kernel_initializer=glorot_uniform(seed=init_seed), name='DenseLinear'))

This is how it can be used:

import time
import numpy as np
from PIL import Image
from vis.visualization import visualize_cam

seed_img = Image.fromarray(env.getScreenRGB())
seed_img = seed_img.convert('L').convert('RGB')
seed_img_arr = np.asarray(seed_img).astype('uint8')

action_idx = np.argmax(raw_q_values)

# x_input.shape = (1, 5, 48, 48, 3) (batch_size, time_steps, pixels_width, pixels_height, pixel_channels)

heatmap = visualize_cam(model_target, layer_idx, [action_idx], seed_img_arr, alpha=0.3, input_data_rnn=x_input)
heatmap_img = Image.fromarray(np.transpose(np.array(heatmap), axes=[1, 0, 2]))
timestamp = time.time()
seed_img = Image.fromarray(np.transpose(np.array(env.getScreenRGB()), axes=[1, 0, 2]))

composite_img = Image.new("RGB", (seed_img.size[0] * 2, seed_img.size[1]))
composite_img.paste(heatmap_img, (0, 0))
composite_img.paste(seed_img, (seed_img.size[0], 0))
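
For completeness, a possible way to write the composite to disk, reusing the timestamp captured above (the filename is illustrative):

composite_img.save('cam_overlay_%d.png' % int(timestamp))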

This is how the output looks in a 3D maze where the agent focuses on red doors: [image]

raghakot commented 7 years ago

Nice. Looks like the model is trained using reinforcement learning. It would be really cool to have an example for this in examples/ if your code is not confidential or proprietary.

So what's the difference between model.input and input_data_rnn? From the code, it appears that you are using it to do this:

model_input = input_data_rnn[-1]
heatmap = heatmap[-1]

I don't quite understand what that does. Also, there was an API change, so you should rebase. The code no longer tries to overlay the heatmap, since folks can use this to find heatmaps on non-image data or video frames as well.

With the new code, the heatmap will have the same shape as x_input, and the overlay can be done outside.
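
As a minimal sketch of doing the overlay outside visualize_cam (assuming the returned heatmap is a uint8 RGB array and frame is the raw screen capture; the names overlay_cam, frame, and heatmap are illustrative):

import numpy as np
from PIL import Image

def overlay_cam(frame, heatmap, alpha=0.3):
    # Blend the Grad-CAM heatmap onto the original frame.
    frame_img = Image.fromarray(np.asarray(frame, dtype='uint8'))
    heat_img = Image.fromarray(np.asarray(heatmap, dtype='uint8')).resize(frame_img.size)
    return Image.blend(frame_img, heat_img, alpha)

# e.g. with the env from the example above:
composite = overlay_cam(env.getScreenRGB(), heatmap, alpha=0.3)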