I am working on your VASNet paper as part of a Deep Learning seminar. For the past few days I have been trying to reproduce the Confusion Matrix of Attention Weights for TVSum video 7 from split 2, which is exactly the matrix shown in the paper.
Unfortunately, I get a completely different figure, and I'm running out of ideas. Here is my code:
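```python
import h5py
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sn
import torch
from sklearn import preprocessing

# SelfAttention, SELFATT_MODEL_FILE and TVSUM_DATASET_FILE are defined elsewhere.
self_attention = SelfAttention()  # your SelfAttention(nn.Module)
self_attention.load_state_dict(torch.load(SELFATT_MODEL_FILE))  # SelfAttention model from split 2

with h5py.File(TVSUM_DATASET_FILE, 'r') as f:
    video_7 = f['video_7']
    features = video_7['features'][...]  # read the features into memory

features = torch.from_numpy(features)
y, weights = self_attention(features)
weights = weights.detach().cpu().numpy()

# Values were normalized to range 0-1 across the matrix.
weights = preprocessing.minmax_scale(weights, feature_range=(0, 1), axis=0, copy=True)
# weights_df was not defined in the snippet; assumed to be a DataFrame of the scaled weights.
weights_df = pd.DataFrame(weights)

fig, ax = plt.subplots()
ax.xaxis.tick_top()
heatmap = sn.heatmap(
    weights_df,
    xticklabels=50,
    yticklabels=50,
    cmap="YlGnBu")
plt.show()
```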
And this is what the Confusion Matrix looks like:
The code definitely loads the correct model for TVSum split 2, and the attention weights it produces are the same as yours as well; I checked them against your VASNet.
Does anyone have an idea why the figure looks like this?
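One thing that may be worth checking is the normalization step. The comment in the code says the values are normalized "across the matrix", but `minmax_scale` with `axis=0` rescales each column independently. Whether this explains the difference from the paper's figure is an assumption, not something confirmed here. The sketch below contrasts the two behaviours on a toy matrix; the helper names `minmax_global` and `minmax_per_column` and the toy values are hypothetical and not part of VASNet.

```python
import numpy as np


def minmax_global(matrix: np.ndarray) -> np.ndarray:
    """Scale all entries to [0, 1] using one global min and max."""
    lo, hi = matrix.min(), matrix.max()
    if hi == lo:
        # Constant matrix: avoid division by zero.
        return np.zeros_like(matrix, dtype=float)
    return (matrix - lo) / (hi - lo)


def minmax_per_column(matrix: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1] on its own (mimics axis=0 scaling)."""
    lo = matrix.min(axis=0, keepdims=True)
    hi = matrix.max(axis=0, keepdims=True)
    # Use a span of 1 for constant columns to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (matrix - lo) / span


# Toy stand-in for attention weights; real values would come from the model.
toy = np.array([[0.90, 0.02],
                [0.10, 0.01]])

print(minmax_global(toy))      # small entries stay small relative to 0.90
print(minmax_per_column(toy))  # every column is stretched to span 0..1
```

With per-column scaling, a column of uniformly tiny weights is stretched to the full 0-1 range, which can change the look of a heatmap considerably. If this turns out to be the cause, the `minmax_scale` call in the code above could be swapped for `weights = minmax_global(weights)`.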