Hello,
I'm looking at your implementation and noticed that the prediction heads have only one Conv layer before splitting into the three actual heads, whereas Figure 4 of the paper appears to show two Conv layers there. Is this intentional, or is a layer missing?