mrgloom opened this issue 6 years ago
Hi @mrgloom, please refer to Section 3, paragraph 3, where it is mentioned that the coordinates are normalized to the [-1, +1] range before the conv operation is performed. I believe doing so helps the network work with very large images (on the order of 1024 x 1024) without the risk of exploding activations (and, in turn, exploding gradients). I hope this helps. Please let me know if you have any further queries.
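A minimal numpy sketch of that normalization (my own helper, not the repo's code): build the integer coordinate grids, then rescale each to [-1, 1] so the values stay bounded no matter how large the image is.

```python
import numpy as np

def coord_channels(height, width):
    """Return x and y coordinate channels normalized to [-1, 1]."""
    # Integer column indices repeated down the rows, and
    # integer row indices repeated across the columns.
    xx = np.tile(np.arange(width)[None, :], (height, 1)).astype(np.float32)
    yy = np.tile(np.arange(height)[:, None], (1, width)).astype(np.float32)
    # Rescale from [0, dim-1] to [-1, 1]; bounded even for 1024 x 1024.
    xx = xx / (width - 1) * 2 - 1
    yy = yy / (height - 1) * 2 - 1
    return xx, yy

xx, yy = coord_channels(1024, 1024)
```

Both channels end up in [-1, 1] regardless of the image size, which is what keeps the downstream activations tame.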
Cheers! @akanimax
Hi @akanimax, for xx_channel you create a matrix whose columns range from 0 to y_dim, repeated in every row. Then you divide by x_dim to convert to the [0, 1] range before multiplying by 2 and subtracting 1 to shift to [-1, 1]. But unless you divide by that same y_dim (rather than the x_dim mentioned above), the result wouldn't be in [0, 1]. The paper discusses only square images, so there this wouldn't be a problem. Am I missing something? TIA for any help.
@DpkApt, I don't completely understand your question. Here they do divide by the appropriate dimensions, so that the code works for arbitrary image dimensions. The specific case I used was just an example; in practice, all you care about is having the coordinates in the range [-1, 1]. Please let me know if you still have any further questions.
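To make the point concrete, here is a hypothetical non-square case (a 4 x 6 image, not taken from the repo): as long as each coordinate channel is divided by its own dimension, both channels land in [-1, 1].

```python
import numpy as np

# Hypothetical non-square image: 4 rows, 6 columns.
h, w = 4, 6
xx_channel = np.tile(np.arange(w)[None, :], (h, 1)).astype(np.float32)
yy_channel = np.tile(np.arange(h)[:, None], (1, w)).astype(np.float32)
xx_channel = xx_channel / (w - 1) * 2 - 1  # divide by the x dimension
yy_channel = yy_channel / (h - 1) * 2 - 1  # divide by the y dimension
assert xx_channel.min() == -1 and xx_channel.max() == 1
assert yy_channel.min() == -1 and yy_channel.max() == 1
```

Dividing xx_channel by h (or yy_channel by w) here would push values outside [-1, 1], which is exactly the mismatch being discussed.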
Cheers :beers:! @akanimax
Ah, you're right. I was looking at the code provided in the paper and in the official repo. The difference with the code you shared is that xx_channel is formed with range(y_dim) in the official code and range(x_dim) in yours. That was actually my point.
As I can see in the numpy example, it appends maps with values in the range [-1, 1], but as I understand from the paper, they suggest integer (i, j) pixel coordinates.
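For what it's worth, the two conventions being compared can be sketched on a small hypothetical 3 x 5 grid (my own example, not code from the paper or repo): raw integer (i, j) coordinates versus the normalized [-1, 1] maps the numpy example produces.

```python
import numpy as np

h, w = 3, 5  # hypothetical small grid

# Integer (i, j) pixel coordinates: ii varies along rows, jj along columns.
ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')

# Normalized variant in [-1, 1], as in the numpy example.
ii_norm = ii / (h - 1) * 2 - 1
jj_norm = jj / (w - 1) * 2 - 1
```

Either way the network receives the same positional information; the normalized form just keeps the magnitudes independent of image size.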