In Appendix B, under the paragraph about SVHN->MNIST, it says:
We also found spatial context information was useful. For each input image, we created a 5-channel variant where the first three channels were the original RGB images and the last two channels were the normalized x and y coordinates.
It sounds like this paragraph is saying that the MNIST image, which is originally grayscale, gets converted to RGB so that, once the spatial features are added, the MNIST image has 5 channels in total. But in the code, the MNIST image has only 3 channels after the spatial features are added.
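To make my reading of the paragraph concrete, here is a minimal NumPy sketch of what I expected. It is not taken from the repository: the function name `add_spatial_channels`, the grayscale-to-RGB replication, and the [0, 1] normalization range are all my own assumptions.

```python
import numpy as np


def add_spatial_channels(image):
    """Return an (H, W, 5) array: RGB plus normalized x and y coordinates.

    Hypothetical helper illustrating one reading of Appendix B; it is not
    the authors' implementation.
    """
    image = np.asarray(image, dtype=np.float32)
    if image.ndim == 2:
        # Grayscale input such as MNIST: add a channel axis first.
        image = image[:, :, None]
    if image.shape[2] == 1:
        # Replicate the single channel into three (assumed RGB conversion).
        image = np.repeat(image, 3, axis=2)
    height, width, _ = image.shape
    # Coordinates normalized to [0, 1]; the exact range is an assumption.
    ys, xs = np.meshgrid(
        np.linspace(0.0, 1.0, height, dtype=np.float32),
        np.linspace(0.0, 1.0, width, dtype=np.float32),
        indexing="ij",
    )
    # Channel order: R, G, B, x, y.
    return np.concatenate([image, xs[:, :, None], ys[:, :, None]], axis=2)


mnist_like = np.zeros((28, 28), dtype=np.float32)    # grayscale
svhn_like = np.zeros((32, 32, 3), dtype=np.float32)  # RGB
print(add_spatial_channels(mnist_like).shape)  # (28, 28, 5)
print(add_spatial_channels(svhn_like).shape)   # (32, 32, 5)
```

Under this reading both domains end up with 5 channels, whereas appending the two coordinate channels directly to the grayscale MNIST image would give 3.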
Did I misread the paragraph?