As described in #01, the concatenation operation in InputTransition has not been fixed. Note that this can cause confusion in training: the data should be of size() [Batchsize, Channel, Xsize, Ysize, Zsize], and the output passed to softmax should be of size() [Batchsize, Classnum, Xsize, Ysize, Zsize].
However, broadcasting between [Batchsize×16, XChannel, Xsize, Ysize, Zsize] and [Batchsize, XChannel×16, Xsize, Ysize, Zsize] yields [Batchsize×16, XChannel×16, Xsize, Ysize, Zsize], so all the following layers see a batch size 16 times larger.
Mathematically this could be offset by running many more epochs, but it can also cause the device to run out of memory. In addition, each batch becomes equivalent to 16 un-shuffled batches.
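
The shape arithmetic behind this can be sketched in plain Python, without PyTorch. This is only an illustration under stated assumptions: the helper names `cat_shape` and `broadcast_shape` are hypothetical, and the sizes (batch size 1, one input channel, a 64×64×32 volume, 16 output channels) are chosen for the example rather than taken from the repository.

```python
def cat_shape(shape, copies, dim):
    """Shape after concatenating `copies` tensors of size `shape` along `dim`."""
    out = list(shape)
    out[dim] *= copies
    return tuple(out)


def broadcast_shape(a, b):
    """Shape of an elementwise op on two same-rank shapes under the usual
    broadcasting rule: each pair of sizes must be equal, or one must be 1."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles shapes of equal rank")
    out = []
    for m, n in zip(a, b):
        if m != n and m != 1 and n != 1:
            raise ValueError(f"sizes {m} and {n} cannot be broadcast")
        out.append(n if m == 1 else m)
    return tuple(out)


# Assumed example sizes: [Batchsize, Channel, Xsize, Ysize, Zsize]
x = (1, 1, 64, 64, 32)          # network input
conv_out = (1, 16, 64, 64, 32)  # output of a 16-channel convolution

# Concatenating 16 copies along dim 0 (batch) instead of dim 1 (channel)
wrong = cat_shape(x, 16, dim=0)  # (16, 1, 64, 64, 32)
right = cat_shape(x, 16, dim=1)  # (1, 16, 64, 64, 32)

# Adding the result to the convolution output broadcasts both operands
buggy_sum = broadcast_shape(conv_out, wrong)  # batch grows to 16
fixed_sum = broadcast_shape(conv_out, right)  # batch stays 1

print("dim=0:", buggy_sum)
print("dim=1:", fixed_sum)
```

Under these assumptions, concatenating along dim 0 inflates the batch dimension to 16 after the addition, while concatenating along dim 1 keeps the expected [Batchsize, 16, Xsize, Ysize, Zsize]. Note that under this rule the dim-0 variant only broadcasts at all when the batch size is 1; with a larger batch the sizes are incompatible and the addition would fail instead.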