Hi,
Thanks for your interest in our work.
The shuffle code simply mixes the channels after concatenation (more info in #5).
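For reference, here is a minimal sketch of what such a channel shuffle can look like in PyTorch (NCHW layout). The helper name `channel_shuffle` and the group count are illustrative, not taken from the released code:

```python
import torch


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave the channels of `groups` concatenated feature maps."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # (N, groups, C//groups, H, W) -> swap group and channel axes -> flatten back
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


# Example: mix the channels of two concatenated streams before further processing.
a = torch.randn(8, 64, 16, 16)
b = torch.randn(8, 64, 16, 16)
fused = channel_shuffle(torch.cat([a, b], dim=1), groups=2)
```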
As described in the paper, the following are the changes to the model architecture:
"First, we replace the Downsampling blocks at depths 1 and 2 with RepVGGSSE blocks. To reduce the number of parameters in the last layer, we replace the last Downsampling block, which has a large width, with a narrower 1×1 convolution layer. Also, we reduce the number of parameters by removing one block from each stream and adding a block at depth 3."
"For CIFAR10 and CIFAR100, we increase the width of the network while keeping the resolution at 32 and the number of streams at 3"
For training on CIFAR, as we described in the paper, we adopt the following training scheme:
"We adopt a standard data augmentation scheme (mirroring/shifting) that is widely used for these two datasets (He et al., 2016a; Zagoruyko& Komodakis, 2016; Huang et al., 2017). We train for 400 epochs with a batch size of 128. The initial learning rate is 0.1 and is decreased by a factor of 5 at 30%, 60%, and 80% of the epochs as in (Zagoruyko & Komodakis, 2016). Similar to prior works (Zagoruyko & Komodakis, 2016; Huang et al., 2016), we use a weight decay of 0.0003 and set dropout in the convolution layer at 0.2 and dropout in the final fully-connected layer at 0.2 for all our networks on both datasets. We train each network on 4 GPUs (a batch size of 32 per GPU) and report the final test set accuracy."
Please let us know if there is any specific question you have.
Thanks, Ankit
@imankgoyal Would the code for CIFAR10/100 be released in the near future? Thanks!