imankgoyal / NonDeepNetworks

Official Code for "Non-deep Networks"
BSD 3-Clause "New" or "Revised" License

Fusion module and accuracy on CIFAR-100 #11

Open qq769852576 opened 2 years ago

qq769852576 commented 2 years ago
  1. What is the shuffle code in your fusion module?
  2. What is your model architecture for CIFAR-100? I only changed the first two downsampling modules of the ImageNet ParNet described in the paper, but the accuracy is lower. Also, how do you set the LR, MILESTONES, and NUM_EPOCH to reach high accuracy?
imankgoyal commented 2 years ago

Hi,

Thanks for your interest in our work.

  1. The shuffle is a simple mixing of the channels after concatenation (more info in #5); a rough sketch is given below.
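
For reference, here is a minimal sketch of a standard ShuffleNet-style channel shuffle in PyTorch. The function name and the assumption that channels are mixed in equal-sized groups are mine, not taken from the repository, so the released code may differ (see #5 for details):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels of the concatenated streams (ShuffleNet-style)."""
    n, c, h, w = x.shape
    # Split channels into `groups`, then transpose so that channels coming
    # from different streams become adjacent after flattening back.
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Example: mix the channels of two concatenated feature maps.
a = torch.randn(8, 64, 16, 16)
b = torch.randn(8, 64, 16, 16)
fused = channel_shuffle(torch.cat([a, b], dim=1), groups=2)
```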

  2. The changes to the model architecture are as described in the paper.

For training on CIFAR, as we described in the paper, we adopt the following training scheme:

"We adopt a standard data augmentation scheme (mirroring/shifting) that is widely used for these two datasets (He et al., 2016a; Zagoruyko& Komodakis, 2016; Huang et al., 2017). We train for 400 epochs with a batch size of 128. The initial learning rate is 0.1 and is decreased by a factor of 5 at 30%, 60%, and 80% of the epochs as in (Zagoruyko & Komodakis, 2016). Similar to prior works (Zagoruyko & Komodakis, 2016; Huang et al., 2016), we use a weight decay of 0.0003 and set dropout in the convolution layer at 0.2 and dropout in the final fully-connected layer at 0.2 for all our networks on both datasets. We train each network on 4 GPUs (a batch size of 32 per GPU) and report the final test set accuracy."

Please let us know if there is any specific question you have.

Thanks, Ankit

yaodongyu commented 2 years ago

@imankgoyal Would the code for CIFAR10/100 be released in the near future? Thanks!