Investigate other architectures

knkski commented 6 years ago

Right now we're using a variant of VGGNet, which is giving decent results. However, we should investigate alternatives such as AlexNet. We should also check how well the full VGGNet works, although that is blocked by #1 because of GPU memory usage.

ykq1004 commented 6 years ago

https://cambridgespark.com/content/tutorials/neural-networks-tuning-techniques/index.html

This post covers some of what we were talking about last time: using the he_normal kernel initializer with ReLU activations, data augmentation, etc. Their model (at the end of the post), trained on MNIST, achieved 99.47% accuracy on the test data. Maybe something we could try?
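For reference, He normal initialization scales each layer's initial weights to its fan-in, so that ReLU activations keep a roughly constant variance from layer to layer. Below is a minimal stdlib sketch of the idea, assuming a plain (untruncated) Gaussian. Keras's `he_normal` initializer instead samples from a truncated normal with the same standard deviation, sqrt(2 / fan_in). The helper names are hypothetical.

```python
import math
import random


def conv_fan_in(kernel_h, kernel_w, in_channels):
    """Number of inputs feeding each output unit of a 2D convolution."""
    return kernel_h * kernel_w * in_channels


def he_normal_std(fan_in):
    """He et al. standard deviation for ReLU layers: sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


def sample_he_normal(fan_in, count, seed=0):
    """Draw `count` weights from N(0, he_normal_std(fan_in)^2).

    Plain Gaussian for simplicity (an assumption); Keras's he_normal
    truncates the tails of the distribution.
    """
    rng = random.Random(seed)
    std = he_normal_std(fan_in)
    return [rng.gauss(0.0, std) for _ in range(count)]


# Example: a 3x3 convolution reading 64 input channels.
fan_in = conv_fan_in(3, 3, 64)  # 576 inputs per output unit
weights = sample_he_normal(fan_in, 10_000)
# Root-mean-square of the samples, which approximates the std since the mean is ~0.
empirical_std = math.sqrt(sum(w * w for w in weights) / len(weights))
print(fan_in, round(he_normal_std(fan_in), 4), round(empirical_std, 4))
```

In Keras this corresponds to passing `kernel_initializer='he_normal'` to a `Conv2D` or `Dense` layer.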

ykq1004 commented 6 years ago

Found an architecture, SimpleNet: https://github.com/Coderx7/SimpleNet

Their benchmarks show that it performs pretty well, even better than many more complex architectures across different image recognition datasets (including MNIST), while using fewer parameters.

The corresponding paper, https://arxiv.org/pdf/1608.06037.pdf, describes their design in detail and also includes some tips for fine-tuning CNNs. Good to read if you guys are interested.
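To make the "fewer parameters" comparison concrete: a 2D convolution with a k x k kernel, C_in input channels and C_out filters has k*k*C_in*C_out weights plus C_out biases. A minimal sketch (the helper name is hypothetical, and the 256-channel width is illustrative, not SimpleNet's actual configuration):

```python
def conv2d_params(kernel, in_channels, out_channels, bias=True):
    """Trainable parameters of a square-kernel 2D convolution layer."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Illustrative widths only: 256 input channels -> 256 filters.
p3 = conv2d_params(3, 256, 256)
p1 = conv2d_params(1, 256, 256)
print(p3, p1)  # 590080 65792
```

This should match the per-layer count Keras's `model.summary()` reports for a `Conv2D` layer with bias, and it shows why swapping a (3,3) kernel for a (1,1) kernel cuts that layer's weights by roughly 9x.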

Some interesting things stand out to me:

Compared to what we have now, their CNN is still quite large: 13 layers plus the classification layer. We may need to find a way to shrink it a bit... or spend more time training it...
They apply zero padding (1,1) to each convolutional layer, which I don't quite understand yet.
They use a (1,1) kernel instead of (3,3) in the 11th and 12th layers. While the (3,3) kernel helps preserve local correlation, the (1,1) kernel is good for detecting detail, so they use it near the end of the CNN.
They do batch normalization before the activation (ReLU in their case), which I think we could incorporate in the future even if we decide not to use their architecture.
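The conv -> batch norm -> ReLU ordering can be sketched for a single channel as follows. This is a simplified, training-mode illustration in pure Python (no running averages, fixed gamma and beta, and the pre-activation values are made up); in a real conv layer the statistics are pooled across the batch and spatial positions. In Keras the same ordering would be a `Conv2D` layer with no activation, followed by `BatchNormalization()` and then `Activation('relu')`.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's pre-activations to zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(values):
    return [max(0.0, v) for v in values]


# Hypothetical conv outputs for one channel, all shifted well above zero.
pre_activations = [4.0, 5.0, 6.0, 7.0]

bn_then_relu = relu(batch_norm(pre_activations))
relu_only = relu(pre_activations)
print(bn_then_relu)  # lower half is zeroed after centering
print(relu_only)     # ReLU passes everything through unchanged
```

Normalizing first centers the inputs around zero, so the ReLU actually gates part of them instead of acting as the identity on uniformly positive values.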

Since they only offer a Caffe version, I "translated" it into Keras: https://github.com/knkski/atai/blob/master/train_SimpleNet.py However, I haven't tested it yet, so if any of you are able to run it (and debug it...), that would be great! Or we could just pick some pieces and transplant them into our model.

Thank you, Yekun

knkski commented 6 years ago

I can answer the zero padding question. Basically, each layer shrinks the image: an unpadded 3x3 convolution trims a pixel from each border, and max pooling downsamples it further. Since our input images aren't very large, they can quickly shrink down to a 0x0 pixel image, which isn't useful. Zero padding keeps the convolutions from shrinking the image, which helps prevent that.
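The shrinkage can be checked with the standard output-size formula for a convolution or pooling window, out = floor((n + 2p - k) / s) + 1, where n is the input width, k the kernel size, p the padding and s the stride. A small sketch, assuming 28x28 MNIST inputs, 3x3 convolutions with stride 1 and 2x2 max pooling (the layer stack and helper names are illustrative, not our actual model):

```python
def output_size(n, kernel, padding=0, stride=1):
    """Spatial size after a conv or pooling window on a square input."""
    return (n + 2 * padding - kernel) // stride + 1


def trace(n, layers):
    """Apply (kernel, padding, stride) layers in order, recording each size."""
    sizes = [n]
    for kernel, padding, stride in layers:
        n = output_size(n, kernel, padding, stride)
        if n <= 0:
            break  # the feature map has vanished
        sizes.append(n)
    return sizes


conv_valid = (3, 0, 1)  # 3x3 conv, no padding
conv_same = (3, 1, 1)   # 3x3 conv, zero padding (1,1)
pool = (2, 0, 2)        # 2x2 max pooling, stride 2

# Two convs followed by a pool, repeated three times.
unpadded = trace(28, [conv_valid, conv_valid, pool] * 3)
padded = trace(28, [conv_same, conv_same, pool] * 3)
print(unpadded)  # [28, 26, 24, 12, 10, 8, 4, 2]
print(padded)    # [28, 28, 28, 14, 14, 14, 7, 7, 7, 3]
```

Without padding, the map vanishes before the last block even runs; with (1,1) padding only the pooling layers reduce the size. In Keras, `padding='same'` on a stride-1 `Conv2D` gives the same size-preserving behavior as explicit (1,1) zero padding for a 3x3 kernel.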

Unfortunately, it looks like a naive implementation of SimpleNet doesn't perform as well as VGGNet.

It's not far off, though. I'll see if I can tweak the parameters to make it perform better.

Investigate other architectures #4