Why do you use a net trained for classification? Is there any theory behind it?

I think people started using models trained for classification because they were available, already trained, and because training for classification makes the conv layers respond to features that are also useful in style transfer.

I have also been thinking that an encoder-decoder network (an autoencoder) should work as well and be easier to train: there is no need to label the images, only to use images that are relevant. If the model manages to encode the image into latent space and then reconstruct it, then the encoder part must respond quite well to the relevant features in the images.
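To make the "no labels needed" point concrete, here is a minimal sketch, not a conv net and not anything from neural-style. It assumes a toy linear autoencoder with a single latent unit and tied weights, trained on 2-D points instead of images. The names `encode`, `decode`, `mean_loss` and `train` are made up for illustration. The only training signal is the reconstruction error.

```python
import random

rng = random.Random(0)

# Unlabeled toy data standing in for images: 2-D points near the line
# through (1, 2), with a little Gaussian noise on each coordinate.
data = []
for _ in range(200):
    t = rng.uniform(-1.0, 1.0)
    data.append((t + rng.gauss(0, 0.01), 2.0 * t + rng.gauss(0, 0.01)))


def encode(w, x):
    """Encoder: project the input onto a single latent value."""
    return w[0] * x[0] + w[1] * x[1]


def decode(w, z):
    """Decoder: map the latent value back to input space (tied weights)."""
    return (z * w[0], z * w[1])


def mean_loss(w):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(w, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train(w, steps=300, lr=0.05):
    """Full-batch gradient descent on the reconstruction error."""
    losses = [mean_loss(w)]
    for _ in range(steps):
        g0 = g1 = 0.0
        for x in data:
            z = encode(w, x)
            r = (z * w[0] - x[0], z * w[1] - x[1])  # reconstruction residual
            rw = r[0] * w[0] + r[1] * w[1]
            # d/dw of ||z*w - x||^2 with z = w.x
            g0 += 2.0 * (rw * x[0] + z * r[0])
            g1 += 2.0 * (rw * x[1] + z * r[1])
        w = (w[0] - lr * g0 / len(data), w[1] - lr * g1 / len(data))
        losses.append(mean_loss(w))
    return w, losses


w, losses = train((0.5, 0.1))
print("loss before:", losses[0], "after:", losses[-1])
print("learned encoder weights:", w)
```

In this toy case the encoder weights end up pointing along the direction the data actually varies in, without any labels. Whether conv features learned this way are as useful for style transfer as VGG features is still an open question here.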

Then there are the practical issues of training the encoder-decoder. Caffe would be a natural choice for the training framework, but as far as I know it does not support unpooling layers, although there is a Caffe derivative that does. The network architecture should also be such that it (or more specifically the trained encoder part) can be loaded by loadcaffe. I have been thinking of trying this, but it has not been my first priority. And if one starts to play with models that include a decoder, then one might also use the decoder to produce the output image directly (as in texture_nets). Anyway, the idea of using an autoencoder to train a model is attractive.
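For readers unfamiliar with unpooling, here is a rough 1-D sketch of the usual idea (max pooling that remembers where each maximum came from, and an unpooling step that puts the values back at those positions). This is plain Python for illustration only, not Caffe's or any other framework's API, and the function names are made up.

```python
def max_pool_with_indices(values, size=2):
    """Non-overlapping max pooling that also records the argmax positions."""
    pooled, indices = [], []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        offset = max(range(size), key=lambda k: window[k])
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices


def max_unpool(pooled, indices, length):
    """Place each pooled value back at its recorded position, zeros elsewhere."""
    out = [0.0] * length
    for value, index in zip(pooled, indices):
        out[index] = value
    return out


signal = [1.0, 3.0, 2.0, 0.5, 4.0, 4.5]
pooled, indices = max_pool_with_indices(signal)
restored = max_unpool(pooled, indices, len(signal))
print(pooled, indices, restored)
```

The decoder half of an autoencoder needs something like this (or another upsampling layer) to get back to the input resolution, which is why missing unpooling support matters for the choice of framework.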

jcjohnson / neural-style

why you guys use a net for classification, is there any theory? #309