jcjohnson / neural-style

Torch implementation of neural style algorithm

Where should I start if I want to train a model for usage with Neural-Style? #292

ProGamerGov commented 8 years ago

Where should I start if I want to train a model for usage with Neural-Style?

Are Network In Network (NIN) models easier to train than VGG models?

Does anyone know of any guides that cover training a model that is compatible with Neural-Style from start to finish? If not, then what do I need to look for in order to make sure the model I am learning to train is compatible with Neural-Style?

What is the easiest way to train a model for use with neural-style? Are there any AMIs available that will let me start messing around with training right away?

htoyryla commented 8 years ago

There are at least two parts to this question.

One has to start from the technical part. Caffe http://caffe.berkeleyvision.org is a good choice to start with. It is not too difficult to install, no coding is needed to use it and it directly produces caffemodel files. To train a model, one needs a dataset of labeled images (usually converted into LMDB format), a training prototxt file describing the network, and a solver prototxt file with the training parameters.

With these in place, training using caffe will create a model initialized with random weights (according to what is stated in the prototxt file) and start training it using the dataset.

Training a deep network from scratch can be difficult and time-consuming. One might start with a small model first, with only a limited number of convolutional layers, or one might try finetuning an existing model. Finetuning means taking an existing, already trained model and training it further using a different dataset. Like in this example http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html .

Either way, one can without much difficulty create models that work with neural-style, in the sense that the model loads, iterations start and even the losses may start diminishing. The visual results are often a disappointment, however. I have done this several times already, using wikiart, my own photo library and a programmatically created dataset of geometrical images. Nothing really useful yet, but learning all the time.

Some more detailed notes: For VGG networks, it looks like training prototxt files are not available on the web, but I managed to piece together one that works. Training a VGG network from scratch is not really recommended. From what I have heard, the creators of the model couldn't train the deeper models from scratch, but had to train smaller models first and then add layers for a new training round. But maybe a VGG with only the 1st and 2nd conv layer levels would do as a first try. Or a VGG finetuned on one's own dataset.

ProGamerGov commented 8 years ago

I successfully trained a model that is similar to NIN but with fewer layers, and produced the following images after training it for 70,000 iterations:

https://imgur.com/a/sYRhV

I used the CIFAR10 data set and this github page, along with the supplied scripts in /home/ubuntu/caffe/examples/cifar10:

https://gist.github.com/mavenlin/d802a5849de39225bcc6

I am currently wondering: is there a data set of artwork available at the moment that I could use for training?

I found this data set: http://people.bath.ac.uk/hc551/dataset.html but that is all I have been able to find so far for artwork data sets. I was also considering grabbing all the images posted to /r/art/ on Reddit for use in training, and maybe also using my massive collection of styles as well.

htoyryla commented 8 years ago

Your results look familiar to me. They can be interesting as such, but if the model does not respond to the different styles, then what it can achieve is very limited.

I cannot now locate the example from which I obtained the wikiart materials. It was not a caffe example, if I remember correctly; more like someone's python project, from which I got a list of wikiart urls with label data. Not all urls worked, but out of those which did I put together an LMDB. I'll look further and see if I find something.

htoyryla commented 8 years ago

Here's one of my results:

sh3-i19800-paasikivi-feininger-cl23sl124-cw200sw100_150

Only the colors derive from the style. Changing layers, weights and the style image produces a number of variations, but quite limited ones.

sh3-i12000-paasikivi-kahvila-cl234sl124-cw200sw40000_150

sh3-i12000-paasikivi-feininger-cl234sl124-cw200sw40000_150

Another model I trained produced mainly clouds or blobs of color:

sibir-sh86000g_310

It seems to me that these limitations derive from too small a dataset and too few training iterations. One also needs to consider the contents of the dataset. Even if the training is successful, the model only learns to recognize such features as stand out in the dataset. To work well, it should recognize the features that are essential in both the content and style images. My geometrical shapes dataset resulted in clouds of color, so clearly the model failed to recognize the essential features in the images.

I have not used CIFAR10, but I assume that the small size of the images might be a handicap. In another thread here, a hypothesis was raised that a model in neural style works best with images of the size of the training images.

Roaming a bit further, I have recently been interested in unsupervised training, using a model which first crunches the image into a vector (such as FC6 output) and then reconstructs the image using deconvolutional and unpooling layers. With this approach, we don't need labels, as the model will learn by comparing the input and output images.

htoyryla commented 8 years ago

The material about finetuning using wikiart can be found here https://computing.ece.vt.edu/~f15ece6504/homework2/ . I see it mainly useful for the image urls and labels, as a basis for making LMDB for caffe. And for neural-style, forget Alexnet, it requires GROUP which is not supported by loadcaffe.

htoyryla commented 8 years ago

For anyone who is interested, here's one of my VGG16 train prototxt files. Some configuration will be needed if you want to use it.

name: "VGG_hplaces_16_layers"
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/hannu/caffe/hplaces/hplaces_train_lmdb"
    backend: LMDB
    batch_size: 28
  }
  transform_param {
    crop_size: 224
    #mirror: true
    mean_file: "/home/hannu/caffe/hplaces/hplaces_train_mean.binaryproto"
  }
  include: { phase: TRAIN }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {

    source: "/home/hannu/caffe/hplaces/hplaces_val_lmdb/"
    backend: LMDB
    batch_size: 10
  }
  transform_param {
    crop_size: 224
    #mirror: false
    mean_file: "/home/hannu/caffe/hplaces/hplaces_val_mean.binaryproto"
  }
  include: { phase: TEST }
}
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
layers {
  bottom: "conv1_1"
  top: "conv1_2"
  name: "conv1_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1_2"
  top: "conv1_2"
  name: "relu1_2"
  type: RELU
}
layers {
  bottom: "conv1_2"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}
layers {
  bottom: "conv2_1"
  top: "conv2_2"
  name: "conv2_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2_2"
  top: "conv2_2"
  name: "relu2_2"
  type: RELU
}
layers {
  bottom: "conv2_2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3_1"
  name: "conv3_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_1"
  top: "conv3_1"
  name: "relu3_1"
  type: RELU
}
layers {
  bottom: "conv3_1"
  top: "conv3_2"
  name: "conv3_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_2"
  top: "conv3_2"
  name: "relu3_2"
  type: RELU
}
layers {
  bottom: "conv3_2"
  top: "conv3_3"
  name: "conv3_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_3"
  top: "conv3_3"
  name: "relu3_3"
  type: RELU
}
layers {
  bottom: "conv3_3"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "conv4_1"
  name: "conv4_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_1"
  top: "conv4_1"
  name: "relu4_1"
  type: RELU
}
layers {
  bottom: "conv4_1"
  top: "conv4_2"
  name: "conv4_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_2"
  top: "conv4_2"
  name: "relu4_2"
  type: RELU
}
layers {
  bottom: "conv4_2"
  top: "conv4_3"
  name: "conv4_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_3"
  top: "conv4_3"
  name: "relu4_3"
  type: RELU
}
layers {
  bottom: "conv4_3"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool4"
  top: "conv5_1"
  name: "conv5_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_1"
  top: "conv5_1"
  name: "relu5_1"
  type: RELU
}
layers {
  bottom: "conv5_1"
  top: "conv5_2"
  name: "conv5_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_2"
  top: "conv5_2"
  name: "relu5_2"
  type: RELU
}
layers {
  bottom: "conv5_2"
  top: "conv5_3"
  name: "conv5_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_3"
  top: "conv5_3"
  name: "relu5_3"
  type: RELU
}
layers {
  bottom: "conv5_3"
  top: "pool5"
  name: "pool5"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  name: "fc6"
  type: INNER_PRODUCT
  bottom: "pool5"
  top: "fc6"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layers {
  name: "relu6"
  type: RELU
  bottom: "fc6"
  top: "fc6"
}
layers {
  name: "drop6"
  type: DROPOUT
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "fc7"
  type: INNER_PRODUCT
  bottom: "fc6"
  top: "fc7"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layers {
  name: "relu7"
  type: RELU
  bottom: "fc7"
  top: "fc7"
}
layers {
  name: "drop7"
  type: DROPOUT
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}

layers {
  bottom: "fc7"
  top: "fc8_places"
  name: "fc8_places"
  type: INNER_PRODUCT
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 205
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "fc8_places"
  top: "prob"
  name: "prob"
  type: SOFTMAX
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc8_places"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "fc8_places"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}

You need to change the pointers to your dataset and mean files, as well as maybe the batch sizes. You may also want to comment out the prob layer to have cleaner output during training.

3DTOPO commented 8 years ago

If you want a big image set for training, you can download the ImageNet database. It is what was used to train the default VGG-19 model.

http://image-net.org

htoyryla commented 8 years ago

Imagenet is certainly a good choice if one wants to train with a general image set and has the computing platform for large scale training. I am planning to get another linux machine dedicated for training but for the moment I cannot tie up my linux computer long enough for other than small experiments (which are good for learning anyway).

ProGamerGov commented 8 years ago

@htoyryla As far as I understand, fine-tuning an already trained model means that you can use a smaller data set.

So I have this data set here with art images:

I just posted a few examples, but every category seems to have between 50 and 80 images. People-Art has multiple areas such as Annotations and JPEG images, whereas Photo-Art does not. Would the wiki-art data set be better, or would the People-Art/Photo-Art-50 data set be better for training?


People-Art: 

People-Art\Annotations\Academicism\albert-anker_b-ckligumpen-1866.jpg.xml
People-Art\Annotations\Academicism\albert-joseph-moore_amber.jpg.xml

People-Art\JPEGImages\Academicism\albert-anker_b-ckligumpen-1866.jpg
People-Art\JPEGImages\Academicism\albert-joseph-moore_amber.jpg

People-Art\matlab_funcs\demo_show_anno.m
People-Art\matlab_funcs\VOCevaldet_cai.m

People-Art\test.txt
People-Art\train.txt
People-Art\trainval.txt
People-Art\trainval_only_fg_ims.txt
People-Art\val.txt

Photo-Art-50:

Photo-Art-50\016.boom-box\016a_0001.jpg
Photo-Art-50\101.head-phones\101a_0001.jpg
Photo-Art-50\101.head-phones\101a_0002.jpg

And this previously fine tuned model here that already produces good images in neural-style:

https://gist.github.com/jimmie33/509111f8a00a9ece2c3d5dde6a750129#file-readme-md

How would I, step by step, convert this data set into the lmdb files, and then how exactly would I use your prototxt to train the already made caffemodel? What train.prototxt and solver.txt files do I need, and which ones do I modify? What modifications do I make? I have tried modifying ones where it was unclear from the naming which file I should replace. I tried making a NIN model like the one in Neural-Style using the CIFAR10 data set, but it had the exact same number of layers as my previous CIFAR10 model, and not the same layers as Neural-Style's NIN model.

I found this fine tuning command on the Berkeley site:

./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0

I can easily modify the paths and filenames, but is it the right command to use?


With the wiki-art data set, how exactly do I convert it to the lmdb files that I need? This lmdb part is probably the most confusing part of neural networks for me, because I have not found any guides that make sense of what exactly I have to do.

And @htoyryla , if possible, could you post the lmdb files and mean files you made from the wiki-art data set for me to download?

ProGamerGov commented 8 years ago

So I tried to fine-tune the VGG16 SOD model on the CIFAR10 data set, and received the following error:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel -gpu 0

I0726 00:44:44.228581  1820 layer_factory.hpp:74] Creating layer data
I0726 00:44:44.228623  1820 net.cpp:84] Creating Layer data
I0726 00:44:44.228648  1820 net.cpp:338] data -> data
I0726 00:44:44.228682  1820 net.cpp:338] data -> label
I0726 00:44:44.228709  1820 net.cpp:113] Setting up data
I0726 00:44:44.228801  1820 db.cpp:34] Opened lmdb /home/ubuntu/caffe/examples/cifar10/cifar10_train_lmdb
I0726 00:44:44.228873  1820 data_layer.cpp:67] output data size: 28,3,224,224
I0726 00:44:44.228899  1820 data_transformer.cpp:22] Loading mean file from: /home/ubuntu/caffe/data/cifar10/cifar10_train_mean.binaryproto
I0726 00:44:44.234645  1820 net.cpp:120] Top shape: 28 3 224 224 (4214784)
I0726 00:44:44.234693  1820 net.cpp:120] Top shape: 28 (28)
I0726 00:44:44.234710  1820 layer_factory.hpp:74] Creating layer conv1_1
I0726 00:44:44.234742  1820 net.cpp:84] Creating Layer conv1_1
I0726 00:44:44.234756  1820 net.cpp:380] conv1_1 <- data
I0726 00:44:44.234807  1820 net.cpp:338] conv1_1 -> conv1_1
I0726 00:44:44.234838  1820 net.cpp:113] Setting up conv1_1
F0726 00:44:44.241438  1825 data_transformer.cpp:138] Check failed: height <= datum_height (224 vs. 32)
*** Check failure stack trace: ***
    @     0x7f38355c4daa  (unknown)
    @     0x7f38355c4ce4  (unknown)
    @     0x7f38355c46e6  (unknown)
    @     0x7f38355c7687  (unknown)
    @     0x7f38359303c1  caffe::DataTransformer<>::Transform()
    @     0x7f38359eb4f8  caffe::DataLayer<>::InternalThreadEntry()
    @     0x7f382d2e5a4a  (unknown)
    @     0x7f382b73c182  start_thread
    @     0x7f3834baf47d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$

I was also using this solver.prototxt: https://github.com/ruimashita/caffe-train/blob/master/vgg.solver.prototxt and htoyryla's train_val.prototxt

Same error on the normal VGG-16 model:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16/solver.prototxt -weights models/vgg16/VGG_ILSVRC_16_layers.caffemodel -gpu 0

layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss/loss"
}
I0726 00:55:56.447276  1872 layer_factory.hpp:74] Creating layer data
I0726 00:55:56.447317  1872 net.cpp:84] Creating Layer data
I0726 00:55:56.447342  1872 net.cpp:338] data -> data
I0726 00:55:56.447377  1872 net.cpp:338] data -> label
I0726 00:55:56.447404  1872 net.cpp:113] Setting up data
I0726 00:55:56.447495  1872 db.cpp:34] Opened lmdb /home/ubuntu/caffe/examples/cifar10/cifar10_train_lmdb
I0726 00:55:56.447563  1872 data_layer.cpp:67] output data size: 64,3,224,224
I0726 00:55:56.458580  1872 net.cpp:120] Top shape: 64 3 224 224 (9633792)
I0726 00:55:56.458628  1872 net.cpp:120] Top shape: 64 (64)
I0726 00:55:56.458647  1872 layer_factory.hpp:74] Creating layer conv1_1
I0726 00:55:56.458678  1872 net.cpp:84] Creating Layer conv1_1
I0726 00:55:56.458693  1872 net.cpp:380] conv1_1 <- data
I0726 00:55:56.458720  1872 net.cpp:338] conv1_1 -> conv1_1
I0726 00:55:56.458788  1872 net.cpp:113] Setting up conv1_1
F0726 00:55:56.465386  1877 data_transformer.cpp:138] Check failed: height <= datum_height (224 vs. 32)
*** Check failure stack trace: ***
    @     0x7f22574a2daa  (unknown)
    @     0x7f22574a2ce4  (unknown)
    @     0x7f22574a26e6  (unknown)
    @     0x7f22574a5687  (unknown)
    @     0x7f225780e3c1  caffe::DataTransformer<>::Transform()
    @     0x7f22578c94f8  caffe::DataLayer<>::InternalThreadEntry()
    @     0x7f224f1c3a4a  (unknown)
    @     0x7f224d61a182  start_thread
    @     0x7f2256a8d47d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$

ProGamerGov commented 8 years ago

I took the Cubo-Futurism jpg files from the People-Art data set. I then tried, and failed, to create the val and train lmdb files.

htoyryla commented 8 years ago

You get the error because my training VGG16 prototxt (and any imagenet based prototxt) expects 256x256 images (then cropped according to the prototxt to 224x224), while CIFAR is 32x32.

Check failed: height <= datum_height (224 vs. 32)

I can help with LMDB and prototxt but for a few days I am terribly busy with other things and mostly not even near a computer.

LMDB is created using a script like caffe/examples/imagenet/create_imagenet.sh, but the script usually needs to be adjusted for paths etc. It can take some time to get used to it and get everything to match, so that the script finds the train.txt and val.txt files as well as the images referred to in them, and the image sizes are correct; it then creates two LMDB files. Then you calculate the mean images based on the LMDBs using caffe/examples/imagenet/make_imagenet_mean.sh (or something like that). Then modify the training prototxt to point to your LMDBs and binaryproto files, and make sure the solver.prototxt points to the correct training prototxt.

The train.txt and val.txt for the LMDB creation contain lines like

path_to_an_image label

where label is an integer from 0 .. number_of_categories-1

The handling of paths can be a bit tricky. They are relative to the paths set in create_imagenet.sh; it took me some time to get them right.

This is all I can contribute right now. After a few days I will have better time to respond. I am not sure if I have my wikiart LMDB any more, I have other LMDBs but they are usually quite large files.

PS. See also the caffe imagenet example for the LMDB part (never mind if the page talks about leveldb instead of lmdb, it is an alternative option). http://caffe.berkeleyvision.org/gathered/examples/imagenet.html You might also try the example as such, then the paths should match readily.

ProGamerGov commented 8 years ago

So I have my images at:

/home/ubuntu/caffe/data/People-Art/JPEGImages/Academicism
/home/ubuntu/caffe/data/People-Art/JPEGImages/AnalyticalRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/ArtDeco
/home/ubuntu/caffe/data/People-Art/JPEGImages/ArtNouveau(Modern)
/home/ubuntu/caffe/data/People-Art/JPEGImages/Biedermeier
/home/ubuntu/caffe/data/People-Art/JPEGImages/cartoon
/home/ubuntu/caffe/data/People-Art/JPEGImages/Classicism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Constructivism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Cubism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Cubo-Futurism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Divisionism
/home/ubuntu/caffe/data/People-Art/JPEGImages/EnvironmentalArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/FantasticRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/FeministArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/HighRenaissance
/home/ubuntu/caffe/data/People-Art/JPEGImages/Impressionism
/home/ubuntu/caffe/data/People-Art/JPEGImages/InternationalGothic
/home/ubuntu/caffe/data/People-Art/JPEGImages/Japonism
/home/ubuntu/caffe/data/People-Art/JPEGImages/LowbrowArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/MagicRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/MechanisticCubism

etc...

Full list of the folders containing images and ls of cd People-Art: https://gist.github.com/ProGamerGov/4627306588e9d232aa0431c4e26b9687

Each folder of images has a "gt.txt" file. This is what the gt.txt file looks like:

https://gist.github.com/ProGamerGov/2339b815b9e462cb69cd5bb7d156ee9a

Though I believe this may be part of the Cross-Depiction aspect of the data set.

My train.txt and val.txt at:

/home/ubuntu/caffe/data/People-Art/train.txt 
/home/ubuntu/caffe/data/People-Art/val.txt 

train.txt: https://gist.github.com/ProGamerGov/1be5afe398c825cfc3ea119005af71fb
val.txt: https://gist.github.com/ProGamerGov/08b121968b28e9f09ddf3e096f424944

My create_imagenet.sh file: https://gist.github.com/ProGamerGov/5f92bdc8e7d83756268f438cf15261eb

located at: /home/ubuntu/caffe/create_imagenet_2.sh

The prototxt of the model I want to fine-tune has crop_size: 224; do I need to make the resize value in my create_imagenet_2.sh script the same value?

RESIZE_HEIGHT=256
RESIZE_WIDTH=256

I then run:

ubuntu@ip-Address:~/caffe$ ./create_imagenet_2.sh

Creating train lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:17:01.271579  2440 convert_imageset.cpp:79] Shuffling data
I0727 00:17:01.660755  2440 convert_imageset.cpp:82] A total of 0 images.
I0727 00:17:01.661175  2440 db.cpp:34] Opened lmdb examples/imagenet/people-art_train_lmdb
Creating val lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:17:01.971226  2451 convert_imageset.cpp:79] Shuffling data
I0727 00:17:02.378626  2451 convert_imageset.cpp:82] A total of 0 images.
I0727 00:17:02.379034  2451 db.cpp:34] Opened lmdb examples/imagenet/people-art_val_lmdb
Done.
ubuntu@ip-Address:~/caffe$

This creates two folders:

/home/ubuntu/caffe/examples/imagenet/people-art_train_lmdb
/home/ubuntu/caffe/examples/imagenet/people-art_val_lmdb

Inside both folders are data.mdb and lock.mdb files. They are all 8 KB each in both folders.

Trying to run the script again results in this:

ubuntu@ip-Address:~/caffe$ ./create_imagenet_2.sh
Creating train lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:19:56.326292  2482 convert_imageset.cpp:79] Shuffling data
I0727 00:19:56.722890  2482 convert_imageset.cpp:82] A total of 0 images.
F0727 00:19:56.723007  2482 db.cpp:27] Check failed: mkdir(source.c_str(), 0744) == 0 (-1 vs. 0) mkdir examples/imagenet/people-art_train_lmdbfailed
*** Check failure stack trace: ***
    @     0x7f5be1af4daa  (unknown)
    @     0x7f5be1af4ce4  (unknown)
    @     0x7f5be1af46e6  (unknown)
    @     0x7f5be1af7687  (unknown)
    @     0x7f5be1e54eee  caffe::db::LMDB::Open()
    @           0x403122  main
    @     0x7f5be0d04ec5  (unknown)
    @           0x403e5c  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
Creating val lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:19:56.955780  2491 convert_imageset.cpp:79] Shuffling data
I0727 00:19:57.348181  2491 convert_imageset.cpp:82] A total of 0 images.
F0727 00:19:57.348299  2491 db.cpp:27] Check failed: mkdir(source.c_str(), 0744) == 0 (-1 vs. 0) mkdir examples/imagenet/people-art_val_lmdbfailed
*** Check failure stack trace: ***
    @     0x7fcbeb0cedaa  (unknown)
    @     0x7fcbeb0cece4  (unknown)
    @     0x7fcbeb0ce6e6  (unknown)
    @     0x7fcbeb0d1687  (unknown)
    @     0x7fcbeb42eeee  caffe::db::LMDB::Open()
    @           0x403122  main
    @     0x7fcbea2deec5  (unknown)
    @           0x403e5c  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
Done.
ubuntu@ip-Address:~/caffe$

This is the readme.txt that came with the data set: https://gist.github.com/ProGamerGov/dfc8652f3db5bc91acdf34ff22c86bd2

I am not exactly sure what is causing my issue, but could it be that the script is not accounting for the structure of my data set?

htoyryla commented 8 years ago

You need to put all the information into train.txt and val.txt. That is where caffe expects to find the urls and the labels. Like this:

/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/egon-schiele_seated-girl-1910.jpg 2
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/salvador-dali_still-life-pulpo-y-scorpa.jpg 2
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/orest-kiprensky_young-gardener-1817.jpg 7
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/david-burliuk_in-the-park.jpg 5
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/giovanni-battista-piranesi_vedute-di-roma-30.jpg 4
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/basuki-abdullah_bocah.jpg 6

" A total of 0 images." means that caffe does not find the image files.

Setting the paths in train.txt versus create_imagenet.sh can be a bit confusing. Unfortunately I don't have the script file for wikiart anymore, but I think what worked for me was to use full paths in train.txt and to set the paths in the script as follows:

EXAMPLE=<full path where to place the lmdb> 
DATA=<full path where to find the train.txt and val.txt>
TOOLS=/home/hannu/caffe/build/tools

TRAIN_DATA_ROOT=/  
VAL_DATA_ROOT=/ 

The root paths are set to / because train.txt contains full paths. It should also work to set the data root path to a directory and use relative paths in the txt files, but I remember having some difficulty with that.
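
Before rerunning create_imagenet.sh, it can also help to check that every path listed in train.txt actually resolves, since "A total of 0 images" means caffe found none of them. A minimal sketch (the train.txt path is assumed from your messages above):

from os.path import isfile

# print every image path in train.txt that is missing on disk
for line in open("/home/ubuntu/caffe/data/People-Art/train.txt"):
    path = line.strip().rsplit(" ", 1)[0]  # drop the trailing label
    if path and not isfile(path):
        print "missing: " + path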

I usually write small python scripts to manipulate or create the txt files in the correct format. For my geometrical shapes test I had image files named rect000001.png, ellipse000001.png and so on, so I wrote a python script like this:

from os import listdir
from os.path import isfile, join

mypath = "/home/hannu/work/Geom/data/train/data/"
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

for file in onlyfiles:
  output = mypath + file
  if "rect" in file:
    output = output + " 0"
  elif "ellipse" in file:
    output = output + " 1"
  elif "triangle" in file:
    output = output + " 2"
  elif "xtrap" in file:
    output = output + " 3"
  elif "ytrap" in file:
    output = output + " 4"
  elif "ashape" in file:
    output = output + " 5"
  elif "lshape" in file:
    output = output + " 6"
  elif "oshape" in file:
    output = output + " 7"
  elif "ushape" in file:
    output = output + " 8"
  elif "vshape" in file:
    output = output + " 9" 
  print output

and redirected the output into train.txt. Nothing fancy, but it worked.

htoyryla commented 8 years ago

You might have a problem with your caffe installation, too, as you had this error message:

libdc1394 error: Failed to initialize libdc1394

I haven't seen this. As far as I understand, this library is for FireWire connections, which should not be needed. Found this on Google: https://kradnangel.gitbooks.io/caffe-study-guide/content/caffe_errors.html

ProGamerGov commented 8 years ago

I usually write small python scripts to manipulate or create the txt files in the correct format.

https://stackoverflow.com/questions/11003761/notepad-add-to-every-line

I just used this trick to fix my train and val files quickly.

You might have a problem with your caffe installation, too, as you had this error message:

libdc1394 is for video camera usage and not critical to Caffe as far as I understand. I have a few times disabled it and everything still works fine.

htoyryla commented 8 years ago

Perhaps you can manage with Notepad, but for instance for Wikiart I think I created the txt files from a downloaded csv file which had all the paths and labels, just not in the correct format. Also, I once needed to change the label numbering to start from zero instead of one.

htoyryla commented 8 years ago

One more thing if you are planning to finetune: you should change the dimension of the fc8 layer (assuming you are training a VGG) to match the number of categories in your dataset. Also, change the name of fc8 to something else, so that caffe will not try to initialize the weights from the original caffemodel, which would fail because of the size mismatch. It is typical to use a name like fc8-10 if you have ten categories.

Like this in the training prototxt:

layers {
  bottom: "fc7"
  top: "fc8_168"
  name: "fc8_168"
  type: INNER_PRODUCT
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 168
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc8_168"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "fc8_168"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}

ProGamerGov commented 8 years ago

The changes to my create_imagenet_2.sh file, val.txt, train.txt:

https://gist.github.com/ProGamerGov/8267d29262f1bd6570e5918719600695

Still result in the same error.

ProGamerGov commented 8 years ago

@htoyryla Thanks, I'll make the modifications to my train_val.prototxt.

htoyryla commented 8 years ago

Changing the fc8 layer will not solve the LMDB creation problem. That is a separate issue which you'll face once you have the LMDB and start finetuning.

htoyryla commented 8 years ago

I still don't see the labels in your train.txt, only the image paths.

ProGamerGov commented 8 years ago

For the labels, do I put a different number value for each category?

htoyryla commented 8 years ago

Yes, the labels should be integers from 0 to number_of_categories - 1 as I wrote earlier.

During training, caffe will feed each image into the model and, as there is an output for each label, train the model to activate the correct output for each image. Without the labels there is nothing to guide the training, and the model will not learn anything. Also, if all images have the same label, the model simply learns to always output that label regardless of the image, so it will not learn anything about the images. It is only when the labels tell something essential about the images that meaningful learning is possible.
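
For example, one could generate such labeled lines from the category folders with a small script in the spirit of my shapes script above (a sketch assuming your People-Art layout; the root path and the .jpg filter are assumptions):

from os import listdir
from os.path import isdir, isfile, join

root = "/home/ubuntu/caffe/data/People-Art/JPEGImages"
# one integer label per category folder, 0 .. number_of_categories-1
categories = sorted(d for d in listdir(root) if isdir(join(root, d)))

for label, cat in enumerate(categories):
    folder = join(root, cat)
    for f in listdir(folder):
        if isfile(join(folder, f)) and f.endswith(".jpg"):
            print join(folder, f) + " " + str(label)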

ProGamerGov commented 8 years ago

Ok, I think I got it now. Change the fc8_168 to fc8_43 because I have 43 categories. Then change it to fcpa_43. Even with scripts and Notepad, it will take me a little while to label all the categories. Do I need to do this for both the train and val txt files, or just one?

htoyryla commented 8 years ago

train.txt and val.txt both have to conform to this format. They also should not include the same files, as val.txt is used to crosscheck that the model really learns to generalize and does not simply remember the individual images. I usually first make a train.txt containing all images & labels and then use a script to move every tenth entry to val.txt.
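
Such a split needs only a few lines, for example (a sketch; the file names are placeholders):

# move every tenth entry of the full list to val.txt, the rest to train.txt
lines = open("all.txt").readlines()
tr = open("train.txt", "w")
va = open("val.txt", "w")
for i, line in enumerate(lines):
    if i % 10 == 0:
        va.write(line)
    else:
        tr.write(line)
tr.close()
va.close()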

I might first make very short txt files to test if the lmdb creation succeeds. There may still be an issue in the create_imagenet.sh, too. I have sometimes struggled with the paths, everything looked ok but 0 images found, until suddenly after changing something back and forth it worked.

htoyryla commented 8 years ago

I didn't understand your "Then change it to fcpa_43". It should be enough to change the name to fc8_43, so that the layer name is not fc8, which is the name in the caffemodel you will finetune.

ProGamerGov commented 8 years ago

@htoyryla Ok, thanks for the help!

ProGamerGov commented 8 years ago

So I successfully created the lmdb files!

https://gist.github.com/ProGamerGov/d0038f7e3186d057bb7b26398bd764f9

It seems that a few of the images listed in the train.txt and val.txt files did not exist in the actual data set.

htoyryla commented 8 years ago

It happened to me too, now that you mention it. Many (most?) datasets do not contain the actual images, only links for downloading from the original location. Probably the wikiart urls no longer work for some files, so those files don't get downloaded. It is like broken links, not unusual on the internet.

ProGamerGov commented 8 years ago

Trying to start the fine tuning, seems to be throwing out an error:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel.caffemodel -gpu 0
libdc1394 error: Failed to initialize libdc1394
I0728 00:34:55.191102  1907 caffe.cpp:113] Use GPU with device ID 0
I0728 00:34:55.575220  1907 caffe.cpp:121] Starting Optimization
I0728 00:34:55.575352  1907 solver.cpp:32] Initializing solver from parameters:
test_iter: 10
test_interval: 100
base_lr: 0.0005
display: 10
max_iter: 450000
lr_policy: "step"
gamma: 0.001
momentum: 0.9
weight_decay: 0.0005
stepsize: 1000
snapshot: 100
snapshot_prefix: "VGG16_SOD_finetune"
solver_mode: CPU
net: "/home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt"
I0728 00:34:55.575443  1907 solver.cpp:70] Creating training net from net file: /home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 33:90: String literals cannot cross line boundaries.
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 34:2: String literals cannot cross line boundaries.
F0728 00:34:55.576668  1907 upgrade_proto.cpp:928] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt
*** Check failure stack trace: ***
    @     0x7efff1371daa  (unknown)
    @     0x7efff1371ce4  (unknown)
    @     0x7efff13716e6  (unknown)
    @     0x7efff1374687  (unknown)
    @     0x7efff16d0f2e  caffe::ReadNetParamsFromTextFileOrDie()
    @     0x7efff17a5f12  caffe::Solver<>::InitTrainNet()
    @     0x7efff17a6f43  caffe::Solver<>::Init()
    @     0x7efff17a7116  caffe::Solver<>::Solver()
    @           0x40d210  caffe::GetSolver<>()
    @           0x4071e1  train()
    @           0x405781  main
    @     0x7efff0883ec5  (unknown)
    @           0x405d2d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$

My solver.prototxt and train_val.prototxt: https://gist.github.com/ProGamerGov/dd88c6752fda7d6ff9dc22f00e4acd4c

Edit: Line 33's quotation mark was on line 34.

And I had an incorrect path here:

mean_file: "/home/ubuntu/caffe/data/people-art_train_mean.binaryproto" Fixed:

mean_file: "/home/ubuntu/caffe/examples/imagenet/people-art_train_mean.binaryproto"

ProGamerGov commented 8 years ago

So everything was working well until this happened:

Memory required for data: 1152053324 <--- What's this measured in?

I0728 00:55:17.877467  2016 solver.cpp:315]     Test net output #2042: prob = 0.000113739
I0728 00:55:17.877477  2016 solver.cpp:315]     Test net output #2043: prob = 4.07005e-06
I0728 00:55:17.877488  2016 solver.cpp:315]     Test net output #2044: prob = 0.0013953
I0728 00:55:17.877499  2016 solver.cpp:315]     Test net output #2045: prob = 3.87571e-06
I0728 00:55:17.877509  2016 solver.cpp:315]     Test net output #2046: prob = 0.000115883
I0728 00:55:17.877521  2016 solver.cpp:315]     Test net output #2047: prob = 0.000124944
I0728 00:55:17.877531  2016 solver.cpp:315]     Test net output #2048: prob = 6.056e-06
I0728 00:55:17.877542  2016 solver.cpp:315]     Test net output #2049: prob = 0.000191529
I0728 00:55:17.877552  2016 solver.cpp:315]     Test net output #2050: prob = 0.000380109
F0728 00:55:19.191797  2016 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
    @     0x7f44cb886daa  (unknown)
    @     0x7f44cb886ce4  (unknown)
    @     0x7f44cb8866e6  (unknown)
    @     0x7f44cb889687  (unknown)
    @     0x7f44cbcb1e1b  caffe::SyncedMemory::mutable_gpu_data()
    @     0x7f44cbbf6323  caffe::Blob<>::mutable_gpu_diff()
    @     0x7f44cbcc9e60  caffe::CuDNNConvolutionLayer<>::Backward_gpu()
    @     0x7f44cbc08f4c  caffe::Net<>::BackwardFromTo()
    @     0x7f44cbc09191  caffe::Net<>::Backward()
    @     0x7f44cbcbeb2d  caffe::Solver<>::Step()
    @     0x7f44cbcbf40f  caffe::Solver<>::Solve()
    @           0x407246  train()
    @           0x405781  main
    @     0x7f44cad98ec5  (unknown)
    @           0x405d2d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$ 

Googling the issue suggests changing the batch_size: values.

layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/ubuntu/caffe/examples/imagenet/people-art_train_lmdb"
    backend: LMDB
    batch_size: 28
  }
  transform_param {
    crop_size: 224
    #mirror: true
    mean_file: "/home/ubuntu/caffe/examples/imagenet/people-art_train_mean.binaryproto"
  }
  include: { phase: TRAIN }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {

    source: "/home/ubuntu/caffe/examples/imagenet/people-art_val_lmdb"
    backend: LMDB
    batch_size: 10
  }
  transform_param {
    crop_size: 224
    #mirror: false
    mean_file: "/home/ubuntu/caffe/examples/imagenet/people-art_val_mean.binaryproto"
  }
  include: { phase: TEST }
}

Changing batch_size: 28 to 12 seems to have fixed the issue.
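
As far as I can tell, the "Memory required for data" figure is in bytes (so about 1.07 GB here), and activations are stored as float32, so memory grows linearly with the batch size. A rough sketch of the arithmetic for the first blobs:

def blob_bytes(n, c, h, w):
    # float32 blob of shape (N, C, H, W): 4 bytes per element
    return n * c * h * w * 4

print blob_bytes(28, 3, 224, 224)   # input batch alone: ~16.9 MB
print blob_bytes(28, 64, 224, 224)  # conv1_1 output: ~360 MB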

ProGamerGov commented 8 years ago

iter_300 accuracy = 0.25
iter_400 accuracy = 0.31
iter_600 accuracy = 0.21
iter_700 accuracy = 0.22
iter_800 accuracy = 0.16
iter_900 accuracy = 0.23
iter_1300 accuracy = 0.21
iter_1500 accuracy = 0.24

Are these accuracy values a good sign or a bad sign, or is it too hard to tell?

The working files and command I used are here: https://gist.github.com/ProGamerGov/068ffa55981e8dac80572ccbd49955ab


Second Try:

In theory when you reduce the batch_size by a factor of X then you should increase the base_lr by a factor of sqrt(X)

Source: https://github.com/BVLC/caffe/issues/430

28/2=14

batch_size: 28 to batch_size: 14

batch_size: 10 to batch_size: 5

base_lr: 0.0005

√(0.0005) = 0.0223607

so

base_lr: 0.0223607

That had an accuracy of 0.

(0.0005)(√(2)) = 0.000707107
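
(The first attempt took the square root of the learning rate itself; the rule as quoted scales the existing base_lr by √X.) A quick sketch of the quoted rule:

import math

# rule of thumb quoted above (BVLC/caffe#430): if the batch size is
# reduced by a factor of X, scale base_lr by sqrt(X)
def scale_lr(base_lr, old_batch, new_batch):
    return base_lr * math.sqrt(float(old_batch) / new_batch)

print scale_lr(0.0005, 28, 14)  # 0.000707..., the corrected value above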

Now let's try out the changes:

./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel -gpu 0 2>&1 | tee log.txt


Test 3 had accuracy = 0.07 for iteration 100

Iteration 0 accuracy = 0

batch_size: 14

batch_size: 10

base_lr: 0.000707107

Test 4 was accuracy = 0 for iteration 100

Iteration 0 accuracy = 0

batch_size: 14

batch_size: 5

base_lr: 0.000707107

Test 5 had accuracy = 0.16 at iteration 100

Iteration 0 accuracy = 0

batch_size: 12 

batch_size: 5

base_lr: 0.0005

Test 6 had accuracy = 0.07 at iteration 100.

Iteration 0 accuracy = 0.02

batch_size: 12 

batch_size: 10

base_lr: 0.0005

Test 7 had accuracy = 0.12 at iteration 100

Iteration 0 accuracy = 0

batch_size: 16 

batch_size: 10

base_lr: 0.0005

Test 8 had accuracy = 0.16 at iteration 100

Iteration 0 accuracy = 0

batch_size: 16 

batch_size: 10

base_lr: 0.0005

ProGamerGov commented 8 years ago

Test 9:

batch_size: 12

batch_size: 5

base_lr: 0.0005

Accuracy:

Iterations Accuracy
100 accuracy = 0
200 accuracy = 0.24
300 accuracy = 0.18
400 accuracy = 0.24
500 accuracy = 0.26
600 accuracy = 0.34
700 accuracy = 0.12
800 accuracy = 0.2
900 accuracy = 0.28
1000 accuracy = 0.38
1100 accuracy = 0.28
1200 accuracy = 0.24
1300 accuracy = 0.22
1400 accuracy = 0.26
1500 accuracy = 0.34
1600 accuracy = 0.2
1700 accuracy = 0.26
1800 accuracy = 0.3
1900 accuracy = 0.3
2000 accuracy = 0.24
2100 accuracy = 0.24
2200 accuracy = 0.3
2300 accuracy = 0.32
2400 accuracy = 0.26
2500 accuracy = 0.24
2600 accuracy = 0.26
2700 accuracy = 0.34
2800 accuracy = 0.28
2900 accuracy = 0.24
3000 accuracy = 0.24
3100 accuracy = 0.2
3200 accuracy = 0.36
3300 accuracy = 0.3
3400 accuracy = 0.24
3500 accuracy = 0.2
3600 accuracy = 0.36
3700 accuracy = 0.28
3800 accuracy = 0.26
3900 accuracy = 0.22
4000 accuracy = 0.32
4100 accuracy = 0.3
4200 accuracy = 0.26
4300 accuracy = 0.22
4400 accuracy = 0.3
4500 accuracy = 0.34
4600 accuracy = 0.22
4700 accuracy = 0.26
4800 accuracy = 0.3

Not sure if these are the results I am supposed to be getting?

htoyryla commented 8 years ago

You seem to be getting along well.

However, in your training prototxt, you need to change the line

 num_output: 205 

to match the number of your categories.

 num_output: 43

Now you have 200+ unused outputs which mess up the accuracy. Change it and see how it affects the accuracy. Anyway, one should be prepared to run at least tens of thousands of iterations.
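
For calibration: with 43 roughly balanced categories, random guessing would give an expected accuracy of about 1/43.

# expected accuracy of random guessing over 43 balanced categories
print 1.0 / 43  # ~0.023, so 0.2-0.3 is already well above chance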

ProGamerGov commented 8 years ago

@htoyryla Thanks, I missed that mistake. Hopefully that will help with the accuracy value. Though I may have to play around more with the base_lr and batch_size values, because I had previously done so with those accidental extra categories.

When I run ./caffe/tools/extra/parse_log.py mylog.log ./

The "mylog.log.train" file is properly filled with data. But the "mylog.log.test" file only has NumIters,Seconds,TestAccuracy,TestLoss and nothing else. Not sure what is causing this issue.

htoyryla commented 8 years ago

Your memory limits the batch size anyway. Use the largest size for training that doesn’t give out of memory.

As to learning rate, I have simply tried decreasing it until the losses start decreasing.

I haven’t used parse_log.py, I have only looked at the output on the screen. If you only see lots of lines with ”prob” values, then you can comment out the prob layer in the training prototxt (I mentioned this earlier). Then you should be able to view the loss printed for each nth training iteration (according to how you set in the solver.prototxt).

ProGamerGov commented 8 years ago

This is the prob layer and I just comment it out like this, correct?

#layers {
#  bottom: "fc8_43"
#  top: "prob"
#  name: "prob"
#  type: SOFTMAX
#}

Your memory limits the batch size anyway. Use the largest size for training that doesn’t give out of memory.

As to learning rate, I have simply tried decreasing it until the losses start decreasing.

Thanks, I was looking for this knowledge but couldn't find it using Google or Github's search function. I'll try to fine-tune the values tomorrow when I get the chance. I know some people have the values set up to change after a certain number of iterations, so how crucial is something like that for fine tuning?

htoyryla commented 8 years ago

The commenting out looks ok to me.

I was for some time baffled by the prob output lines, which made it difficult to see the loss and accuracy outputs, until I found out that because the prob layer is not used as input to any other layer, caffe prints it out.

htoyryla commented 8 years ago

About the finetuning in general: the losses and accuracy tell how well the output matches the labels. As the convolutional layers are already trained, the FC layers learn relatively quickly to give the right outputs. This is the idea of fine-tuning: adapt the fc layers to the new classification task. Therefore it is typical to set the learning rate of the convolutional layers to 0 (in the prototxt).

For neural-style, the fc layers are not of any interest. In my prototxt, also the convolutional layers adapt to the new data. But as the learning is controlled top down, the upper layers learn faster. Therefore even if the losses get to a good level, one perhaps should continue the training to allow the conv layers to adapt better to the new data.

One can use the snapshot caffemodels for trying them out in neural-style (or in convis). I also once made a lua script which compares the original and trained models for how much the weights have changed (max and avg values). I'll post it if I find it.

htoyryla commented 8 years ago

Here's the code to compare the weights of a trained snapshot with the original https://gist.github.com/htoyryla/bb27efb4d6dedff87810a35ff083f44c

Change the paths to match your models. Note also that the script takes the iteration number of the snapshot as a parameter.

th spred2.lua 10000 

That parameter also explains this line (be careful when editing it):

fn = "/home/hannu/train/hplaces290516_iter_" .. arg[1] .. ".caffemodel"

The output gives the layer name, change of max weight, change of avg weight for conv layers, and for fc layers, the layer name, matrix difference, change of max weight and the change of avg weight. This is not meant to be an accurate tool but only to give an indication how each layer is changing.

ProGamerGov commented 8 years ago

Should I be using the "type" settings in the solver.prototxt for fine tuning? If so, which of the 6 options should I be using?

1. Stochastic Gradient Descent "SGD"
2. AdaDelta "AdaDelta"
3. Adaptive Gradient "AdaGrad"
4. Adam "Adam"
5. Nesterov’s Accelerated Gradient "Nesterov"
6. RMSprop "RMSProp"

Also, could this message that occurs when I start fine tuning, be of concern?

net: "/home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt"
I0728 23:11:46.251978  6796 solver.cpp:70] Creating training net from net file: /home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt
E0728 23:11:46.253047  6796 upgrade_proto.cpp:618] Attempting to upgrade input file specified using deprecated V1LayerParameter: /home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt
I0728 23:11:46.253442  6796 upgrade_proto.cpp:626] Successfully upgraded file specified using deprecated V1LayerParameter
I0728 23:11:46.253579  6796 net.cpp:257] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I0728 23:11:46.253631  6796 net.cpp:257] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0728 23:11:46.254010  6796 net.cpp:42] Initializing net from parameters: 

Also, is it possible to take two or more pre-trained models and combine/merge them into a single model using Caffe?

ProGamerGov commented 8 years ago

My current test that starts at iteration 800 with a test every 100 iterations:

accuracy = 0.25
accuracy = 0.275
accuracy = 0.15
accuracy = 0.2
accuracy = 0.2875   Iteration 1200
accuracy = 0.1375
accuracy = 0.3125   Iteration 1400
accuracy = 0.1375
accuracy = 0.175
accuracy = 0.2875   Iteration 1700
accuracy = 0.15
accuracy = 0.2875   Iteration 1900
accuracy = 0.15
accuracy = 0.175
accuracy = 0.2875   Iteration 2200
accuracy = 0.15
accuracy = 0.25
accuracy = 0.1625
accuracy = 0.1875
accuracy = 0.3      Iteration 2700
accuracy = 0.1375
accuracy = 0.25
accuracy = 0.175    Iteration 3000
accuracy = 0.1875
accuracy = 0.2875   Iteration 3200
accuracy = 0.125
accuracy = 0.25
accuracy = 0.2
accuracy = 0.175
accuracy = 0.3      Iteration 3700

Here's my log file: https://gist.github.com/ProGamerGov/fe1623113a5d87b2da6a0f67b4d060bf

I then stopped it and changed the base_lr to 0.0000005

accuracy = 0.125    Iteration 3700
accuracy = 0.3      Iteration 3800
accuracy = 0.1375
accuracy = 0.2

Farther tweaking of the values and starting at iteration 3700:

accuracy = 0.125    Iteration 3700
accuracy = 0.3      Iteration 3800
accuracy = 0.1375
accuracy = 0.2      Iteration 4000
accuracy = 0.275    Iteration 4100
accuracy = 0.1375
accuracy = 0.3      Iteration 4300
accuracy = 0.1375
accuracy = 0.175
accuracy = 0.2875   Iteration 4600
accuracy = 0.15
accuracy = 0.2875   Iteration 4800
accuracy = 0.15
accuracy = 0.175    Iteration 5000
accuracy = 0.2875   Iteration 5100
accuracy = 0.15
accuracy = 0.25
accuracy = 0.1625
accuracy = 0.1875
accuracy = 0.3      Iteration 5600
accuracy = 0.1375
accuracy = 0.25
accuracy = 0.175
accuracy = 0.1875
accuracy = 0.2875   Iteration 6100

htoyryla commented 8 years ago

Usually I have simply used SGD, which is the default. Recently one dataset would not start learning at all; then I tried AdaDelta and it worked.

type: "AdaDelta" delta: 1e-6

htoyryla commented 8 years ago

I would rather be interested in the losses first. The loss is measured on the training set. Are the losses decreasing? If they are decreasing, then the network is learning; but if the accuracy is not increasing, it is not learning to generalize (this is called "overfitting").

If the losses are not going down, then something else is wrong. I had such a case in my first attempts. I still don't understand why it happened; increasing num_output by one helped, but that does not make sense. Perhaps there was something wrong with the labels to begin with.

Your learning rate looks pretty low already. I've never used so low values.

ProGamerGov commented 8 years ago

Here's the log file: https://gist.github.com/ProGamerGov/29219f98178a91ee3ddf039728db9bb3

Increasing num_output by one means there is a new category composed of nothing, correct?

Edit: Train.txt and Val.txt with labels: https://gist.github.com/ProGamerGov/6978038a0b40795289cafb554d9311af

htoyryla commented 8 years ago

Your log shows every iteration starting from 3700. One cannot really see a trend looking at such a small sample. What counts is the big picture, something like the loss at every 100th iteration starting from zero.
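
To pull that big picture out of a log without parse_log.py, something like this works (a sketch assuming the standard solver output format "Iteration N, loss = X"):

import re

# print the training loss at every 100th iteration of a caffe log
pat = re.compile(r"Iteration (\d+), loss = ([0-9.e+-]+)")
for line in open("log.txt"):
    m = pat.search(line)
    if m and int(m.group(1)) % 100 == 0:
        print m.group(1), m.group(2)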

ProGamerGov commented 8 years ago

This is from the model's research paper:

CNN feature. We use Caffe [28] for fine-tuning the CNN model pre-trained on ImageNet [44]. Images are resized to 256 × 256 regardless of their original aspect ratios. The top-left, top-right, bottom-left and bottom-right 227×227 crops of an image are used to augment the training data. We use Caffe's default setting for training the CNN model of [30], but reduce the starting learning rate to 0.001 as in [22]. We stop tuning after around 30 epochs, as the training loss no longer decreases.

The model I am fine tuning is from here: http://cs-people.bu.edu/jmzhang/sos.html

Specifically, the PDF file of the paper can be found here: http://cs-people.bu.edu/jmzhang/SOS/SOS_preprint.pdf

I resized the images I had to 224x224. Could this be the issue?

# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=true
if $RESIZE; then
  RESIZE_HEIGHT=224
  RESIZE_WIDTH=224
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

htoyryla commented 8 years ago

I think the LMDB should be made with 256x256 images; cropping is then done by caffe as specified in the prototxt. You could also try changing the crop to 227 in the prototxt.

If you have 224x224 images in the LMDB you might have to recreate the db.
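
The simplest fix is probably to set RESIZE_HEIGHT and RESIZE_WIDTH back to 256 in create_imagenet_2.sh and rebuild the LMDBs. If you resize outside caffe instead, a minimal PIL sketch (the paths are placeholders; run it per category folder):

import os
from PIL import Image

src = "/home/ubuntu/caffe/data/People-Art/JPEGImages/Cubism"
dst = "/home/ubuntu/caffe/data/People-Art/resized/Cubism"
if not os.path.exists(dst):
    os.makedirs(dst)

for name in os.listdir(src):
    if name.endswith(".jpg"):
        im = Image.open(os.path.join(src, name)).convert("RGB")
        im.resize((256, 256), Image.BILINEAR).save(os.path.join(dst, name))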

Is it actually a VGG16 that you are trying to fine-tune?

ProGamerGov commented 8 years ago

Yes, it's the "CNN Object Proposal Models for Salient Object Detection" model, which is a VGG16 model.

https://github.com/BVLC/caffe/wiki/Model-Zoo

VGG16: This model is used in the paper. GoogleNet: This model is smaller, faster and slightly better than the VGG16 model.


I think the LMDB should be made with 256x256 images. Cropping is then done by caffe as specified in the prototxt. You could try to change the crop to 227 in the prototxt.

I ran the LMDB code with 224, not 256. I changed the script manually.