ProGamerGov opened 8 years ago
So this is a far superior batch image downloader: https://chrome.google.com/webstore/detail/bulk-image-downloader/lamfengpphafgjdgacmmnpakdphmjlji?hl=en
The "fatkun batch download" can't handle more than 300-400 images without freezing. The one I linked in this comment can handle 3000+ images, but be warned that it does not save the images into a separate folder; it just dumps them into the downloads folder.
Thanks for the iter_size tip.
I have edited the post below after noticing that the prototxt I am using in the current training run is not for VGG16 but for a smaller 5-layer convnet.
I haven't tried training with Caffe for a long time, but decided to try again now. I am using my 2500+ photos of places, which I have classified using the Places205 model, together with the scripts linked in a post above, and I renumbered the 168 labels found to the range 0...167. I am training a small convnet from scratch, using my own training prototxt, the SGD optimizer and base_lr: 0.000005. The losses and accuracies are improving, but very slowly: the accuracy after the first 24 hours is 0.1312.
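Renumbering sparse class IDs into the contiguous range Caffe expects (0 to num_output - 1) can be sketched in a few lines of Python. This is a hypothetical helper, not the scripts linked above. It assumes the labels are integer Places205 class IDs paired with image paths, and it writes the `path label` list format that Caffe's `convert_imageset` tool reads.

```python
def remap_labels(entries):
    """Map arbitrary integer labels onto a contiguous range 0..N-1.

    entries: list of (image_path, original_label) pairs (hypothetical input).
    Returns (remapped_entries, mapping) where mapping[original] = new label.
    """
    # Sorting the unique labels keeps the new numbering in the original order.
    unique = sorted({label for _, label in entries})
    mapping = {old: new for new, old in enumerate(unique)}
    remapped = [(path, mapping[label]) for path, label in entries]
    return remapped, mapping


def to_caffe_list(entries):
    """Format (path, label) pairs as 'path label' lines, one image per line."""
    return "".join(f"{path} {label}\n" for path, label in entries)


entries = [("a.jpg", 12), ("b.jpg", 204), ("c.jpg", 12), ("d.jpg", 3)]
remapped, mapping = remap_labels(entries)
print(mapping)                  # {3: 0, 12: 1, 204: 2}
print(to_caffe_list(remapped), end="")
```

Keeping the `mapping` dictionary around (for example, saved next to the list file) makes it possible to translate the new model's predictions back to the original Places205 category names.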
Training a deep net like VGG16 or VGG19 is difficult. I have read that the original VGG16 and VGG19 were not trained in one go, but by training shallower versions first.
Often when my attempts fail to converge at all, there is something wrong with the labels. This time, for instance, num_output was wrong in the first attempt (I had forgotten to change it to match the dataset), and the losses did not start to decrease; quite the contrary. It should also be obvious that the training data, both the images and the labels, must make learning possible: they must contain consistent, recognizable features. Considering that a deep model starts with randomly set weights, it is a wonder that learning can get started in the right direction at all.
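A quick sanity check for the label problems described above can be run before training starts. Every label should be an integer in 0..num_output-1. If num_output is much larger than the number of classes that actually appear, it probably was not updated for the new dataset. The helper below is hypothetical. It assumes a Caffe-style `path label` list file that has already been read into a list of lines, and that `num_output` is the value set on the final classifier layer.

```python
def check_labels(list_lines, num_output):
    """Return human-readable problems found in 'path label' lines versus num_output."""
    problems = []
    seen = set()
    for lineno, line in enumerate(list_lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        _path, _, label_text = line.rpartition(" ")  # label is the last field
        try:
            label = int(label_text)
        except ValueError:
            problems.append(f"line {lineno}: label {label_text!r} is not an integer")
            continue
        if not 0 <= label < num_output:
            problems.append(f"line {lineno}: label {label} outside 0..{num_output - 1}")
        seen.add(label)
    # Output units that no training image ever targets hint at a stale num_output.
    unused = num_output - len(seen & set(range(num_output)))
    if unused:
        problems.append(f"{unused} of {num_output} output classes never appear in the data")
    return problems


print(check_labels(["a.jpg 0", "b.jpg 1", "c.jpg 2"], 2))
```

An empty result does not prove the labels are meaningful. It only rules out the mechanical mismatch between the list file and the network definition.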
Here is the result from 4600 iterations on a data set composed of approximately 3200 Deepart.io output images, and 3200 Ostagram output images. The Neural-Style output is 1500, and the model was a fine-tuned VGG16 SOD Finetune Model.
So I tried adding the following code to every layer except the required ones:

```
param {
  lr_mult: 0  # learning rate of weights
  decay_mult: 1
}
param {
  lr_mult: 0  # learning rate of bias
  decay_mult: 0
}
```
As per the information I found here: https://github.com/BVLC/caffe/wiki/Fine-Tuning-or-Training-Certain-Layers-Exclusively
But I always received an error like `Expected ":" instead of "{"`. I am unsure how to resolve this issue so that I can experiment with training only a single layer at a time.
In reply to ProGamerGov's comment of 3.10.2016 23:49:
I have not seen it explained clearly anywhere, but there appear to be two syntax variants of prototxt. Both work, but they cannot be mixed. One uses "layers" and the other "layer".
I am on thin ice now, but the equivalent of your definition in the other syntax could be:

```
blobs_lr: 0
blobs_lr: 0
weight_decay: 1
weight_decay: 0
```

That is, only these lines, without the param block. If you are modifying an existing prototxt, you should be able to see which variant it uses. Stick to the same syntax.
I made a typo. I meant to say "without the param block": only the four lines, no param block.
I can't log in right now to edit the comment.
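A small helper can tell which of the two syntaxes a prototxt uses before you edit it. This is an illustrative sketch that relies on plain text matching, not a real protobuf parser. It treats `layers {` blocks as the legacy format (with `blobs_lr`/`weight_decay`) and `layer {` blocks as the current format (with `param { lr_mult ... }`). As far as I can tell from caffe.proto, `param` in the legacy format is a plain string field used to name shared weights. That would plausibly explain the `Expected ":"` error when a `param { ... }` block is pasted into a `layers` definition, but treat that explanation as an educated guess.

```python
import re


def prototxt_syntax(text):
    """Classify a net prototxt as 'legacy', 'current', 'mixed' or 'unknown'.

    Heuristic only: '#' inside quoted strings would also be treated as a comment.
    """
    code = re.sub(r"#.*", "", text)  # drop comments so '# layers {' does not count
    legacy = re.search(r"^\s*layers\s*\{", code, re.MULTILINE) is not None
    current = re.search(r"^\s*layer\s*\{", code, re.MULTILINE) is not None
    if legacy and current:
        return "mixed"
    if legacy:
        return "legacy"
    if current:
        return "current"
    return "unknown"


print(prototxt_syntax('layers {\n  name: "conv1"\n  blobs_lr: 1\n}\n'))  # legacy
```

If it reports "legacy", Caffe's `upgrade_net_proto_text` tool can convert the file, after which the `param { lr_mult ... }` blocks apply.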
@htoyryla
From previous testing, I found this interesting: https://i.imgur.com/XHg8CPA.jpg
It appears that after 200 iterations, the edge detection starts being damaged by the fine-tuning. I have seen very similar results in all my training experiments, though the point where edge detection begins to break down is usually near the halfway mark rather than this close to the start.
@htoyryla The difference between "layer" and "layers" is that "layers" is the outdated version of the prototxt syntax. You can use upgrade_net_proto_text to update the prototxt file to the newer version, and upgrade_net_proto_binary to update the model file:

```
cd ~/caffe
# Upgrade the network definition to the current "layer" syntax
./build/tools/upgrade_net_proto_text vgg16_finetuned_train_val.prototxt vgg16_finetuned_train_val_out.prototxt
# Upgrade the trained weights (.caffemodel) to the current format
./build/tools/upgrade_net_proto_binary VGG16_SOD_finetune.caffemodel VGG16_SOD_finetune_out.caffemodel
```
I seem to have figured out how to fine-tune the VGG-16 SOD Finetune model so that it changes the output in a neutral manner: the effect is similar to changing only the seed value in Neural-Style. Interestingly enough, my data set was composed of art produced by neural networks.
Edit:
On closer inspection, the differences between the original and the fine-tuned version appear to be in the smaller details. I only ran it for 600 iterations, as I have to use AWS spot instances for this kind of thing, but the newly fine-tuned model seems to produce more intricate details than the original model.
If I have found settings that result in an almost neutral change, then in theory I can now change single parameters, target layers, etc. to achieve better artistic outputs.
So targeting specific layers seems to produce different outputs that are not worse than the original model's outputs. I really wish I had the resources to explore this fully, as it looks very promising for enhancing Neural-Style's outputs.
I think that by targeting different combinations of the default layers that Neural-Style uses, one can improve the model's ability in specific areas with the proper data set.
This prototxt here has been configured to stop learning on all layers by default: https://gist.github.com/ProGamerGov/1514d74dc6b799389875ce1764c1a12e
I was using the VGG16_SOD_finetune model: https://gist.github.com/jimmie33/509111f8a00a9ece2c3d5dde6a750129
I also ran `./build/tools/upgrade_net_proto_binary VGG16_SOD_finetune.caffemodel VGG16_SOD_finetune_out.caffemodel` to convert the model to the format used by the latest version of Caffe.
You can allow learning on your layer of choice by changing the following lines of code on the desired layer:
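```
param {
  lr_mult: 0  # weights: frozen
  decay_mult: 1
}
param {
  lr_mult: 0  # bias: frozen
  decay_mult: 0
}
```

To:

```
param {
  lr_mult: 1  # weights: learn at base_lr
  decay_mult: 1
}
param {
  lr_mult: 2  # bias: learn at 2x base_lr
  decay_mult: 0
}
```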
The learning related values are from this Caffe guide here for training certain layers exclusively: https://github.com/BVLC/caffe/wiki/Fine-Tuning-or-Training-Certain-Layers-Exclusively
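Unfreezing one layer by hand gets tedious when you test many layer combinations, so the edit above can be scripted. The sketch below is a hypothetical text-based helper, not a Caffe tool. It assumes the current `layer { ... }` syntax, that each layer's own `name:` is the first `name:` inside its block, and that no braces appear inside strings or comments. It rewrites the first two `lr_mult` values (weights, then bias) of the named layer and leaves every other layer untouched.

```python
import re


def _block_end(text, open_idx):
    """Return the index just past the '}' that matches the '{' at open_idx."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced braces in prototxt")


def set_layer_lr(prototxt, layer_name, weight_lr=1, bias_lr=2):
    """Set the weight/bias lr_mult of one named layer (hypothetical helper)."""
    out, pos = [], 0
    for m in re.finditer(r"\blayer\s*\{", prototxt):  # does not match 'layers {'
        if m.start() < pos:
            continue  # already copied as part of an earlier block
        end = _block_end(prototxt, m.end() - 1)
        block = prototxt[m.start():end]
        name = re.search(r'name:\s*"([^"]*)"', block)
        if name and name.group(1) == layer_name:
            new_values = iter((weight_lr, bias_lr))
            # Only the first two lr_mult entries: weights, then bias.
            block = re.sub(r"lr_mult:\s*[-+0-9.eE]+",
                           lambda _m: f"lr_mult: {next(new_values)}",
                           block, count=2)
        out.append(prototxt[pos:m.start()])
        out.append(block)
        pos = end
    out.append(prototxt[pos:])
    return "".join(out)


frozen = '''layer {
  name: "conv1_1"
  param { lr_mult: 0 decay_mult: 1 }
  param { lr_mult: 0 decay_mult: 0 }
}
layer {
  name: "conv2_1"
  param { lr_mult: 0 decay_mult: 1 }
  param { lr_mult: 0 decay_mult: 0 }
}
'''
print(set_layer_lr(frozen, "conv2_1"))
```

A real protobuf round trip (parsing with Caffe's generated `caffe_pb2` module) would be more robust. The text approach has the advantage of keeping the file's comments and formatting intact.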
Also, the model's edge detection abilities do not seem to be affected, positively or negatively, by this layer-specific training.
I can also provide my two-category Deepart.io and Ostagram data set, which contains approximately 3000 images for each of the two categories, if you want.
crowsonkb's style_transfer has an updated Amazon AMI, which has the latest version of Caffe already installed.
It looks like training a specific layer, or the default Neural-Style layers, requires much longer training before major differences between the original and the fine-tuned model become noticeable.
Here are the results from some small-scale experiments I ran using the newly found neutral training parameters on the upgraded model and prototxt files: https://i.imgur.com/k0jxvtv.png
So, just in case I am making the wrong assumptions: as per the prototxt file and Neural-Style's default layer-related settings, the `-content_layers` and `-style_layers` map to the following prototxt layer names:
Prototxt | Neural-Style |
---|---|
conv1_1 | relu1_1 |
conv2_1 | relu2_1 |
conv3_1 | relu3_1 |
conv4_1 | relu4_1 |
conv4_2 | relu4_2 |
conv5_1 | relu5_1 |
Or is Neural-Style using the part below each "conv" layer which has "relu" instead of "conv"?
Example of the prototxt layout:
```
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
```
The prototxt I was using can be found here: https://gist.github.com/ProGamerGov/1514d74dc6b799389875ce1764c1a12e
I am not sure I fully understand your question, especially when you say 'is Neural-Style using the part below each "conv" layer which has "relu" instead of "conv"'. Do you mean "below in the prototxt file" or "in a lower layer"?
But never mind. ReLU is really nothing more than an add-on function on top of a conv layer which sets all negative values to zero. This is why it is also called a rectifier. So in theory convx_y can output both negative and positive values, but after relux_y all negative values have been replaced by zero.
Furthermore, this discussion https://github.com/jcjohnson/neural-style/issues/93 hints that in an implementation such as Torch, the ReLU layer is actually performed in place, which I read to mean that the ReLU directly modifies the memory containing the output of the conv layer. If this is true, then it actually makes no difference whether one uses conv or relu layers in neural_style: the ReLU function is applied anyway, even if you access the conv layer.
@htoyryla You are correct that ReLU is performed in-place in Torch so after a forward pass it doesn't matter whether you pick a conv layer or its associated ReLU layer; they will both have the same value. However there will be a difference during the backward pass: when you backprop through a ReLU layer, the upstream gradients will be zeroed in the same places the activations were zeroed during the forward pass; if you ask neural-style to work with a conv layer then it will not backprop through the ReLU during the backward pass.
This means that when you ask neural-style to use activations on a conv layer, ReLU gets used during the forward pass but not during the backward pass, so the backward pass will not be correct in this case. You can still get nice style transfer effects even when the gradients are incorrect in this way, but for this reason I'd generally expect better results using ReLU layers.
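The forward/backward asymmetry described above can be shown with a toy pure-Python sketch. The numbers are made up, and this is not Torch's code. It only models an in-place ReLU that overwrites the conv output buffer, and then compares the gradient that reaches the conv output when the loss is attached at the relu layer versus at the conv layer.

```python
def relu_forward_inplace(buf):
    """Zero the negative entries of buf in place; return the mask of kept entries."""
    mask = [v > 0 for v in buf]
    for i, keep in enumerate(mask):
        if not keep:
            buf[i] = 0.0
    return mask


def relu_backward(grad_out, mask):
    """Gradient through ReLU: passed where the input was positive, zeroed elsewhere."""
    return [g if keep else 0.0 for g, keep in zip(grad_out, mask)]


conv_out = [-1.5, 0.5, -0.2, 2.0]      # pretend output of a conv layer
mask = relu_forward_inplace(conv_out)  # same buffer: "conv" and "relu" now hold identical values
print(conv_out)       # [0.0, 0.5, 0.0, 2.0]

grad_from_loss = [1.0, 1.0, 1.0, 1.0]                # hypothetical upstream gradient
grad_via_relu = relu_backward(grad_from_loss, mask)  # loss attached at the relu layer
grad_via_conv = list(grad_from_loss)                 # loss attached at the conv layer: ReLU skipped
print(grad_via_relu)  # [0.0, 1.0, 0.0, 1.0]
print(grad_via_conv)  # [1.0, 1.0, 1.0, 1.0]
```

Both choices see the same activations. Only the relu choice zeroes the gradient at the positions the forward pass clipped, which is why the relu layers give the consistent gradient.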
@jcjohnson, good point, I did not think about the backward pass.
I suspect that image quality affects training accuracy. This research paper seems to show the effects of image quality on training neural network models: "Understanding How Image Quality Affects Deep Neural Networks"
I recently trained a NIN model on a roughly sorted custom data set of about 40,000 faces. There appear to be direct improvements in how the model handles faces in content images. But style images that do not contain faces do not work as well. I think that if one could train the model on artwork, in addition to common content images, it would help the model understand both.
I have sometimes been thinking about using two models, one for style, one for content, both trained with limited material. Don't know if it would work though, and memory usage would certainly be a problem. Yet it could be an interesting exercise.
@htoyryla That idea could be more resource efficient by using two small NIN-like models, each trained on only one target category.
So it turns out that the NIN model, at least, still has the knowledge required for style transfer, in addition to the newer face-related knowledge that I gave it.
The unmodified NIN model is on the right, and the fine tuned NIN model is on the left:
I used a DeepDream project based on Neural-Style to try and determine why things had changed in the modified NIN model. Below are the DeepDream layer activation tests for all 29 layers used by the NIN model:
The original model:
The modified model:
These DeepDream images helped me figure out that by simply changing `-content_layers` and `-style_layers`, I could make use of the improved facial feature detection abilities of my fine-tuned NIN model.
The NIN model I created was trained for 15700 iterations and seemed to maintain 86-96% accuracy during the last couple thousand iterations. With around 40k training images, I estimated that around 24-25 epochs occurred during the training session. I also stopped the training at iteration 11600 in order to lower the learning rate, so that the loss would keep going down. I'm not sure if I was over-fitting the model, but it seemed to have improved abilities on an image that was not part of the training data set.
After the NIN experiments, I attempted to fine-tune a VGG-16 model on my rough faces data set. Fine-tuning VGG-16 models is a lot slower than fine-tuning NIN models. From iterations 1000 to 8000, the model actually seems to improve its ability to recognize facial features:
The output from the original (not fine-tuned) SOD_FINETUNE model can be found here: https://i.imgur.com/wWtWysT.png
For these experiments I used exactly the same parameters, seed values, etc. to rule out anything else that might cause different outputs.
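The epoch estimate in that paragraph follows from epochs = iterations × batch_size × iter_size / number of training images, where Caffe's `iter_size` accumulates gradients over several batches per iteration. The batch size of that NIN run isn't stated in this thread. The 64 used below is an assumption, borrowed from the batch size mentioned for a later VGG-16 experiment. The function names are hypothetical.

```python
import math


def epochs_completed(iterations, batch_size, num_images, iter_size=1):
    """Each Caffe iteration consumes batch_size * iter_size images."""
    return iterations * batch_size * iter_size / num_images


def iterations_for_epochs(epochs, batch_size, num_images, iter_size=1):
    """Smallest iteration count (e.g. for max_iter) covering the requested epochs."""
    return math.ceil(epochs * num_images / (batch_size * iter_size))


print(epochs_completed(15700, 64, 40000))    # 25.12  (batch size 64 is assumed)
print(iterations_for_epochs(25, 64, 40000))  # 15625
```

With a different batch size the estimate scales linearly. A batch of 32 would mean only about 12-13 epochs for the same 15700 iterations.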
An album with the full versions of the images I posted in this comment can be found here: https://imgur.com/a/njDJ1
Edit:
To clarify, the VGG-16 model that I fine-tuned is called the "VGG-16 SOD Finetune" model. The "finetune" in the original model's name is there because it was fine-tuned from the regular VGG-16 model for salient object detection. I have now fine-tuned this previously fine-tuned model with a new data set.
Trying to train a NIN model from scratch with my data set did not work; it only produces blurry style transfer images and broken DeepDream images. Maybe certain classes help the model learn other classes? Or maybe I just chose bad training parameters?
Edit:
Analyzing the training loss (I don't know what graphing tool to use), it appears that for the NIN model trained from scratch, the loss decreased quickly and then stayed constant. For the fine-tuned NIN model, the training loss dropped quickly and then seems to have decreased very slowly, or maybe stayed the same. It must have worked better than training from scratch, though, since it does appear to have better facial feature detection abilities.
For the fine-tuned SOD model, the loss drops continuously over time, which I imagine is what one should expect with good training parameters. So I think the results from my fine-tuned NIN model are questionable and it needs better training parameters, but the VGG-16 SOD model seems to have actually improved, in a way that is appropriately reflected in the loss values.
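To graph the training loss, one option is to pull the loss values out of Caffe's training log and plot them with any tool. Caffe also ships log-parsing scripts under `tools/extra`. The sketch below is an independent, minimal alternative. It assumes the solver's log lines look like `Iteration 100, loss = 0.5`, or like `Iteration 100 (2.5 iter/s, 40s/100 iters), loss = 0.5` in newer versions, and the sample log text is made up. A trailing moving average makes slow trends such as a "very slowly decreasing" loss easier to see.

```python
import re

# Solver summary lines; per-output lines ("Train net output #0: loss = ...") are ignored.
# Lines whose loss is nan/-nan do not match and are skipped.
_LOSS_RE = re.compile(
    r"Iteration (\d+)(?: \([^)]*\))?, loss = ([-+]?\d[\d.]*(?:[eE][-+]?\d+)?)")


def parse_caffe_loss(log_text):
    """Return [(iteration, loss), ...] from a Caffe training log."""
    return [(int(it), float(loss)) for it, loss in _LOSS_RE.findall(log_text)]


def moving_average(values, window):
    """Trailing moving average; the first entries average over fewer points."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out


sample_log = (  # illustrative log excerpt, not real output
    "I1003 10:00:00.000 solver.cpp:228] Iteration 0, loss = 5.2\n"
    "I1003 10:00:00.100 solver.cpp:244]     Train net output #0: loss = 5.2 (* 1 = 5.2 loss)\n"
    "I1003 10:01:00.000 solver.cpp:228] Iteration 20 (1.9 iter/s, 10.5s/20 iters), loss = 4.8\n"
    "I1003 10:02:00.000 solver.cpp:228] Iteration 40, loss = 4.1\n"
)
points = parse_caffe_loss(sample_log)
print(points)  # [(0, 5.2), (20, 4.8), (40, 4.1)]
```

The resulting pairs can be handed to matplotlib (`plt.plot(iterations, losses)`) or pasted into a spreadsheet.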
Second Edit:
After some more testing on my fine-tuned SOD model, it appears that I may have actually improved the model with very little change to its other abilities. It now deals with faces more accurately, and possibly with other parts of the human body (the upper body, I think).
I wonder if the "roughly sorted" nature of my data set helps the model's new abilities, or weakens the model's new abilities?
The training loss graphs seem to support my results.
The NIN model from scratch is on the left, and the fine tuned NIN model is on the right:
The fine tuned VGG-16 model:
I think using a larger batch size than in earlier experiments (64 instead of fewer than 10) is part of the reason for this recent training success.
I think I might be onto something here as my fine tuned model appears to be better at facial feature preservation:
An album with the full images can be found here: https://imgur.com/a/tArrY
It looks as though my fine-tuned model is more accurately detecting the eyes and mouth of the person in the photo.
The mouth is more "horizontal" in the image produced by my fine tuned model, just like in the original photo, while the unmodified model is curving the mouth in an extreme way.
The chin in my image seems to be more "separated" from the neck and background than in the control image. I believe this reflects a technique people use to take better-looking photographs of themselves by making their chin stand out more from their neck.
The eyes in my fine-tuned model's output are outlined, while the original model does not outline the eyes.
The eyebrows are darker in my image, than in the control image.
The solver.prototxt file and the train_val.prototxt can be found here: https://gist.github.com/ProGamerGov/2bdf7659ee14dac03269a3ec3a7f1fcd
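ImageMagick seems to be slow for resizing large image data sets (especially when using the `-resize` option), but running it through GNU Parallel like this makes it faster:

```
# Convert JPEGs to PNG first (writes new .png files), so the resize step below covers them too
parallel -j 8 mogrify -format png {} ::: *.jpg
# Resize so the shorter side is 256 px, then center-crop to exactly 256x256 (overwrites in place)
parallel -j 8 convert {} -resize '256x256^' -gravity Center -extent 256x256 {} ::: *.png
```

Without `-extent` (or a crop), `-gravity Center` has no effect and `'256x256^'` only guarantees that both sides are at least 256 pixels.
You can get Parallel via `sudo apt-get install parallel`.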
Source: https://stackoverflow.com/questions/26168783/imagemagick-convert-and-gnu-parallel-together
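The `'256x256^'` geometry scales an image so that its shorter side becomes 256 pixels while keeping the aspect ratio. Combined with `-gravity Center` and a crop such as `-extent 256x256`, it yields a centered 256x256 square. The hypothetical helper below computes approximately that geometry, which is handy for predicting how much of each photo survives the crop. ImageMagick's own rounding of the scaled size and of the crop offset may differ by a pixel.

```python
def fill_geometry(width, height, size=256):
    """Approximate '-resize {size}x{size}^ -gravity Center -extent {size}x{size}'.

    Returns ((scaled_w, scaled_h), (left, top, right, bottom)) of the kept square.
    """
    if width >= height:   # landscape or square: height becomes `size`
        new_w, new_h = round(width * size / height), size
    else:                 # portrait: width becomes `size`
        new_w, new_h = size, round(height * size / width)
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


print(fill_geometry(1024, 768))  # ((341, 256), (42, 0, 298, 256))
print(fill_geometry(500, 1000))  # ((256, 512), (0, 128, 256, 384))
```

For strongly non-square photos, a large part of the image is cropped away, which is worth keeping in mind when the subject (for example, a face) is not centered.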
I uploaded the Rough Faces model and added a link to download it on the alternative models wiki page: https://github.com/jcjohnson/neural-style/wiki/Using-Other-Neural-Models
Hopefully it can help those seeking better facial preservation with Neural-Style.
For me, the Rough Faces model didn't work well. For a face-specific model, I would expect strong activations for features like head/face shape, hair, eyes, nose, mouth, etc. Here, it mainly picked up the meringue dessert in the background :)
content image
style image
result
I did test your content image with my fine-tuned model, and I think the issue may be that the "rough faces" training data was not very diverse, so the model performs best with certain images. My example test image was also part of the training data, which may skew the results (though I did test the model on other images that I think were not part of the training data).
I was looking through my old experiments, and I see that I never actually shared the two successfully fine-tuned models that I had created. One of them in particular (the "Plaster" model) creates a very different output than the non-fine-tuned version.
Some experimentation with parameters may be required to achieve satisfactory results, since, as with all the models I trained and fine-tuned, I only tested them in Neural-Style with certain parameter values.
I'm not sure if the "Low Noise" model actually differs from the non-fine-tuned model in a way that's useful for certain styles, as the "Plaster" model does, so it can be removed if it's not useful.
I posted the models on the wiki page here: https://github.com/jcjohnson/neural-style/wiki/Using-Other-Neural-Models
Seeing as both models are from 2016, I am going to test them with a bunch of more "modern" Neural-Style parameters, like setting the TV weight to 0, using the Adam parameters I discovered in addition to L-BFGS, and using multiscale resolution.
I want to change the number of iterations. Where do I change it?
@Nusrat12 You control the maximum number of iterations by setting the `max_iter:` value in your `solver.prototxt`. You can see an example of it here.
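If you change `max_iter` often (for example, to target a fixed number of epochs), the edit can be scripted. This is a hypothetical helper that does plain line-based text replacement on a solver file that has already been read into a string. It replaces the first uncommented `key: value` line, or appends the setting if it is missing. String-valued fields such as `snapshot_prefix` need their quotes included in `value`.

```python
import re


def set_solver_value(solver_text, key, value):
    """Set `key: value` in solver.prototxt text (hypothetical helper)."""
    # Anchored at line start, so '# max_iter: ...' comments are left alone.
    pattern = re.compile(rf"^(\s*{re.escape(key)}\s*:\s*)\S+", re.MULTILINE)
    if pattern.search(solver_text):
        return pattern.sub(lambda m: f"{m.group(1)}{value}", solver_text, count=1)
    return solver_text.rstrip("\n") + f"\n{key}: {value}\n"


solver = "base_lr: 0.001\nmax_iter: 10000\nsnapshot: 1000\n"
print(set_solver_value(solver, "max_iter", 20000), end="")
```

After writing the result back to `solver.prototxt`, training stops once the solver reaches the new `max_iter`.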
Where should I start if I want to train a model for usage with Neural-Style?
Are Network In Network (NIN) models easier to train than VGG models?
Does anyone know of any guides that cover training a model that is compatible with Neural-Style from start to finish? If not, then what do I need to look for in order to make sure the model I am learning to train is compatible with Neural-Style?
What is the easiest way to train a model for use with neural-style? Are there any AMIs available that will let me start messing around with training right away?