jcjohnson / neural-style

Torch implementation of neural style algorithm
MIT License

Where should I start if I want to train a model for usage with Neural-Style? #292

Open ProGamerGov opened 8 years ago

ProGamerGov commented 8 years ago

Where should I start if I want to train a model for usage with Neural-Style?

Are Network In Network (NIN) models easier to train than VGG models?

Does anyone know of any guides that cover training a model that is compatible with Neural-Style from start to finish? If not, then what do I need to look for in order to make sure the model I am learning to train is compatible with Neural-Style?

What is the easiest way to train a model for use with neural-style? Are there any AMIs available that will let me start messing around with training right away?

ProGamerGov commented 8 years ago

Though the 256 LMDB results in "9] Check failed: datum_height == data_mean_.height() (256 vs. 224)". This is because your train_val.prototxt had crop_size: 224. But based on the research paper, it looks like the model was made with crop_size: 256.

htoyryla commented 8 years ago

The idea is that the training data is 256x256 in the LMDB, and caffe then crops the images during training to the size specified in the prototxt.

There is some confusion here now. LMDB creation does not look into any prototxt, so I cannot understand how the LMDB creation could fail because of the crop in prototxt (which is done during the training).

Try to recreate the LMDB with 256x256 images.

ProGamerGov commented 8 years ago

"There is some confusion here now. LMDB creation does not look into any prototxt, so I cannot understand how the LMDB creation could fail because of the crop in prototxt (which is done during the training)."

Sorry, my bad. I did not realize the train_val crop and the LMDB crop were separate.

htoyryla commented 8 years ago

" But based on the research paper, it looks like the model was made with crop_size: 256."

The research paper clearly says that images were 256x256 and crops are 227x227. Resize for the LMDB and crop while training are separate operations. Also, resize changes the whole image to new dimensions, crop cuts out a part of the image.
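To make the distinction concrete, here is a minimal Python sketch (NumPy only, not Caffe's own code). The resize turns the whole image into 256x256, as it would be stored in the LMDB, and the crop then cuts a 227x227 window out of it, as the training-time crop does. The function names and the random stand-in image are made up for illustration, and the nearest-neighbour resize is only the simplest choice, not what any particular LMDB script uses.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Resize the WHOLE image to (out_h, out_w) with nearest-neighbour sampling."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for every output row
    cols = np.arange(out_w) * in_w // out_w   # source column for every output column
    return img[rows[:, None], cols]

def random_crop(img, size, rng):
    """Cut a size x size window out of the image at a random position."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

rng = np.random.default_rng(0)
# Stand-in for a real photo (hypothetical 640x480 RGB image).
photo = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

stored = resize_nearest(photo, 256, 256)    # done once, when building the LMDB
net_input = random_crop(stored, 227, rng)   # done repeatedly, during training
print(stored.shape, net_input.shape)
```

The resized image still shows the whole scene at a smaller scale, while each crop shows only part of it.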

ProGamerGov commented 8 years ago

Maybe I just need to play around with the solver.prototxt some more to find the values that will let me rise above 31% accuracy. Or let it run for a lot more iterations to understand the overall trend of accuracy results and loss values.

htoyryla commented 8 years ago

I would set the solver to print every 100 iterations, start from the beginning, and see if the losses are decreasing.

If they are, then start looking at the accuracy and if needed, tweak the learning rate. You might also try AdaDelta which worked for me.
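A rough sketch of how one could check the loss trend from a saved log instead of by eye. It assumes the log contains lines of the form "Iteration N, loss = X" as printed by the Caffe solver. `parse_losses`, `is_decreasing` and the sample values are made up for illustration.

```python
import re

LOSS_RE = re.compile(r"Iteration (\d+), loss = ([0-9.eE+-]+)")

def parse_losses(log_text):
    """Return a list of (iteration, loss) pairs found in a training log."""
    return [(int(m.group(1)), float(m.group(2))) for m in LOSS_RE.finditer(log_text)]

def is_decreasing(losses, window=3):
    """Compare the mean of the first and the last `window` loss values."""
    values = [loss for _, loss in losses]
    if len(values) < 2 * window:
        return None  # not enough data points to say anything
    head = sum(values[:window]) / window
    tail = sum(values[-window:]) / window
    return tail < head

# Made-up log excerpt in the same line format (values are NOT from a real run).
sample_log = """\
I0000 00:00:00.000000 1 solver.cpp:229] Iteration 0, loss = 3.9
I0000 00:00:00.000000 1 solver.cpp:229] Iteration 100, loss = 3.7
I0000 00:00:00.000000 1 solver.cpp:229] Iteration 200, loss = 3.6
I0000 00:00:00.000000 1 solver.cpp:229] Iteration 300, loss = 3.1
I0000 00:00:00.000000 1 solver.cpp:229] Iteration 400, loss = 2.9
I0000 00:00:00.000000 1 solver.cpp:229] Iteration 500, loss = 2.8
"""

losses = parse_losses(sample_log)
print(losses[0], losses[-1], is_decreasing(losses))
```

Averaging over a small window is only a crude guard against noise. Plotting the values would show more.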

If everything proceeds nicely up to a point but not beyond, it may depend on many things. Successful training is not easy. On the other hand, finetuning (as opposed to training from scratch) should not be so difficult either. If the improvement stalls, it may be due to deficiencies in the training data, like not enough material for each label. Or the material is simply difficult to learn (such as images which could belong to multiple categories).

One thing to remember is that even if the training is not very successful, one can anyway always try how the model works in neural-style.

ProGamerGov commented 8 years ago

"One thing to remember is that even if the training is not very successful, one can anyway always try how the model works in neural-style."

Just trying iteration 5600 at the moment, and it appears to have pretty visibly changed compared to my control test.

Edit: Neural-Style comparison at iteration 200 between my fine-tuned model and the control model I am trying to fine-tune: https://imgur.com/a/Ul9Ho

Testing shows it's better at Cubism style images than the original model.

Here is the comparison with multiple other models: https://imgur.com/a/FoidP

Control vs Fine-Tuned iter 5600 on three images: https://imgur.com/a/DWL77

htoyryla commented 8 years ago

"it's better at Cubism style"

This gives me an idea about a modification to neural-style for when one has a model that outputs style probability. I have also experimented with using FC layers in a modified neural-style (see http://liipetti.net/erratic/2016/03/28/controlling-image-content-with-fc-layers/ and the sequels). If the model output a style such as cubism from FC8_x, one might use it as an additional factor to steer the image towards a particular style. In addition to the content and style images, one would select the style category among the possible values at FC8_x, or several style categories with different weights (because FC8_x can be made to output probabilities for each category, as I describe in http://liipetti.net/erratic/2016/03/31/i-have-seen-a-neural-mirage/). The code I used in my experiments is already quite close, especially in the experiments described in http://liipetti.net/erratic/2016/04/20/getting-the-space-back/ .

ProGamerGov commented 8 years ago

I added these two layers into the train_val.prototxt to help understand how well training is going.

layers {
  name: "accuracy/top1"
  type: ACCURACY
  bottom: "fc8_43"
  bottom: "label"
  top: "accuracy@1"
  include: { phase: TEST }
  accuracy_param {
    top_k: 1
  }
}
layers {
  name: "accuracy/top5"
  type: ACCURACY
  bottom: "fc8_43"
  bottom: "label"
  top: "accuracy@5"
  include: { phase: TEST }
  accuracy_param {
    top_k: 5
  }
}

ProGamerGov commented 8 years ago

After no success in breaking past the 30-31% level of accuracy, I reinstalled everything on a fresh Ubuntu 16.04 with CUDA 8.0 RC and cuDNN v5.

ProGamerGov commented 8 years ago

I received an error at iteration 8900: https://gist.github.com/ProGamerGov/4ac8b8ece45fd5a1a873636cdc673386


I0731 07:23:18.979414 25048 solver.cpp:454] Snapshotting to binary proto file examples/imagenet/VGG16_SOD_finetune_from_scratch_iter_8900.caffemodel
I0731 07:23:22.974056 25048 sgd_solver.cpp:273] Snapshotting solver state to binary proto file examples/imagenet/VGG16_SOD_finetune_from_scratch_iter_8900.solverstate
F0731 07:23:24.131312 25048 io.cpp:69] Check failed: proto.SerializeToOstream(&output) 
*** Check failure stack trace: ***
    @     0x7f59c234c5cd  google::LogMessage::Fail()
    @     0x7f59c234e433  google::LogMessage::SendToLog()
    @     0x7f59c234c15b  google::LogMessage::Flush()
    @     0x7f59c234ee1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f59c2b0f295  caffe::WriteProtoToBinaryFile()
    @     0x7f59c2ae7947  caffe::SGDSolver<>::SnapshotSolverStateToBinaryProto()
    @     0x7f59c2acd534  caffe::Solver<>::Snapshot()
    @     0x7f59c2ace61e  caffe::Solver<>::Step()
    @     0x7f59c2acef49  caffe::Solver<>::Solve()
    @           0x40bd89  train()

Trying to run it again, either resuming from iteration 8900 or running up to iteration 8900 from an earlier snapshot, gives me this: https://gist.github.com/ProGamerGov/e8b0c5507323609e8a252bbba5f68d58

I0731 08:15:54.750869  3529 caffe.cpp:241] Resuming from examples/imagenet/VGG16_SOD_finetune_from_scratch_iter_8900.solverstate
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537750305
F0731 08:16:03.274936  3529 sgd_solver.cpp:316] Check failed: state.history_size() == history_.size() (29 vs. 32) Incorrect length of history blobs.
*** Check failure stack trace: ***
    @     0x7fc0b51875cd  google::LogMessage::Fail()
    @     0x7fc0b5189433  google::LogMessage::SendToLog()
    @     0x7fc0b518715b  google::LogMessage::Flush()
    @     0x7fc0b5189e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc0b5922f7a  caffe::SGDSolver<>::RestoreSolverStateFromBinaryProto()
    @     0x7fc0b5903127  caffe::Solver<>::Restore()
    @           0x40badf  train()
    @           0x4077c8  main
    @     0x7fc0b391e830  __libc_start_main
    @           0x408099  _start
    @              (nil)  (unknown)

However, I could start from the iteration 8900 caffemodel without an error.

htoyryla commented 8 years ago

How’s your disk space? If one takes snapshots often they can fill a disk surprisingly fast. Happened to me once with a 240 GB SSD.
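A small sketch for estimating how many more snapshots fit on the disk before starting a long run. It assumes each snapshot writes a .caffemodel and a .solverstate of roughly the size reported in the libprotobuf warning in this thread (537750305 bytes each). That figure is an assumption, so measure your own files. `snapshots_that_fit` is a made-up helper name.

```python
import shutil

def snapshots_that_fit(free_bytes, snapshot_bytes, reserve_bytes=0):
    """How many snapshots of the given size fit, keeping `reserve_bytes` free."""
    usable = max(free_bytes - reserve_bytes, 0)
    return usable // snapshot_bytes

# Assumed size: one caffemodel plus one solverstate, about 538 MB each.
per_snapshot = 2 * 537750305

free = shutil.disk_usage(".").free   # free space on the current volume
print(snapshots_that_fit(free, per_snapshot, reserve_bytes=5 * 1024**3))
```

Dividing the planned number of iterations by the snapshot interval and comparing it with this number shows in advance whether the disk will fill up.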

ProGamerGov commented 8 years ago

"How’s your disk space? If one takes snapshots often they can fill a disk surprisingly fast. Happened to me once with a 240 GB SSD."

The disk was full, and I suspect that caused the initial stop and the error. However, I can't seem to start it again from the snapshot. Something like a reboot might fix that, but it's way too early in the morning already, so I should get some sleep.

htoyryla commented 8 years ago

If the disk got full while saving the snapshot, then the snapshot is likely to be corrupted. Try an earlier one.

Hannu

ProGamerGov commented 8 years ago

"If the disk got full while saving the snapshot, then the snapshot is likely to be corrupted. Try an earlier one."

I'll definitely check out that possibility. Training was going really well before it happened. I had almost hit my first epoch.

ProGamerGov commented 8 years ago

Here are the test results for iteration 11500:

I0801 00:14:02.203501  3971 caffe.cpp:308] Batch 49, accuracy@5 = 0.166667
I0801 00:14:02.203507  3971 caffe.cpp:313] Loss: 0
I0801 00:14:02.203533  3971 caffe.cpp:325] accuracy = 0.103333
I0801 00:14:02.203552  3971 caffe.cpp:325] accuracy@1 = 0.103333
I0801 00:14:02.203564  3971 caffe.cpp:325] accuracy@5 = 0.343333

It does not work very well in Neural-Style.

ProGamerGov commented 8 years ago

Whenever I add type: "AdaDelta" to my solver.prototxt file, it gives me the following error:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel -gpu 0 2>&1 | tee ~/mylog.log
libdc1394 error: Failed to initialize libdc1394
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.SolverParameter: 23:5: Message type "caffe.SolverParameter" has no field named "type".
F0801 02:02:46.553591  4373 io.hpp:54] Check failed: ReadProtoFromTextFile(filename, proto)
*** Check failure stack trace: ***
    @     0x7f86d6c24daa  (unknown)
    @     0x7f86d6c24ce4  (unknown)
    @     0x7f86d6c246e6  (unknown)
    @     0x7f86d6c27687  (unknown)
    @           0x407591  train()
    @           0x405781  main
    @     0x7f86d6136ec5  (unknown)
    @           0x405d2d  (unknown)
    @              (nil)  (unknown)
ubuntu@ip-Address:~/caffe$

htoyryla commented 8 years ago

You need to give a delta value too, like I showed earlier.

type: "AdaDelta"
delta: 1e-6

This is what I have tried (and it worked with my data when SGD didn't converge at all).

htoyryla commented 8 years ago

I0801 00:14:02.203501  3971 caffe.cpp:308] Batch 49, accuracy@5 = 0.166667
I0801 00:14:02.203507  3971 caffe.cpp:313] Loss: 0
I0801 00:14:02.203533  3971 caffe.cpp:325] accuracy = 0.103333
I0801 00:14:02.203552  3971 caffe.cpp:325] accuracy@1 = 0.103333
I0801 00:14:02.203564  3971 caffe.cpp:325] accuracy@5 = 0.343333

Probably something is really wrong and this training is not working. If the loss is already zero, then there is nothing to supervise the learning further. I would check the dataset: are all labels present with enough examples in both the train and val datasets? If not, then the model cannot learn all categories successfully.

Still, I am not fully sure how to read your printout, as it does not show the difference between training and testing. Are you having both accuracy and loss in the model at the same time? One usually looks at loss when training, and accuracy when testing. Loss should decrease towards zero, accuracy increase towards one, if everything is ok.
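For reference, a minimal NumPy sketch of what a top-k accuracy figure such as accuracy@1 or accuracy@5 measures: the fraction of test images whose true label is among the k highest-scoring categories. This is my own illustration, not Caffe's implementation. The scores and labels are made up.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of rows whose true label is among the k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]        # indices of the k best classes
    hits = (top_k == labels[:, None]).any(axis=1)     # is the true label among them?
    return hits.mean()

# Three made-up test images, three categories.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])

acc1 = top_k_accuracy(scores, labels, 1)   # only the best guess counts
acc2 = top_k_accuracy(scores, labels, 2)   # either of the two best guesses counts
print(acc1, acc2)
```

A model that guesses at random scores about k divided by the number of categories, so that is the baseline to compare a reported accuracy against.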

I am familiar with lines like these (from an unsuccessful training, the only log I found left). Initially I looked at losses for each iteration, and tested for accuracy at every, say, 1000th iteration. If it trained well, I started printing losses less often.

I0305 11:36:38.108285 26966 solver.cpp:338] Iteration 2000, Testing net (#0)
I0305 11:52:51.347494 26966 solver.cpp:406]     Test net output #0: accuracy = 0

I0530 10:51:16.171003 19113 sgd_solver.cpp:106] Iteration 2718, lr = 1e-05
I0530 10:52:43.504497 19113 solver.cpp:229] Iteration 2720, loss = 2.8002

ProGamerGov commented 8 years ago

Do you know of any scripts/programs I can use to create the labels for data sets? I think there are a few more problematic categories I want to purge, and a few new categories I want to add.

If I have a smaller (around 100 images maybe?) data set of images which are high resolution, would it make sense to chop them into pieces? Or should I keep them whole?

htoyryla commented 8 years ago

Usually the labeling must be done by hand, i.e. one must consider each image separately and decide the correct label.

However, once I created a dataset out of my own photos by using a places-205 model to output labels. In this way I got a file with file paths and labels. There were missing labels in the resulting set (the original model used 205 labels but found only 168 in my photos), so I wrote a script to renumber the labels. But I had to write the scripts myself, both for the labeling using the places model and renumbering the labels, and these scripts are not directly applicable to other cases.
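The renumbering step could look roughly like this. It is a sketch, not the script I actually used: it assumes a list file with one "path label" pair per line, and maps the labels that actually occur onto 0..N-1 so that there are no gaps. The file paths in the example are invented.

```python
def renumber_labels(lines):
    """Map the labels that actually occur to 0..N-1, keeping their sorted order."""
    pairs = [line.strip().rsplit(None, 1) for line in lines if line.strip()]
    used = sorted({int(label) for _, label in pairs})
    mapping = {old: new for new, old in enumerate(used)}
    renumbered = [f"{path} {mapping[int(label)]}" for path, label in pairs]
    return renumbered, mapping

# Invented example: only labels 12, 57 and 200 occur in the list.
example = [
    "photos/a.jpg 12",
    "photos/b.jpg 200",
    "photos/c.jpg 12",
    "photos/d.jpg 57",
]
new_lines, mapping = renumber_labels(example)
print(new_lines)
print(mapping)
```

The returned mapping should be kept, because it is the only way to translate the new label numbers back to the original category names.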

ProGamerGov commented 8 years ago

"However, once I created a dataset out of my own photos by using a places-205 model to output labels. In this way I got a file with file paths and labels."

Simon Stålenhag's artwork is mostly landscape-oriented, so I wonder if that would work well with his artwork?

These are some of the sites/albums I found which had unique artwork from him, do you think a similar approach would work with these?

http://www.simonstalenhag.se/ https://imgur.com/gallery/cGibB https://imgur.com/gallery/ODOi0 https://imgur.com/gallery/VZLDN

His artwork seems like it would work better with currently existing pre-trained models than the other data set I have been trying to use.

Is there at least any way I can streamline the process of manually labeling images?

htoyryla commented 8 years ago

I am not at all sure that places205 model would produce meaningful characterization out of these. One could test what it sees in a picture, however, using a script such as I describe here http://liipetti.net/erratic/2016/03/31/i-have-seen-a-neural-mirage/

The amount of training images could also be a problem. One needs a lot. I had only some 2600 total for the 168 labels, and that is far too few. It would be better to have 2600 per label.

ProGamerGov commented 8 years ago

"The amount of training images could also be a problem. One needs a lot. I had only some 2600 total for the 168 labels, and that is far too few."

If I use his work as a single category, I would have about 300-700 images. If I randomly crop the high res images into pieces, I could stretch it to a larger number. The next trick would be finding a data set (preferably not too large of a data set in terms of file size) I could easily add on a category to. I am unsure of how to go about creating multiple random crops from each image, but listing every image as a single category seems more doable than trying to label them all separately.
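One possible way to create multiple random crops is to first generate the crop boxes and then cut them out with whatever image library is at hand. The sketch below only computes the boxes, using the Python standard library. The function name, the image size and the crop size are made up for illustration.

```python
import random

def random_crop_boxes(width, height, crop_size, count, seed=None):
    """Return `count` (left, top, right, bottom) boxes that fit inside the image."""
    if crop_size > width or crop_size > height:
        raise ValueError("crop is larger than the image")
    rng = random.Random(seed)
    boxes = []
    for _ in range(count):
        left = rng.randint(0, width - crop_size)   # randint includes both ends
        top = rng.randint(0, height - crop_size)
        boxes.append((left, top, left + crop_size, top + crop_size))
    return boxes

# Hypothetical 3000x2000 high-res image, eight 512x512 crops.
boxes = random_crop_boxes(3000, 2000, 512, count=8, seed=1)
for box in boxes:
    print(box)
```

The boxes use the (left, top, right, bottom) order that Pillow's Image.crop takes, so each one could be passed to it directly and the result saved as a new training image with the same label as the original. Crops from the same image overlap and are strongly correlated, so this stretches a data set less than the raw count suggests.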

htoyryla commented 8 years ago

I guess that if you can find a few hundred images for say, ten different artists, each with clearly different images, then you could train a model to predict which artist's work an image resembles (10 labels). I believe it would work. But I don't know how useful that model would be. The model would learn something about style, but not necessarily enough about objects and features like lines and shapes etc which are essential in neural-style.

ProGamerGov commented 8 years ago

"But I don't know how useful that model would be. The model would learn something about style, but not necessarily enough about objects and features like lines and shapes etc which are essential in neural-style."

Because fine tuning exploits what the model already knows in order to train it on new content, I can use a model trained on a data set that includes artwork. This also means I can train using fewer images than I would need for training from scratch. The popular PASCAL VOC data set contains artwork, and I suspect the imagenet data set may contain artwork as well. My previous fine tuning tests were on a model that was fine tuned for picking out prominent objects in real life images, and thus it becomes harder to train on new content using the original non-fine tuned model parts. So using a non-fine tuned model trained on an artwork containing data set should allow me to successfully train it on the desired content.

htoyryla commented 8 years ago

Usually finetuning is done by setting learning rate of conv layers to zero. One assumes that the conv layers already know the necessary features, and one only needs to re-train the FC layers for the different categorization.

It is not at all obvious what happens to the conv layers when finetuning changes them too, as we are doing now. It is possible that they, based on their previous learning, adapt nicely to the features in the new data. Or it may happen that they start changing towards something new in a detrimental manner (as far as neural-style is concerned). I think I have seen the latter case happen a few times.

It is also the case that it takes a lot of iterations for a deep model for all the conv layers to adapt completely. The idea of fast finetuning comes from not changing the conv layers at all.
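The effect of a zero learning rate on the conv layers can be shown with a toy SGD step in NumPy. This is only a sketch of the principle, not Caffe code. In a Caffe prototxt the per-layer multiplier is, as far as I know, `lr_mult` in newer versions and `blobs_lr` in the older "layers" syntax. The layer names and numbers below are made up.

```python
import numpy as np

def sgd_step(params, grads, base_lr, lr_mult):
    """One plain SGD update where each layer's rate is base_lr * lr_mult[name]."""
    return {name: w - base_lr * lr_mult.get(name, 1.0) * grads[name]
            for name, w in params.items()}

# Two toy "layers" with identical weights and gradients.
params = {"conv1": np.ones((2, 2)), "fc8": np.ones((2, 2))}
grads = {"conv1": np.full((2, 2), 0.5), "fc8": np.full((2, 2), 0.5)}

# Multiplier 0 freezes the conv layer; the FC layer trains normally.
lr_mult = {"conv1": 0.0, "fc8": 1.0}

updated = sgd_step(params, grads, base_lr=0.1, lr_mult=lr_mult)
print(updated["conv1"])   # unchanged
print(updated["fc8"])     # moved against the gradient
```

With the multiplier at zero the conv weights are bit-for-bit what the pretrained model had, which is why the features neural-style relies on cannot drift during such a finetuning run.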

But yes, I agree with you that it is better if the new data is similar.

htoyryla commented 8 years ago

Another matter... when I wrote

"The model would learn something about style, but not necessarily enough about objects and features like lines and shapes etc which are essential in neural-style."

I was thinking that, for neural-style, a model must react to features like lines, shapes and texture. For instance, when I trained a model on geometrical shapes of a single color each, it produced only single-color blobs in neural-style. It did not really see the detailed objects and textures which are important. One might have thought that it would produce abstract pictures, which was kind of my goal, but it didn't really, because, failing to see the objects in the content image, it didn't place the colored blobs in a meaningful arrangement.

But of course, when starting from a well trained model which already recognizes objects and textures, one can hope that the further training will not totally mess up the previous capabilities.

ProGamerGov commented 8 years ago

My current experiment seems to actually produce a better result than the original Places 365 Hybrid model:

ProGamerGov commented 8 years ago

Should I manually reshuffle my LMDB files every epoch?


Also, can I have multiple categories for a single image in Caffe if it can fall under multiple categories?

Like this example where all images are part of category one, but are then divided as well into 3 sub categories:

images/image__101.jpg 1 2
images/image__102.jpg 1 1
images/image__103.jpg 1 2
images/image__104.jpg 1 3
images/image__105.jpg 1 1
images/image__106.jpg 1 3

ProGamerGov commented 8 years ago

In this album: https://imgur.com/a/nxPCC it appears as though the fine-tuned model creates a "cleaner" image (at least in some parts of the image) than the un-fine-tuned model.

I theorize that lightly fine tuning a model on the work of an artist who created your intended style image, can enhance the ability of the model to transfer their style.

ProGamerGov commented 8 years ago

Why is it that the NIN model used by Neural-Style has many usable layers, that are not listed in the train_val.prototxt?

ProGamerGov commented 8 years ago

If anyone is interested I can give you the following so that you do not need to spend hours and hours collecting and preparing the artwork.

simon1.tar.gz 586 images (only colored) | 184 MB
simon2.tar.gz 725 images (including uncolored sketches and photos of sketches) | 282 MB

None of the images have been resized or cropped yet. A txt file called "filelist.txt" lists every image's name, so all you need to do is add the category value and the paths for use in Caffe when making your train.txt and val.txt files.
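A minimal sketch of that last step, turning a list of file names into train and val lists in the "path label" form. The directory name, the label value, the 10% validation share and the helper name `make_splits` are all assumptions for illustration. In practice the names would be read from filelist.txt.

```python
import random

def make_splits(filenames, image_dir, label, val_fraction=0.1, seed=0):
    """Return (train_lines, val_lines), each line being 'path label'."""
    names = [n.strip() for n in filenames if n.strip()]
    random.Random(seed).shuffle(names)              # shuffle before splitting
    n_val = max(1, int(len(names) * val_fraction)) if names else 0
    to_line = lambda name: f"{image_dir}/{name} {label}"
    train = [to_line(n) for n in names[n_val:]]
    val = [to_line(n) for n in names[:n_val]]
    return train, val

# Stand-in for the contents of filelist.txt (invented names).
filenames = [f"image{i}.jpg" for i in range(1, 21)]
train_lines, val_lines = make_splits(filenames, "images", label=1, val_fraction=0.25)

print(len(train_lines), len(val_lines))
print(train_lines[0])
```

Writing the two lists out with "\n".join(...) gives the train.txt and val.txt files.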


I have discovered that smaller image data sets can be made more useful for training by using transformations in the train_val.prototxt. This is in addition to fine tuning, which can significantly lower the number of images required.

at0mb0y commented 8 years ago

Hi, I have been following your quest to train your own ConvNet for a few days. I'm interested in those training sets in order to practice ConvNet tuning.

best

ProGamerGov commented 8 years ago

@at0mb0y Here are the files:

simon1.tar.gz: https://drive.google.com/file/d/0B--sVcawvPKfSkIyc1ZwX2tOSVE/view?usp=sharing

simon2.tar.gz: https://drive.google.com/file/d/0B--sVcawvPKfTDFWQVFaalhmd3c/view?usp=sharing

If you make a good model with the images, please be sure to post here so I and others can check it out in Neural-Style!

htoyryla commented 8 years ago

"Why is it that the NIN model used by Neural-Style has many usable layers, that are not listed in the train_val.prototxt?"

Which layers and how did you find them? In principle, the training prototxt is the template according to which the model is originally created, so everything should be there (unless someone has removed layers from the prototxt).

Neural-style loads the model using loadcaffe, which requires the caffemodel and the prototxt as parameters. It is not fully clear to me how loadcaffe would behave if the prototxt did not include all layers. From the source it looks like it builds the model according to the prototxt, so any layers not present in it would not be available to neural-style.

htoyryla commented 8 years ago

As to your hypotheses, my feeling is that supervised learning with labels is not the best way to train conv layers to respond to stylistic features. It may work but there's nothing to guarantee that it will. Training with labels makes the model produce the labels, and everything else is a side effect, to a large extent beyond control.

That's why I am interested in training using autoencoders, generative adversarial networks or something similar. For example, a model in which the training image is processed to a vector and then back to an image. The training is directed so that the resulting image is as close to the original as possible. No labels needed and the model learns directly about the images.
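The image-to-vector-to-image idea can be shown with a toy linear autoencoder in NumPy. It is far simpler than a convolutional autoencoder or a GAN and only illustrates the training objective: no labels, just a reconstruction error that is pushed down. The sizes, learning rate and random data are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))             # 64 toy "images" of 8 values each
W_enc = 0.1 * rng.normal(size=(8, 3))    # image -> 3-value code vector
W_dec = 0.1 * rng.normal(size=(3, 8))    # code vector -> reconstructed image

def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared difference between the input and its reconstruction."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

initial_loss = reconstruction_loss(X, W_enc, W_dec)

lr = 0.1
for _ in range(300):
    code = X @ W_enc                              # encode
    err = 2.0 * (code @ W_dec - X) / X.size       # d(loss)/d(reconstruction)
    grad_dec = code.T @ err                       # gradient for the decoder
    grad_enc = X.T @ (err @ W_dec.T)              # gradient for the encoder
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec

final_loss = reconstruction_loss(X, W_enc, W_dec)
print(initial_loss, final_loss)
```

Because the code vector is smaller than the input, the model cannot simply copy the image and has to keep whatever structure explains the data best. That is the sense in which it learns directly about the images.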

htoyryla commented 8 years ago

I am not aware of any need to manually reshuffle data. That would ruin the idea of really heavy training which can run for days and weeks on its own.

Then you asked about multilabel training. It looks like there are ways to do it: http://stackoverflow.com/questions/32680860/caffe-with-multi-label-images

ProGamerGov commented 8 years ago

"That's why I am interested in training using autoencoders, generative adversarial networks or something similar. For example, a model in which the training image is processed to a vector and then back to an image. The training is directed so that the resulting image is as close to the original as possible. No labels needed and the model learns directly about the images."

@htoyryla Would you mind elaborating more on this?


I also discovered this modified script here for using a model to label images in a directory full of images: https://groups.google.com/forum/#!topic/caffe-users/sLgqUgSM3XQ but it does not seem to work properly.

More info on using classify.py: https://groups.google.com/forum/#!searchin/caffe-users/classify|sort:relevance/caffe-users/YSzAIxnDI7w/KKo-0yofEwAJ

I found this example of classifying a single image:

./build/examples/cpp_classification/classification.bin models/own_net/deploy.prototxt examples/RSR_50k_all_1k_db/snapshot_iter_10000.caffemodel examples/RSR_50k_all_1k_db/mean.binaryproto examples/RSR_50k_all_1k_db/labels.txt /home/ubuntu/datasets/RSR_50k_1ll_1k/Testing/[0]/outfile243.jpg

ProGamerGov commented 8 years ago

I ran this:

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/rocky_beach.jpg 2>&1 | tee ~/mylog.log

And got this output:

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/rocky_beach.jpg ----------
0.7378 - "n09428293 seashore, coast, seacoast, sea-coast 978"
0.0909 - "n09399592 promontory, headland, head, foreland 976"
0.0823 - "n09421951 sandbar, sand bar 977"
0.0480 - "n02894605 breakwater, groin, groyne, mole, bulwark, seawall, jetty 460"
0.0199 - "n04606251 wreck 913"

ProGamerGov commented 8 years ago

After testing, it seems that Places365-Hybrid is ok at identifying the images.
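For using these predictions as labels, the output could be parsed with a few lines of Python. This sketch assumes prediction lines of the form shown in this thread (probability, then a quoted string holding an ID, a description and a trailing number). I am assuming the trailing number is the category index. `parse_predictions` is a made-up helper.

```python
import re

PRED_RE = re.compile(r'^(\d+\.\d+) - "(n\d+) (.+) (\d+)"$')

def parse_predictions(output):
    """Extract (probability, id, description, index) from prediction lines."""
    preds = []
    for line in output.splitlines():
        m = PRED_RE.match(line.strip())
        if m:
            preds.append((float(m.group(1)), m.group(2), m.group(3), int(m.group(4))))
    return preds

# Lines copied from the rocky_beach.jpg output earlier in this thread.
output = '''---------- Prediction for /home/ubuntu/caffe/data/rocky_beach.jpg ----------
0.7378 - "n09428293 seashore, coast, seacoast, sea-coast 978"
0.0909 - "n09399592 promontory, headland, head, foreland 976"
0.0823 - "n09421951 sandbar, sand bar 977"
'''

preds = parse_predictions(output)
best = max(preds, key=lambda p: p[0])   # highest-probability prediction
print(best)
```

Keeping only predictions above some probability threshold would be a simple way to skip images the model is unsure about.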

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image586.jpg 2>&1 | tee ~/mylog.log

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image586.jpg ----------
0.0947 - "n03344393 fireboat 554"
0.0858 - "n04044716 radio telescope, radio reflector 755"
0.0807 - "n04606251 wreck 913"
0.0681 - "n03126707 crane 517"
0.0673 - "n03388043 fountain 562"

Image_586

Does not seem to understand alien world environments very well.

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image585.jpg 2>&1 | tee ~/mylog.log

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image585.jpg ----------
0.3935 - "n03126707 crane 517"
0.2111 - "n03216828 dock, dockage, docking facility 536"
0.0900 - "n02687172 aircraft carrier, carrier, flattop, attack aircraft carrier 403"
0.0805 - "n03393912 freight car 565"
0.0554 - "n04347754 submarine, pigboat, sub, U-boat 833"

Image_585

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image__9.jpg 2>&1 | tee ~/mylog.log

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image__9.jpg ----------
0.3298 - "n09428293 seashore, coast, seacoast, sea-coast 978"
0.0906 - "n04606251 wreck 913"
0.0866 - "n09421951 sandbar, sand bar 977"
0.0630 - "n04251144 snorkel 801"
0.0442 - "n10565667 scuba diver 983"

Image__9

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image569.jpg 2>&1 | tee ~/mylog.log

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image569.jpg ----------
0.1291 - "n03000684 chain saw, chainsaw 491"
0.1138 - "n03803284 muzzle 676"
0.0722 - "n04179913 sewing machine 786"
0.0398 - "n02130308 cheetah, chetah, Acinonyx jubatus 293"
0.0365 - "n03146219 cuirass 524"

Image569

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image569.jpg 2>&1 | tee ~/mylog.log

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image551.jpg ----------
0.4913 - "n04296562 stage 819"
0.1085 - "n03691459 loudspeaker, speaker, speaker unit, loudspeaker system, speaker system 632"
0.0825 - "n04009552 projector 745"
0.0569 - "n03782006 monitor 664"
0.0331 - "n03180011 desktop computer 527"

Image569

ProGamerGov commented 8 years ago

I tried to write a script that would run the classifier on all of the images to see whether it could label them, but it does not work. It can't find the files in the echo'd command.

#!/usr/bin/env bash 
#echo "Script is running!"

num_val=0
            echo $num_val          

for ((n=0;n<5;n++))
do

num_val=$((num_val+1))
            echo $num_val   

        CMDone= 

            "bash ./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt data/s_art/simon1/image"$num_val".jpg"

            #echo $CMDone

            #sleep 10

done

Edit: I ran this variation of the script with this command:

ubuntu@ip-Address:~/caffe$ bash ./script_3.sh 2>&1 | tee ~/mylog.log

Script

#!/usr/bin/env bash 
#echo "Script is running!"

num_val=0
            #echo $num_val  
          cd caffe             

for ((n=0;n<586;n++))
do

num_val=$((num_val+1))
            #echo $num_val   

        CMDone= 

            "bash ./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt data/s_art/simon1/image"$num_val".jpg"

            #echo $CMDone

            #sleep 100

done

I then used Notepad++ to remove ./script_3.sh: line 16: bash and : No such file or directory from every line. Then I added 2>&1 | tee ~/mylog.log to the first line and >> ~/mylog.log 2>&1 to every one of the other 585 lines.

Then I pasted this into the terminal/cml console: https://gist.github.com/ProGamerGov/f26d8f7adb90c8477b70bf157b1a7a18

Now the trick is to figure out how to use the output for labels. Maybe I can append the existing model's weights/content with the art images rather than creating new categories?
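The "No such file or directory" errors come from the line after CMDone=: the assignment is left empty, and the quoted string on the next line is then executed as one single command name, spaces included. A minimal Python sketch of the same loop follows. It reuses the paths from the script above and drops the leading bash, since classification.bin is a compiled binary. The helper names build_command and classify_all are hypothetical, and it assumes the script is started from the caffe directory with images named image1.jpg to image586.jpg.

```python
import subprocess

CLASSIFIER = "./build/examples/cpp_classification/classification.bin"
MODEL_ARGS = [
    "models/places365/deploy_vgg16_hybrid1365.prototxt",
    "models/places365/vgg16_hybrid1365.caffemodel",
    "examples/imagenet/s_art_mean.binaryproto",
    "models/places365/categories_hybrid1365.txt",
]


def build_command(image_number, image_dir="data/s_art/simon1"):
    """Return the classifier command for one image as an argument list."""
    image_path = "{}/image{}.jpg".format(image_dir, image_number)
    return [CLASSIFIER] + MODEL_ARGS + [image_path]


def classify_all(first=1, last=586, log_path="mylog.log"):
    """Run the classifier on image<first>..image<last> and log all output."""
    with open(log_path, "w") as log:
        for n in range(first, last + 1):
            # An argument list is passed straight to the program, so no
            # shell quoting or word splitting is involved.
            result = subprocess.run(
                build_command(n),
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            log.write(result.stdout)
```

Calling classify_all() would write the predictions and the libprotobuf warnings for all 586 images into mylog.log in one pass, without editing the commands by hand.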

ProGamerGov commented 8 years ago
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for data/s_art/simon1/image327.jpg ----------
0.8065 - "n03947888 pirate, pirate ship 724"
0.0731 - "n04606251 wreck 913"
0.0189 - "n03388043 fountain 562"
0.0101 - "n01704323 triceratops 51"
0.0074 - "n03240683 drilling platform, offshore rig 540"
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for data/s_art/simon1/image328.jpg ----------
0.0903 - "n01824575 coucal 91"
0.0448 - "n13133613 ear, spike, capitulum 998"
0.0404 - "n12144580 corn 987"
0.0321 - "n01616318 vulture 23"
0.0294 - "n09472597 volcano 980"

That's the output saved in mylog.log. It should be possible to make a script grab the image location and apply the label values, but that might be beyond my skill level in this area. It would be interesting to see how the model responds in Neural-Style, even though the labels are not necessarily 100% correct.


Edit:

The full mylog.log file from all the images in the simon1 data set with the Hybrid Places 365 model: https://gist.github.com/ProGamerGov/8d792d6d7fb00167729262931c4089bf

The full mylog.log file from all the images in the simon1 data set with the Regular/Non-Hybrid Places 365 model:

https://gist.github.com/ProGamerGov/5a68492f98e4aa26197ef7bdbdce83a2

ProGamerGov commented 8 years ago

So I guess I need to find something I can modify, or figure out how to make a script which can:

Take the data from a file containing 586 of these:

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for data/s_art/simon1/image327.jpg ----------
0.8065 - "n03947888 pirate, pirate ship 724"
0.0731 - "n04606251 wreck 913"
0.0189 - "n03388043 fountain 562"
0.0101 - "n01704323 triceratops 51"
0.0074 - "n03240683 drilling platform, offshore rig 540"
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for data/s_art/simon1/image328.jpg ----------
0.0903 - "n01824575 coucal 91"
0.0448 - "n13133613 ear, spike, capitulum 998"
0.0404 - "n12144580 corn 987"
0.0321 - "n01616318 vulture 23"
0.0294 - "n09472597 volcano 980"

And put it into the structure of:

simon1/image1.jpg 1
simon1/image10.jpg 1
simon1/image100.jpg 1
simon1/image101.jpg 1
simon1/image102.jpg 1
simon1/image103.jpg 1
simon1/image104.jpg 1

Like this, in its own txt file:

simon1/image327.jpg 724 913 562 51 540
simon1/image328.jpg 91 998 987 23 980

Or I can just take the network's top prediction for each image:

simon1/image327.jpg 724
simon1/image328.jpg 91

Or we could label the majority, half, or a few of the images by only applying labels that have a high enough confidence score.

htoyryla commented 8 years ago

Just remember that if the train and val sets do not contain examples for every label, it is likely to result in poor training. That's why I had to write a script to renumber the labels.

On the other hand, as you are not training it for classification, you might as well try to train with an incomplete label set and see what happens.

As for multilabel training, it seems possible with caffe, but be prepared for problems; the post I linked gave only some ideas and pointers on how to do that.

ProGamerGov commented 8 years ago

Just remember that if the train and val sets do not contain examples for every label, it is likely to result in poor training. That's why I had to write a script to renumber the labels.

@htoyryla Do you still have the script, so that I have something to base my own attempt on?

htoyryla commented 8 years ago

"@htoyryla Would you mind elaborating more on this?"

Something like in http://siavashk.github.io/2016/02/22/autoencoder-imagenet/ . In VGG terms, one would remove, say, FC7 and FC8 and instead add a mirrored version of the conv layers to rebuild the image, then train on images, computing the loss between the output and input images. VGG may, however, be difficult to train in this way, and caffe lacks the Unpooling layer this needs, although there are caffe extensions that have one.

Generative adversarial networks are a more sophisticated solution. Two models, one produces an image, the other decides whether the image was real or fake. Both are trained in tandem. See for instance https://swarbrickjones.wordpress.com/2016/01/13/enhancing-images-using-deep-convolutional-generative-adversarial-networks-dcgans/ and https://github.com/soumith/dcgan.torch .

What I find attractive about such approaches is that it is possible to train without labels. Labeling is the main pain in creating datasets. Furthermore, labels are for classification, and style transfer is about images, not classification.

There is much recent work, and there are new applications, that use this kind of network to work directly with images. It also looks like the recent work on neural style transfer concentrates on such networks.

htoyryla commented 8 years ago

I'll check if I can find the script.

htoyryla commented 8 years ago

I found those scripts but cannot remember exactly which ones I finally used and how. They are a quick and dirty solution I used for a one-off task.

This lua script looks like it reads a file containing all valid labels (those that exist in your dataset), each label number on its own line, and then opens val.txt and renumbers the labels from zero to maxlabel-1.

https://gist.github.com/htoyryla/7de83339101524c058da94ba6a176a47
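For illustration only, the renumbering idea can be sketched in Python as below. This is not the linked lua script: instead of reading a separate file of valid labels, it assumes the valid labels are simply those that occur in the "path label" lines it is given, and the function name renumber_labels is hypothetical.

```python
def renumber_labels(lines):
    """Map the labels used in 'path label' lines onto 0..N-1.

    Returns (new_lines, mapping), where mapping is {old_label: new_label}.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last whitespace so paths containing spaces survive.
        path, label = line.rsplit(None, 1)
        entries.append((path, int(label)))
    old_labels = sorted({label for _, label in entries})
    mapping = {old: new for new, old in enumerate(old_labels)}
    new_lines = ["{} {}".format(path, mapping[label]) for path, label in entries]
    return new_lines, mapping
```

Keeping the returned mapping around makes it possible to translate the new label numbers back to the original category names later.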

I think the following lua script is what I used for labeling the images using an existing model. It outputs the filenames followed by the label given by the model. It also outputs the list of all labels that are found, to be given as input to the renumbering script. All this is in the same output stream, separated by a "-------------------" line. Direct all output to a file and then manually copy-paste the relevant areas into a train.txt and valid_labels.txt.

https://gist.github.com/htoyryla/e4fea0efe127b3255ba791f6b4a2b2c6

The renumbering must be done for all images at the same time, and the split into train and val sets must be done afterwards; otherwise the labels will not match. I generated a single all.txt and used the following python script to split it into train and val sets.

https://gist.github.com/htoyryla/fdf83cfd2c511627d02ef21f3d80afb4
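As a rough sketch of that last step (not the linked script itself), a seeded shuffle-and-split over the already renumbered lines could look like this in Python. The function name, the 10% default for the val share, and the seed are assumptions made for the example.

```python
import random


def split_train_val(lines, val_fraction=0.1, seed=0):
    """Shuffle 'path label' lines and split them into (train, val) lists."""
    shuffled = list(lines)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

A purely random split like this does not guarantee that every label ends up in both sets, so with small or unbalanced datasets it may be worth checking the result before building the LMDBs.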

ProGamerGov commented 8 years ago

I found this Tensorflow-based image classifier that seems to be extremely easy to set up and use: https://github.com/llSourcell/tensorflow_image_classifier

The Tensorflow model on that Github page has to be trained on the categories that you want it to classify. You can use a Chrome extension like https://chrome.google.com/webstore/detail/fatkun-batch-download-ima/nnjjahlikiabnchcpehcpkdeckfgnohf?hl=en to collect about 100 training images from Google Images for each category.

I am thinking that, with a simple script, this easy-to-use Tensorflow project could be used for labeling images for Caffe training. It is far easier to set up and configure: image "labeling" for the model is done by placing the images in the appropriate category directory that you created, and the rest seems pretty much automated.
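If the images are already sorted into one directory per category, a "path label" list of the kind used earlier in this thread can be derived from the directory layout alone. The Python sketch below assumes a root/<category>/<image> layout and numbers the categories alphabetically from zero; the function name and the accepted file extensions are illustrative assumptions.

```python
import os


def list_labeled_images(root):
    """Return ('path label' lines, category names) for root/<category>/<image>."""
    categories = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    lines = []
    for label, category in enumerate(categories):
        folder = os.path.join(root, category)
        for name in sorted(os.listdir(folder)):
            # Only image files get a line; anything else is ignored.
            if name.lower().endswith((".jpg", ".jpeg", ".png")):
                lines.append("{}/{} {}".format(category, name, label))
    return lines, categories
```

Because the categories are numbered consecutively, every label from 0 to N-1 has at least one example as long as no category directory is empty.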


https://github.com/BVLC/caffe/issues/2051#issuecomment-247765410

In the solver.prototxt, iter_size can be used to compensate for not using the recommended batch size. The default is iter_size: 1.
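To make the arithmetic concrete: Caffe accumulates gradients over iter_size mini-batches before each weight update, so the effective batch size is the prototxt batch_size multiplied by iter_size. The small Python helper below (a hypothetical name, with illustrative numbers) computes the iter_size needed to reach a recommended batch size when only a smaller batch fits in GPU memory.

```python
import math


def iter_size_for(target_batch_size, batch_size):
    """Smallest iter_size with batch_size * iter_size >= target_batch_size."""
    return int(math.ceil(target_batch_size / float(batch_size)))


# Example: a recommended batch size of 256 with only 32 images per pass
# means accumulating gradients over 8 passes.
needed = iter_size_for(256, 32)
```

When the target is not an exact multiple of batch_size, the result is rounded up, so the effective batch size ends up slightly larger than the recommendation.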