samjabrahams / tensorflow-on-raspberry-pi

TensorFlow for Raspberry Pi

Inception-v3 works, AlexNet causes OOM #14

Open ArnoXf opened 8 years ago

ArnoXf commented 8 years ago

Hi! First of all, many thanks for your work! Installation with pip on Pi 3 with Jessie worked without any issues. I first tried the Inception-v3 classification you provided and it worked very well. Now I am trying to get AlexNet working on the Oxford 17 flowers dataset. I have the following configuration:

# TFLearn imports required for the layers below
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

input_layer = input_data(shape=[None, 224, 224, 3])

conv1 = conv_2d(input_layer, 96, 11, strides=4, activation='relu')
pool1 = max_pool_2d(conv1, 3, strides=2)
network = local_response_normalization(pool1)

conv2 = conv_2d(network, 256, 5, activation='relu')
pool2 = max_pool_2d(conv2, 3, strides=2)
network = local_response_normalization(pool2)

conv3 = conv_2d(network, 384, 3, activation='relu')
conv4 = conv_2d(conv3, 384, 3, activation='relu')
conv5 = conv_2d(conv4, 256, 3, activation='relu')
pool3 = max_pool_2d(conv5, 3, strides=2)
network = local_response_normalization(pool3)

fc1 = fully_connected(network, 4096, activation='tanh')
dropout1 = dropout(fc1, 0.5)
fc2 = fully_connected(dropout1, 4096, activation='tanh')
dropout2 = dropout(fc2, 0.5)
fc3 = fully_connected(dropout2, 2, activation='softmax')
network = regression(fc3, optimizer='momentum', loss='categorical_crossentropy',
                     learning_rate=0.01)

This was written using the TFLearn API, but I think it gives a good overview of the layers and configuration. The code works on my desktop computer but fails with an OOM error in the fully connected layers on the Pi. Reducing the fully connected layers from 4096 to 1024 units avoids the OOM error, but unfortunately it still doesn't get to training; it just quits after building up the network.

Any ideas how to solve this? Isn't the loaded Inception graph bigger than AlexNet?

samjabrahams commented 8 years ago

Thanks for posting this! I'll try to take a look at these memory issues sometime over the weekend. I believe that the issue is that you're actually training the AlexNet model on the Raspberry Pi, whereas the Inception model is pretrained.

When you train a model, the machine has to store the values of each node along the way in order to compute the gradients, which means training requires much more memory than just running the forward pass.
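As a rough back-of-the-envelope illustration (an assumption-laden sketch: float32 weights and TFLearn's default 'same' padding, which would make the feature map feeding fc1 about 7x7x256), the first fully connected layer alone is already a large chunk of the Pi's RAM, and a momentum optimizer plus gradients adds more on top during training:

pool3_units = 7 * 7 * 256             # flattened size feeding fc1, assuming 'same' padding throughout
fc1_params = pool3_units * 4096       # roughly 51 million weights in fc1
fc1_megabytes = fc1_params * 4 / 1e6  # about 206 MB just for fc1's float32 weights
print(fc1_megabytes)                  # training keeps momentum slots and gradients on top of this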

samjabrahams commented 8 years ago

If the goal is to use the Raspberry Pi to classify the flowers, I would suggest training the model on your desktop computer, saving/exporting it, and then loading that trained model onto the RPi. Check out the official how-to here.
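A minimal sketch of that save-then-restore flow using plain TensorFlow's tf.train.Saver (the variable, path, and tensor names here are placeholders, and the real graph has to be rebuilt identically on the Pi before restoring):

import tensorflow as tf

# toy stand-in for the real network: any trainable variables will do
weights = tf.Variable(tf.zeros([10, 2]), name="weights")
saver = tf.train.Saver()

# --- on the desktop, after training ---
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop would go here ...
    saver.save(sess, "/tmp/model.ckpt")

# --- on the Raspberry Pi ---
# rebuild the exact same graph definition, then restore the trained values
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    # run only the forward pass here, e.g. sess.run(predictions, feed_dict={...})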

Maybe I'll make some sort of baby TensorFlow Serving server for running pre-trained models on RPi at some point.

ArnoXf commented 8 years ago

Oh no, actually I already trained the model on my desktop machine using a GTX 750 Ti. I saved the model there using TFLearn's model.save("my_model"), then transferred the saved weights file to the Pi, built the network architecture there (as described in my first post), and loaded the weights using model.load("my_model"). I don't want to train on the Pi, just load the model and predict single images (which already works on my desktop machine).
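For reference, the flow being described is roughly the following (a sketch that assumes the network definition from the first post, plus your own training data X, Y and a preprocessed 224x224x3 array single_image):

import tflearn

# on the desktop: wrap the graph, train, and save the weights
model = tflearn.DNN(network)
model.fit(X, Y, n_epoch=10)
model.save("my_model")

# on the Pi: rebuild the same architecture, load the saved weights, and predict
model = tflearn.DNN(network)
model.load("my_model")
print(model.predict([single_image]))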

samjabrahams commented 8 years ago

Great- glad to hear you're already doing that! Next question: when you use model.save() and model.load(), are you including the last line of your code?

network = regression(fc3, optimizer='momentum', loss='categorical_crossentropy',
                     learning_rate=0.01)

Even if you pre-trained your weights, that line is going to cause your model to continue training when you run it on your Pi and not just feed values forward.

Apologies if you've already tried the things I'm suggesting: I don't know what you've previously attempted, so I'm trying to get a better understanding of where we stand.

mrubashkin-svds commented 8 years ago

Hey Sam, thanks for answering @ArnoXf's questions. Have you been able to successfully train any model, in part or in whole, on the Pi 3? If not, do you know of any other places where things like "learning_rate" need to be turned off to avoid errors while building the model?

samjabrahams commented 8 years ago

Hi @mrubashkin-svds - are you referring to AlexNet, or any model in general? I've done toy training on the RPi to make sure that the TensorFlow binaries work properly, but I haven't done any significant training on large models with the Raspberry Pi.

I don't have much experience with TFLearn, so I'm not sure how it runs Sessions, but the main thing is not to pass any optimizer operations to Session.run(); otherwise it has to store a huge amount of data in memory.
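In plain TensorFlow terms, the difference looks roughly like this (a toy TF 1.x-style sketch; the tensors and shapes here are stand-ins, not the actual AlexNet graph):

import tensorflow as tf

# tiny stand-in graph
inputs = tf.placeholder(tf.float32, [None, 4])
labels = tf.placeholder(tf.float32, [None, 2])
logits = tf.layers.dense(inputs, 2)
loss = tf.losses.softmax_cross_entropy(labels, logits)
train_op = tf.train.MomentumOptimizer(0.01, 0.9).minimize(loss)
predictions = tf.nn.softmax(logits)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # training step: fetching train_op forces activations to be kept for gradients
    sess.run(train_op, feed_dict={inputs: [[1., 2., 3., 4.]], labels: [[1., 0.]]})
    # inference: fetch only the output tensor, so it's just a forward pass
    print(sess.run(predictions, feed_dict={inputs: [[1., 2., 3., 4.]]}))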

mrubashkin-svds commented 8 years ago

Hey @samjabrahams, thank you for the input! I've been working with Inception-v3 specifically (no luck with AlexNet) over the past few days, and while I was unable to build any model on the Pi, I was able to persist an 85 MB model in memory and evaluate single images against it in near real-time (~7 seconds of processing per image).

One more question if you have the time: do you have any suggestions for speeding up the processing time? The time seems to be independent of the picture size (i.e. the same amount of time for a 24x24 image as for a 240x240 one). Thanks again @samjabrahams!!

samjabrahams commented 8 years ago

No problem! I believe that the Inception model resizes images automatically, which is why tiny images have the same compute time as huge ones. Getting the model to run faster is something that a fair number of people are currently working on. Here's a short list of things that may be causing the slowdown on the RPi compared to other computers running Inception on CPU:

I'm probably forgetting several important factors, but that's something to start from. Here are some ways that one might try to alleviate these issues:

danbri commented 7 years ago

@samjabrahams on that last point, did you have any suggestions regarding quantization?