samjabrahams / tensorflow-on-raspberry-pi

TensorFlow for Raspberry Pi

Accuracy issue #34

Open Techblogogy opened 8 years ago

Techblogogy commented 8 years ago

Describe the Issue

I retrained the TensorFlow Inception classifier on my 64-bit desktop machine, using the example code from the official TensorFlow repository, and got over 80% accuracy. But when I copied the .pb and .txt files over to the Pi, I got around 30% accuracy. I found a similar issue where the reporter suggested retraining the classifier on the 32-bit Raspberry Pi, but after doing that I still get 30% accuracy.

Is there any way to fix that? Thanks in advance

Hardware/Software Info

Please provide the following information about your Raspberry Pi setup:

samjabrahams commented 8 years ago

Interesting. I think a similar problem was reported in #30, but it was closed before it was investigated further.

@zxzhijia - did you manage to fix the accuracy issues, or did you just not bother with the model anymore?

Techblogogy commented 8 years ago

No, I've been working on it in my spare time, but with no progress :(

AlexanderLitz commented 8 years ago

I experienced this issue with the released RPi wheel, but when I compiled from source it went away, i.e., I got identical results.

dominikandreas commented 7 years ago

I have a similar issue. When I run a TensorFlow model on my Pi model 3, I only get NaNs in the output vector, while the same model works fine on other (non-ARM) platforms.

I installed TensorFlow version 0.12 using the wheel for Python 2.7. I'll try to compile it from source and see if that alleviates the issue.

samjabrahams commented 7 years ago

Are you using a pretrained model? If so, I can try to test it out to see if I can replicate your issue.

dominikandreas commented 7 years ago

I tried to compile it from source, but the Bazel build took down the Pi after 3 hours (the Pi restarted; I'm not sure what happened). This is the exported graph I'm using: https://1drv.ms/u/s!AjBMlWMdSnfSg7xkXuRAhgMgRNzXRQ

It's pretrained and includes the weights, exported using freeze_graph.py.

samjabrahams commented 7 years ago

@dominikandreas I was able to replicate the NaN issue on my RPi. Still working through the issue, but I have a few questions in the meantime:

  1. It looks like some of the frozen Variable values are being saved as NaN. For example, running this code returns a bunch of NaN values in the very first constant Tensor used in the graph (run on my desktop rig, not the Raspberry Pi):
```python
import tensorflow as tf

# Load GraphDef
gd = tf.GraphDef()
with open('gaph.pb', 'rb') as f:
    gd.ParseFromString(f.read())

# Import GraphDef to Graph
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(gd, name='')

# Run Constant Op
sess = tf.Session(graph=graph)
const = graph.get_tensor_by_name('network/conv1_7x7/weights/read/_33__cf__33:0')
print(sess.run([const]))
```

Could you try running that snippet and seeing if you get NaN values?

  2. Interestingly enough, my desktop is still able to get real-valued outputs from the graph despite seeing a bunch of NaNs; not sure what to make of it.

Will try to get another look at it soon.
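
The single-tensor check above can be widened to every set of weights at once. The following is only a minimal sketch in plain Python: `count_nans`, `find_nan_tensors`, and the `example_weights` dictionary are hypothetical names made up for illustration, and the weights are assumed to have already been pulled out of the frozen graph as nested lists of floats (how that is done depends on the TensorFlow version and is not shown here).

```python
import math


def count_nans(values):
    """Recursively count NaN entries in a nested list (or tuple) of floats."""
    if isinstance(values, (list, tuple)):
        return sum(count_nans(v) for v in values)
    return 1 if isinstance(values, float) and math.isnan(values) else 0


def find_nan_tensors(named_weights):
    """Return {name: nan_count} for every tensor containing at least one NaN."""
    report = {}
    for name, values in named_weights.items():
        n = count_nans(values)
        if n:
            report[name] = n
    return report


# Hypothetical stand-in for weights extracted from a frozen graph.
example_weights = {
    'conv1/weights': [[0.1, float('nan')], [float('nan'), -0.3]],
    'conv1/biases': [0.0, 0.5],
}

print(find_nan_tensors(example_weights))  # {'conv1/weights': 2}
```

An empty report would suggest the NaNs appear only at run time rather than being stored in the file.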

dominikandreas commented 7 years ago

@samjabrahams good idea checking the weights, not sure why I didn't think of that. I think my model must be overfitting using very large weights. I could imagine that, with those large weights, small differences in computation lead to completely different outcomes.
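
A toy illustration of that sensitivity (not taken from the actual model; the weights and the `dot_in_order` helper are made up for the example): with very large weights of opposite sign, merely changing the order in which the same products are accumulated changes the result, while small weights give the same answer either way. Different platforms or builds may well accumulate in different orders, though whether that is what happens here is only a guess.

```python
def dot_in_order(weights, inputs, order):
    """Accumulate weights[i] * inputs[i] following the given index order."""
    total = 0.0
    for i in order:
        total += weights[i] * inputs[i]
    return total


inputs = [1.0, 1.0, 1.0]
large_weights = [1e16, 1.0, -1e16]  # huge values that should cancel out
small_weights = [1.0, 1.0, -1.0]    # same pattern at a modest scale

left_to_right = [0, 1, 2]  # the 1.0 is absorbed by 1e16 before the cancellation
cancel_first = [0, 2, 1]   # the large terms cancel first, then 1.0 is added

print(dot_in_order(large_weights, inputs, left_to_right))  # 0.0
print(dot_in_order(large_weights, inputs, cancel_first))   # 1.0
print(dot_in_order(small_weights, inputs, left_to_right))  # 1.0
print(dot_in_order(small_weights, inputs, cancel_first))   # 1.0
```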

Interesting to note: I used tfdeploy to export my model to Python + NumPy and also get NaNs in the output of the resulting model. I will retrain my model using weight regularization and see if that still results in a notable difference.
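
For reference, a minimal sketch of what weight regularization usually means, assuming a plain L2 penalty; the thread does not say which form was actually planned, and `l2_penalty`, `regularized_loss`, and `lam` are illustrative names only. The penalty grows with the square of each weight, so the optimizer is pushed away from very large values.

```python
def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam):
    """Total loss = data loss + L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, lam)


# The same data loss is penalized far more heavily when the weights are large.
print(regularized_loss(1.0, [3.0, 4.0], lam=0.01))
print(regularized_loss(1.0, [300.0, 400.0], lam=0.01))
```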

CocebanVlad commented 5 years ago

@dominikandreas any news on that?

dominikandreas commented 5 years ago

I think the problem was having NaNs in the weights, but it was so long ago that I'm not sure whether I got it running in the end or not.
