tensorflow / models

Models and examples built with TensorFlow

[deeplab] Running mobilenetV2 on Jetson Tx2 #3882

Closed gustavz closed 4 years ago

gustavz commented 6 years ago

Did someone manage to run deeplab's mobilenetV2 on the jetson tx2?

gustavz commented 6 years ago

I did it myself: https://github.com/GustavZ/realtime_segmenation. But somehow the segmentation result is all messed up / total garbage, although the same code works on other machines.

This is what I get when I run the demo on the Jetson TX2 (I am using TensorFlow 1.7 built from source): [screenshot: 2018-04-05 14-22-17]

EDIT: I have now tried it with TF 1.5, 1.6, and 1.7 and always get the same messy results.

asimshankar commented 6 years ago

Could you try to track this down a bit more - are the model inputs and outputs the same on the TX2 as they are on the other machines?

Basically, I'm trying to fish for whether the TensorFlow model execution is running differently or if the PIL.Image or visualization libraries on your Jetson setup are doing something different.

gustavz commented 6 years ago

It does not depend on PIL. I get the same results when I capture video frames and display them with OpenCV.

My host PC and my Jetson have the exact same set of packages installed.

asimshankar commented 6 years ago

It did seem that you're using PIL.Image to create the input data. Anyway, that doesn't matter much - as a step towards debugging this I was suggesting that you verify that the contents of the input and output numpy arrays match on both platforms.

i.e., np.asarray(resized_image) and batch_seg_map[0] in https://github.com/GustavZ/realtime_segmenation/blob/master/demo.py#L66

Are the contents of those arrays (given the same input) identical across both platforms?
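
For example, something along these lines (just a rough sketch; the .npy file names are placeholders) would let you dump both arrays on each machine and compare them offline:

import numpy as np

# Right around the sess.run call in demo.py, dump the arrays on each machine
# (rename the files per platform, e.g. *_x86.npy on the PC, *_tx2.npy on the Jetson):
np.save('input_x86.npy', np.asarray(resized_image))
np.save('segmap_x86.npy', batch_seg_map[0])

# Then copy all four files to one machine and compare:
inp_x86 = np.load('input_x86.npy')
inp_tx2 = np.load('input_tx2.npy')
print('inputs identical :', np.array_equal(inp_x86, inp_tx2))

seg_x86 = np.load('segmap_x86.npy')
seg_tx2 = np.load('segmap_tx2.npy')
print('outputs identical:', np.array_equal(seg_x86, seg_tx2))
print('differing pixels :', np.count_nonzero(seg_x86 != seg_tx2))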

gustavz commented 6 years ago

@asimshankar They should be the same, as PIL and OpenCV give correct results when fed into other models (object detection, for example). So I am pretty sure the problem is not in the input (PIL, OpenCV) but must lie somewhere else.

Also, it is not about the color format: I tried RGB and BGR and the results are the same.

Has anyone run deeplab models successfully on an NVIDIA Jetson platform?

gustavz commented 6 years ago

@asimshankar Could you please tell me in detail which information/output you would like to see? The exact code to reproduce is available in my repo: demo.py is deeplab's original Jupyter notebook demo and run.py is for real-time segmentation using OpenCV.

asimshankar commented 6 years ago

I don't have a Jetson, so I can't run this myself. Absent that, my suggestion was to narrow this down by first confirming whether the issue is with TensorFlow execution or with other parts of the demo script.

One way to do this would be to see what the session.run call returns on both platforms for the exact same input. In my previous comment, I was suggesting that you log the input tensor (np.array(resized_image)) created on the Jetson and then compare it with the value of that tensor generated from the same input image on the other machine.

If those two are the same (the same input value), then log batch_seg_map[0] returned from session.run() to confirm that the output is the same.

If that is not the case, then we have to trace down why the TensorFlow model isn't producing the same output for the same input on different platforms. At that point, perhaps you could dig into the values of intermediate tensors using the TensorFlow debugger and its GUI.
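
If it comes to that, wrapping the session with the debugger is roughly a one-line change (a sketch; where exactly you wrap it depends on how the demo constructs its session):

from tensorflow.python import debug as tf_debug

# Command-line debugger: every subsequent sess.run() drops into an
# interactive view of the intermediate tensors in the graph.
sess = tf_debug.LocalCLIDebugWrapperSession(sess)

# Or, for the TensorBoard debugger GUI (start TensorBoard with
# --debugger_port 6064 first):
# sess = tf_debug.TensorBoardDebugWrapperSession(sess, 'localhost:6064')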

gustavz commented 6 years ago

print np.asarray(resized_image) gives me the following on both platforms:

[[[ 12  12  36]
  [  8   8  32]
  [  7   7  31]
  ...
  [ 48  64  54]
  [ 50  66  54]
  [ 52  68  55]]

 [[ 12  12  36]
  [ 10  10  34]
  [ 15  16  40]
  ...
  [ 62  80  68]
  [ 60  77  63]
  [ 61  78  65]]

 [[ 12  12  36]
  [ 18  18  42]
  [ 38  39  63]
  ...
  [ 60  78  64]
  [ 56  74  60]
  [ 58  74  62]]

 ...

 [[ 23  21  45]
  [ 26  24  48]
  [ 23  21  45]
  ...
  [ 97  99 102]
  [114 116 120]
  [138 141 145]]

 [[ 19  17  41]
  [ 24  22  46]
  [ 25  23  47]
  ...
  [ 96 100  99]
  [ 98 103 107]
  [ 96 100 111]]

 [[ 21  19  43]
  [ 19  17  41]
  [ 17  15  39]
  ...
  [112 116 116]
  [ 82  85  90]
  [ 59  62  73]]]

but print batch_seg_map[0] is of course different, as the resulting visualization makes obvious.

So as I said, the problem appears inside the session and not in the pre- or post-processing.

Working platform:

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]

Jetson:

[[15  0  0 ...  0  0 15]
 [ 0  0  0 ...  0  0 15]
 [ 0  0  0 ...  0  0  0]
 ...
 [ 5  0  0 ...  0  0  0]
 [ 5  0  0 ...  0  0  0]
 [ 5  0  0 ...  0  0  0]]

@asimshankar What is strange is that the print command does not print the whole array but abbreviates it with "...". Can you tell me why that is? Normally it is possible to print every cell of an array.
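
EDIT: this seems to just be NumPy summarizing large arrays when printing; raising the print threshold prints them in full, for example:

import sys
import numpy as np

# Raise NumPy's print threshold so large arrays are printed in full
# instead of being summarized with "...".
np.set_printoptions(threshold=sys.maxsize)
print(batch_seg_map[0])

# Or dump the full segmentation map to a text file to diff it
# against the other platform:
np.savetxt('segmap.txt', batch_seg_map[0], fmt='%d')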

gustavz commented 6 years ago

@asimshankar So it is a problem with the TensorFlow version that is installed. Do you know how TensorFlow must be built from source to be able to run deeplab models? I am using prebuilt TF wheels from this repo: https://github.com/peterlee0127/tensorflow-nvJetson

Maybe those builds don't have the right config; how would you recommend building TF?

asimshankar commented 6 years ago

To build TensorFlow from source see https://www.tensorflow.org/install/install_sources

Unfortunately, we don't yet have the bandwidth to support builds on Jetson and thus rely on community support for that. It seems there are other TF-on-Jetson enthusiasts, perhaps you could seek some assistance on the NVIDIA forums as well? For example: https://devtalk.nvidia.com/default/topic/1028840/jetson-tx2/available-tensorflow-1-5-for-jetson-tx2/

gustavz commented 6 years ago

Thanks, but I know how to build TF from source and I am also familiar with that forum, as I contribute to it. But as far as I know there is no documented usage of deeplab on the Jetson TX2, or at least I could not find any.

I just thought it might be a known problem, or that it might be interesting that deeplab does not run on the Jetson; but if it's not, that's OK as well.

gustavz commented 6 years ago

@asimshankar @tfboyd Have you heard of anyone successfully using deeplab on a Jetson or any other ARM-based architecture?

timoonboru commented 6 years ago

I use the MobileNet and Xception-65 models with the official pretrained *.ckpt checkpoints, and both give the same messy results. On my PC (x86 Ubuntu), the deeplab result is correct. Did you find the reason? Thank you! @GustavZ

Dammi87 commented 6 years ago

I have the exact same issue, although not with DeepLab but with PSPNet. I downloaded this repo and created a saved_model from the checkpoint.

On my computer the segmentation works fine, while the same image on the Jetson is corrupted in a similar manner to what you have described.
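
Roughly like this (a sketch; the checkpoint path and tensor names below are placeholders and depend on the actual PSPNet graph):

import tensorflow as tf

CKPT = 'model.ckpt'            # placeholder checkpoint prefix
EXPORT_DIR = './saved_model'   # placeholder export directory

with tf.Session(graph=tf.Graph()) as sess:
    # Rebuild the graph from the .meta file and restore the weights.
    saver = tf.train.import_meta_graph(CKPT + '.meta')
    saver.restore(sess, CKPT)
    g = sess.graph
    # Replace these with the real input/output tensor names of the graph.
    tf.saved_model.simple_save(
        sess, EXPORT_DIR,
        inputs={'image': g.get_tensor_by_name('inputs:0')},
        outputs={'seg': g.get_tensor_by_name('predictions:0')})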

timoonboru commented 6 years ago

Did you find a solution to the messy output problem? Please help! @Dammi87 @GustavZ

hansanjie commented 5 years ago

@GustavZ I hit the same problem when I run the retrained mobilenet_v2 model. I need your help!

timoonboru commented 5 years ago

@GustavZ I successfully ran deeplab on the Xavier, but it failed on the TX2.

mchhoy commented 5 years ago

It may be due to an issue running dilated depthwise convolutions on the TX2:

https://devtalk.nvidia.com/default/topic/1044411/jetson-tx2/tensorflow-op-spacetobatchnd-does-not-work-correctly-on-tx2/

A workaround may be to replace these with ordinary depthwise convolutions; I'm not sure of the computational cost.
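
One way to do that without touching the op itself would be to fold the dilation into the filter, i.e. insert zeros between the taps and run a plain depthwise conv, so no SpaceToBatchND/BatchToSpaceND pair is generated. Untested sketch (TF 1.x, helper names made up; the larger effective kernel does cost extra compute):

import tensorflow as tf

def dilate_depthwise_filter(filt, rate):
    # filt: [kh, kw, in_channels, channel_multiplier]
    # Insert (rate - 1) zeros between the filter taps so that an ordinary
    # depthwise conv with this filter matches an atrous conv of the given rate.
    if rate == 1:
        return filt
    kh, kw, cin, cm = filt.get_shape().as_list()
    filt = tf.reshape(filt, [kh, 1, kw, 1, cin, cm])
    filt = tf.pad(filt, [[0, 0], [0, rate - 1], [0, 0],
                         [0, rate - 1], [0, 0], [0, 0]])
    filt = tf.reshape(filt, [kh * rate, kw * rate, cin, cm])
    # Drop the all-zero rows/columns that end up after the last tap.
    return filt[:(kh - 1) * rate + 1, :(kw - 1) * rate + 1, :, :]

def atrous_depthwise_conv_no_s2b(x, filt, rate):
    # Same math as tf.nn.depthwise_conv2d(x, filt, ..., rate=[rate, rate]),
    # but without going through SpaceToBatchND / BatchToSpaceND.
    return tf.nn.depthwise_conv2d(x, dilate_depthwise_filter(filt, rate),
                                  strides=[1, 1, 1, 1], padding='SAME')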

tensorflowbutler commented 4 years ago

Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.