sjaiswal25 opened this issue 6 years ago
I had a similar issue: https://github.com/szaza/android-yolo-v2/issues/5. It was caused by a TensorFlow version mismatch: a different version was used for training than for running the model. Could you please check your TensorFlow versions?
Sorry for the delay in replying. I only have TensorFlow 1.6.0 installed on my system, and I am not using TensorFlow for training. I directly used the .weights files created during training (with the alexeyAB_darknet project files) to create the .pb file.
I got it solved. I was using the TensorFlow Serving API for loading the model and had to change that.
I wanted to modify your code to perform object detection on video rather than a single image using JavaCV, but every time a frame is passed, a new session and graph are built. To overcome that, I defined the graph and session once, but then it throws an error stating that the graph has a duplicate name: 'input'. Could you suggest something?
Defining the graph and session once is a very good idea, and I think it is the only way to perform object detection on a video stream. The Duplicate node name: 'input' error usually happens when the graph is already loaded. How do you know that a new session and graph are built for each frame? Could you please share the source code, or at least the part of it that does the graph loading and session creation, so I can check?
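A minimal sketch of that load-once pattern, with plain-Java stubs standing in for org.tensorflow.Graph and org.tensorflow.Session (all class names here are hypothetical stand-ins): the constructor imports the graph definition exactly once, and detect() only reuses the existing session. In the real API, a second Graph.importGraphDef call on the same Graph is what triggers the duplicate 'input' node name error, so the import must not sit on the per-frame path.

```java
// Sketch: build the (stub) graph and session once, reuse them for every frame.
// StubGraph/StubSession are hypothetical stand-ins for org.tensorflow.Graph/Session.
class StubGraph {
    static int importCount = 0;                   // how many times a graph def was imported
    void importGraphDef(byte[] graphDef) { importCount++; }
}

class StubSession {
    StubSession(StubGraph g) { }
    float[] run(byte[] frame) { return new float[] { 0.5f }; } // dummy inference result
}

public class OnceLoadedDetector {
    private final StubGraph graph = new StubGraph();
    private final StubSession session;

    public OnceLoadedDetector(byte[] graphDef) {
        graph.importGraphDef(graphDef);           // happens exactly once, in the constructor
        session = new StubSession(graph);
    }

    public float[] detect(byte[] frame) {
        return session.run(frame);                // no new Graph/Session per frame
    }

    public static void main(String[] args) {
        OnceLoadedDetector d = new OnceLoadedDetector(new byte[0]);
        for (int i = 0; i < 3; i++) d.detect(new byte[0]);     // three "frames"
        System.out.println("imports=" + StubGraph.importCount); // imports=1
    }
}
```

With the real classes, the constructor would read the .pb bytes once and call graph.importGraphDef(bytes); detect() would only call session.runner().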
2018-09-11 11:49:37.826567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-09-11 11:49:37.826672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 156 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-09-11 11:49:38.181700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-09-11 11:49:39.356879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 152 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
This is the output of the code without modifying the ObjectDetector class. Since a TensorFlow device is created every time, I inferred that a new session and graph are being created for every frame.
2018-09-11 11:59:26.105451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-09-11 11:59:26.105572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 164 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-09-11 11:59:27.684382: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.86GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Object: {} - confidence: {}person0.67395884
Object: {} - confidence: {}person0.55275875
Sep 11, 2018 11:59:27 AM recorder.common.Executor lambda$start$0
SEVERE: Executor crashed.
java.lang.IllegalArgumentException: Duplicate node name in graph: 'input'
This is the output when I define the session and load the graph file only once.
The methods in this repository's key detect method that create a Graph and start a Session are normalizeImage(byte[] image) and executeYoloGraph(Tensor<Float> input). From my console logs, normalizeImage finishes quickly like the rest of them, which led me to focus on the latter. In executeYoloGraph, I made the Graph and Session variables global/instance variables so they load only once, which improves the processing speed to about 2 fps, from about 12 seconds per frame previously. A significant speed-up, but sadly still not reasonable.
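Those two figures are just inverted per-frame latencies (1/12 s ≈ 0.08 fps vs. 2 fps). A tiny timing harness (hypothetical names; the Runnable would wrap the real session.runner()...run() call) makes the measurement explicit:

```java
// Measure average per-frame latency and derive frames per second.
// measureFps is a hypothetical helper; frameWork stands in for one inference call.
public class FrameTimer {
    static double measureFps(Runnable frameWork, int frames) {
        long start = System.nanoTime();
        for (int i = 0; i < frames; i++) frameWork.run();
        double secondsPerFrame = (System.nanoTime() - start) / 1e9 / frames;
        return 1.0 / secondsPerFrame;              // e.g. 0.5 s/frame -> 2 fps
    }

    public static void main(String[] args) {
        // Dummy 10 ms "inference" per frame; a real caller would time the TF run() here.
        double fps = measureFps(() -> {
            try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, 20);
        System.out.printf("%.1f fps%n", fps);
    }
}
```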
I've inferred that this retrieval of the output from the session is the bottleneck:
Tensor<Float> result = yoloSession.runner().feed("input", image).fetch("output").run().get(0).expect(Float.class);
where yoloSession is my global session. These are already native methods of Session, so are we driven into a corner?
Your yolo-android, @szaza, runs in real time, and the only difference I'm seeing is its use of the TensorflowInferenceInterface for graph execution. Is this the game changer for our concern?
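One thing worth checking on that Session.run path: in the TensorFlow Java API, Tensor holds native (off-heap) memory and implements AutoCloseable, so the tensor fetched each frame should be closed or native allocations pile up, which can slow things down and trip the bfc_allocator warnings seen above. A stub sketch of the per-frame pattern (StubTensor is a hypothetical stand-in for org.tensorflow.Tensor):

```java
// Per-frame output tensors hold native memory; close them with try-with-resources.
// StubTensor is a hypothetical stand-in for org.tensorflow.Tensor.
class StubTensor implements AutoCloseable {
    static int live = 0;                        // live (unclosed) tensors
    StubTensor() { live++; }
    float[] copyTo(float[] dst) { return dst; } // dummy result extraction
    @Override public void close() { live--; }
}

public class FrameLoop {
    public static void main(String[] args) {
        for (int frame = 0; frame < 100; frame++) {
            // Real code: session.runner().feed("input", image).fetch("output").run().get(0)
            try (StubTensor result = new StubTensor()) {
                result.copyTo(new float[1]);    // read the output while the tensor is alive
            }                                   // native memory released here, every frame
        }
        System.out.println("live tensors: " + StubTensor.live); // live tensors: 0
    }
}
```

The input tensor built from each frame deserves the same treatment.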
@sjaiswal25 were you able to obtain reasonable speed from your optimizations?
Hi @dreistheman,
Are you running your model on a GPU or on CPU? Which model do you use (yoloV2, tiny-yolo)?
Hello @szaza, I trained my model with yolov2, and I'm not fully sure whether my GPU (GTX 1070) is helping with the inference, although I've installed CUDA 10 and cuDNN 7 and included the GPU tensorflow_jni.dll along with Tensorflow-1.13.1.jar (manual dependency inclusions, referenced from https://www.tensorflow.org/install/lang_java#tensorflow_with_the_jdk; I will migrate to Maven soon) in my classpath in NetBeans. (Update: yes, it's running on the GPU.)
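One way to confirm from Java whether ops actually land on the GPU, without pulling in the protobuf classes, is to pass a serialized ConfigProto with log_device_placement enabled to the two-argument Session constructor of the TF 1.x Java API; TensorFlow then logs the device each op is placed on. The two bytes below are the proto hand-encoded (this encoding assumes ConfigProto's log_device_placement is bool field number 8, as in the TF 1.x config.proto):

```java
// Hand-encode ConfigProto{ log_device_placement: true } in protobuf wire format:
// field 8 (log_device_placement, bool), wire type 0 (varint)
// -> key byte (8 << 3) | 0 = 0x40, value byte 0x01.
public class DevicePlacementConfig {
    static byte[] logDevicePlacement() {
        return new byte[] { 0x40, 0x01 };
    }

    public static void main(String[] args) {
        byte[] config = logDevicePlacement();
        System.out.printf("0x%02x 0x%02x%n", config[0], config[1]); // 0x40 0x01
        // With the TF 1.x Java API this would be passed as:
        //   Session session = new Session(graph, config);
        // after which each op placement is logged, e.g. ".../device:GPU:0".
    }
}
```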
I've now integrated TensorflowInferenceInterface for the graph execution, but I'm receiving a NoClassDefFoundError: android/util/Log, even after adding the library org.komamitsu.android.util.Log, which provides the Android Log class for non-Android environments (https://jar-download.com/maven-repository-class-search.php?search_box=android.util.Log).
The jar containing org.tensorflow.contrib.android.TensorFlowInferenceInterface is from the bytedeco tensorflow presets (https://jar-download.com/artifacts/org.bytedeco.javacpp-presets/tensorflow/1.9.0-1.4.2/source-code).
How fast does it run on yours? Have you tried integrating TensorflowInferenceInterface into desktop Java, or any other approach that achieves normal-speed video inference (about 20-30 fps at least) with TensorFlow Java?
I created a saved_model.pb file using darkflow, and I am trying to load and run it using your code. I get the following error: Failed to read [/YOLO/saved_model.pb]!
What could possibly be wrong?
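Failed to read usually just means the file path could not be opened, but the filename saved_model.pb also hints at a possible format mismatch: a loader built around Graph.importGraphDef expects a frozen GraphDef .pb read as raw bytes, whereas a saved_model.pb belongs to the SavedModel format, whose whole export directory is loaded with SavedModelBundle.load(exportDir, "serve") in the TF 1.x Java API. A hedged sketch of a checker that tells the two layouts apart (class and method names here are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Distinguish a SavedModel export directory from a frozen-GraphDef .pb file.
// "saved-model"     -> load with SavedModelBundle.load(dir, "serve")
// "frozen-graphdef" -> read the bytes and pass them to Graph.importGraphDef
public class ModelFormatCheck {
    static String formatOf(Path path) {
        if (Files.isDirectory(path) && Files.exists(path.resolve("saved_model.pb")))
            return "saved-model";    // pass this directory to SavedModelBundle
        if (path.getFileName().toString().equals("saved_model.pb"))
            return "saved-model";    // pass the parent directory, not the file itself
        return "frozen-graphdef";    // bytes can go straight into importGraphDef
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("model");
        Files.write(dir.resolve("saved_model.pb"), new byte[0]);
        System.out.println(formatOf(dir));                           // saved-model
        System.out.println(formatOf(dir.resolve("saved_model.pb"))); // saved-model
        System.out.println(formatOf(Paths.get("yolo_frozen.pb")));   // frozen-graphdef
    }
}
```

Note that darkflow can also emit a plain frozen graph via --savepb, so checking which layout was actually produced is worthwhile before changing the loader.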