Object Detection on Android is abnormally slow

System information

- What is the top-level directory of the model you are using: models\research\object_detection.
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): A little. Mostly edited existing files to match my needs.
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 version 1709.
- TensorFlow installed from: Source.
- TensorFlow version: 1.6.0.
- Bazel version (if compiling from source): n/a.
- CUDA/cuDNN version: CUDA v9.0 / cuDNN v64_7.
- GPU model and memory: Nvidia GeForce GTX 1050 4GB 6.1.
- Exact command to reproduce: n/a.

Describe the problem

After following these great tutorials, I was able to train my own Object Detection models. For the training, I've used around 1600 images (1300 train + 300 test) in 6 categories and trained the models until the loss stabilized below the 0.05 mark on TensorBoard.

The first model I used was faster_rcnn_inception_v2_coco. It works like a charm on my computer, but not so well on my phone: after exporting it to the TF Detect demo app, I noticed that object detection ran only once every 30~60 seconds, or even less often. I would show my phone an object, and its bounding box would be drawn only about a minute later.

Things I tried:

- Tried training with "faster models": the ssd_mobilenet_v2_coco and ssd_inception_v2_coco models. Their detection quality was worse, and both took just as long to process each image.
- Tried lowering the number of predictions made (from 300 to 30), both in the training .config files and in the APIModel.java file, but no luck.
- Tried with both frozen and "unfrozen" models (obtained with the freeze_graph.py script), but still no difference.
- Tried testing on multiple phones besides my Moto X Play: a Samsung Galaxy S8, a Samsung Galaxy A7 and an Amazon Fire 7. None of them did better than my phone.
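For reference, the kind of .config edit described in the list above can be sketched in Python. `first_stage_max_proposals`, `max_detections_per_class` and `max_total_detections` are fields of the Object Detection API pipeline config, but which of them the change from 300 to 30 actually touched is an assumption here. The helper `cap_detections` and the abbreviated sample text are hypothetical and only illustrate the idea.

```python
import re

# Pipeline config fields that limit how many boxes are kept.
# Assumption: the "300 to 30" edit targeted one or more of these.
LIMIT_FIELDS = (
    "first_stage_max_proposals",
    "max_detections_per_class",
    "max_total_detections",
)

FIELD_RE = re.compile(r"\b(%s)\s*:\s*(\d+)" % "|".join(LIMIT_FIELDS))


def cap_detections(config_text, limit):
    """Return config_text with each listed field lowered to at most `limit`."""

    def clamp(match):
        name, value = match.group(1), int(match.group(2))
        return "%s: %d" % (name, min(value, limit))

    return FIELD_RE.sub(clamp, config_text)


# Abbreviated, illustrative excerpt of a Faster R-CNN style config.
sample_config = """
first_stage_max_proposals: 300
second_stage_post_processing {
  batch_non_max_suppression {
    max_detections_per_class: 100
    max_total_detections: 300
  }
}
"""

capped_config = cap_detections(sample_config, 30)
print(capped_config)
```

Capping these fields mainly trims the post-processing stage; the feature extractor still runs in full on every frame, which may be why the change made no visible difference.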

I don't need "real-time butter smooth 144 detections per second" performance, but it would be great if I could get something faster than a detection every minute. Is there another step for optimizing object detection models for mobile that I couldn't find anywhere else? Or is something wrong with the demo and I should make my own detection app from scratch?

Source code / logs

Android Studio debugger output (Amazon Fire 7 + ssd_inception_v2_coco)

```
I/tensorflow: CameraConnectionFragment: Valid preview sizes: [480x480, 640x480, 720x480, 720x720, 800x480, 800x600, 800x800, 864x480, 960x540, 1280x720, 1600x912, 1600x1200]
              CameraConnectionFragment: Rejected preview sizes: [176x144, 320x240, 352x288, 480x320, 480x368]
              CameraConnectionFragment: Exact size match found.
I/art: Background partial concurrent mark sweep GC freed 4116(240KB) AllocSpace objects, 2(84MB) LOS objects, 36% free, 7MB/11MB, paused 935us total 127.001ms
W/tensorflow: TensorFlowObjectDetectionAPIModel: ???
W/tensorflow: TensorFlowObjectDetectionAPIModel: label1
W/tensorflow: TensorFlowObjectDetectionAPIModel: label2
              TensorFlowObjectDetectionAPIModel: label3
              TensorFlowObjectDetectionAPIModel: label4
              TensorFlowObjectDetectionAPIModel: label5
W/tensorflow: TensorFlowObjectDetectionAPIModel: label6
I/TensorFlowInferenceInterface: Checking to see if TensorFlow native methods are already loaded
I/TensorFlowInferenceInterface: TensorFlow native methods already loaded
I/TensorFlowInferenceInterface: Model load took 1438ms, TensorFlow version: 1.8.0-rc1
I/TensorFlowInferenceInterface: Successfully loaded model from 'file:///android_asset/ssd_inference_graph.pb'
I/tensorflow: DetectorActivity: Camera orientation relative to screen canvas: 90
              DetectorActivity: Initializing at size 640x480
E/tensorflow: ObjectTracker: libtensorflow_demo.so not found, tracking unavailable
I/tensorflow: MultiBoxTracker: Initializing ObjectTracker: 640x480
E/tensorflow: ObjectTracker: Native object tracking support not found. See tensorflow/examples/android/README.md for details.
W/ResourceType: Attempt to retrieve bag 0x0103003e which is invalid or in a cycle.
E/tensorflow: MultiBoxTracker: Object tracking support not found. See tensorflow/examples/android/README.md for details.
I/tensorflow: DetectorActivity: Preparing image 1 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 1
I/MaliEGL: [Mali]window_type=1, is_framebuffer=0, errnum = 0
           [Mali]surface->num_buffers=4, surface->num_frames=3, win_min_undequeued=1
           [Mali]max_allowed_dequeued_buffers=3
I/Kernel: [ 5675.354224].(3)[19526:tensorflow.demo]lowmemorykiller: Killing '.amazon.venezia' (19801), adj 12, score_adj 705,
          [ 5675.354224]   to free 51316kB on behalf of 'tensorflow.demo' (19526) because
          [ 5675.354224]   cache 146736kB is below limit 147456kB for oom_score_adj 529
          [ 5675.354224]   Free memory is 0kB above reserved
I/Kernel: [ 5675.472659].(3)[19527:tensorflow.demo]lowmemorykiller: Killing 'azon.kindle.cms' (18630), adj 12, score_adj 705,
          [ 5675.472659]   to free 44224kB on behalf of 'tensorflow.demo' (19527) because
          [ 5675.472659]   cache 145512kB is below limit 147456kB for oom_score_adj 529
          [ 5675.472659]   Free memory is 0kB above reserved
I/Kernel: [ 5675.653276].(2)[19529:tensorflow.demo]lowmemorykiller: Killing 'om.amazon.tahoe' (19754), adj 12, score_adj 705,
I/Kernel: [ 5675.653276]   cache 138356kB is below limit 147456kB for oom_score_adj 529
          [ 5675.653276]   Free memory is 0kB above reserved
I/Kernel: [ 5675.751049].(0)[19526:tensorflow.demo]lowmemorykiller: Killing 'evice.messaging' (19734), adj 12, score_adj 705,
          [ 5675.751049]   to free 34896kB on behalf of 'tensorflow.demo' (19526) because
          [ 5675.751049]   cache 137348kB is below limit 147456kB for oom_score_adj 529
          [ 5675.751049]   Free memory is 0kB above reserved
I/Kernel: [ 5675.777268].(3)[19527:tensorflow.demo]lowmemorykiller: Killing 'on.sync.service' (19448), adj 12, score_adj 705,
          [ 5675.777268]   to free 29884kB on behalf of 'tensorflow.demo' (19527) because
          [ 5675.777268]   cache 137012kB is below limit 147456kB for oom_score_adj 529
          [ 5675.777268]   Free memory is 0kB above reserved
I/Kernel: [ 5675.895443].(2)[19528:tensorflow.demo]lowmemorykiller: Killing 'h2clientservice' (19482), adj 12, score_adj 705,
          [ 5675.895443]   to free 33808kB on behalf of 'tensorflow.demo' (19528) because
          [ 5675.895443]   cache 135308kB is below limit 147456kB for oom_score_adj 529
          [ 5675.895443]   Free memory is 0kB above reserved
I/Kernel: [ 5675.976866].(1)[19528:tensorflow.demo]lowmemorykiller: Killing 'VMetricsProcess' (18717), adj 12, score_adj 705,
          [ 5675.976866]   to free 21516kB on behalf of 'tensorflow.demo' (19528) because
          [ 5675.976866]   cache 132396kB is below limit 147456kB for oom_score_adj 529
          [ 5675.976866]   Free memory is 0kB above reserved
I/art: Explicit concurrent mark sweep GC freed 1683(130KB) AllocSpace objects, 1(49MB) LOS objects, 39% free, 5MB/8MB, paused 793us total 108.477ms
I/tensorflow: MultiBoxTracker: Processing 0 results from 1
I/tensorflow: DetectorActivity: Preparing image 418 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 418
I/Choreographer: Skipped 42 frames!  The application may be doing too much work on its main thread.
I/tensorflow: MultiBoxTracker: Processing 1 results from 418
I/tensorflow: DetectorActivity: Preparing image 751 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 751
I/Choreographer: Skipped 42 frames!  The application may be doing too much work on its main thread.
I/tensorflow: MultiBoxTracker: Processing 0 results from 751
I/tensorflow: DetectorActivity: Preparing image 1096 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 1096
I/Choreographer: Skipped 47 frames!  The application may be doing too much work on its main thread.
I/Kernel: [ 5804.275921].(0)[19526:tensorflow.demo][STP-PSM] [I]_stp_psm_stp_is_idle: **IDLE is over 5000 msec, go to sleep!!!**
I/tensorflow: MultiBoxTracker: Processing 0 results from 1096
I/tensorflow: DetectorActivity: Preparing image 1386 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 1386
I/tensorflow: MultiBoxTracker: Processing 0 results from 1386
I/tensorflow: DetectorActivity: Preparing image 2029 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 2029
I/art: Explicit concurrent mark sweep GC freed 34323(1262KB) AllocSpace objects, 0(0B) LOS objects, 40% free, 5MB/8MB, paused 907us total 68.107ms
I/tensorflow: MultiBoxTracker: Processing 0 results from 2029
I/tensorflow: DetectorActivity: Preparing image 2351 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 2351
I/tensorflow: MultiBoxTracker: Processing 1 results from 2351
I/tensorflow: DetectorActivity: Preparing image 2673 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 2673
I/tensorflow: MultiBoxTracker: Processing 2 results from 2673
I/tensorflow: DetectorActivity: Preparing image 2996 for detection in bg thread.
I/tensorflow: DetectorActivity: Running detection on image 2996
I/art: Explicit concurrent mark sweep GC freed 24655(897KB) AllocSpace objects, 0(0B) LOS objects, 39% free, 5MB/8MB, paused 793us total 61.978ms

[... it keeps going on like that ...]
```

_(Yeah, the poor tablet keeps running out of memory, but even the mighty Galaxy S8, with double the amount of RAM, took just as long to detect the objects.)_
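To put a number on the slowdown, the log can be reduced to the gaps between consecutive "Running detection on image N" lines. The sketch below assumes that N is a per-frame counter in the demo app, so each gap is the number of camera frames that went by while one detection was running. The function names are hypothetical, and `assumed_fps` is a placeholder, because the log does not state the camera frame rate.

```python
import re

DETECTION_RE = re.compile(r"Running detection on image (\d+)")


def detection_frames(log_text):
    """Return the image numbers on which a detection was started."""
    return [int(n) for n in DETECTION_RE.findall(log_text)]


def frame_gaps(frames):
    """Return the number of frames between consecutive detections."""
    return [later - earlier for earlier, later in zip(frames, frames[1:])]


def seconds_per_detection(gaps, assumed_fps):
    """Convert frame gaps to seconds under an assumed camera frame rate."""
    return [gap / assumed_fps for gap in gaps]


# A few lines copied from the log above.
sample_log = """\
I/tensorflow: DetectorActivity: Running detection on image 1
I/tensorflow: DetectorActivity: Running detection on image 418
I/tensorflow: DetectorActivity: Running detection on image 751
I/tensorflow: DetectorActivity: Running detection on image 1096
"""

frames = detection_frames(sample_log)
gaps = frame_gaps(frames)
print(frames, gaps)
```

Running the same functions over a log from each phone would make it easier to compare devices and models than judging the delay by eye.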
