tensorflow uses 2 times more memory on aarch64

I installed tensorflow-1.13.1-cp35-none-linux_aarch64.whl on an AWS a1.4xlarge instance (Ubuntu 16) and on a Firefly board (RK3399). I downloaded the wheel from https://github.com/lhelontra/tensorflow-on-arm/releases/download/v1.13.1/tensorflow-1.13.1-cp35-none-linux_aarch64.whl

I ran a ResNet50 model with input shape (1, 224, 224, 3). Memory usage was 1.6-1.7 GB.

I also ran the same ResNet50 model with the official TF wheel for Raspberry Pi or linux_x86. Memory usage was only 620-680 MB.

Raspberry Pi wheel: https://www.piwheels.org/simple/tensorflow/tensorflow-1.13.1-cp35-none-linux_armv7l.whl

What I noticed is that the official Raspberry Pi wheel uses 650 MB after the model is loaded, and it does not use any extra memory to run the model.

But the lhelontra wheel uses 700 MB after the model is loaded, and it uses another 1 GB after the first run of the model. After the second and subsequent runs the memory usage stays the same: 1.8 GB.
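For anyone who wants to reproduce the measurement, here is a minimal sketch of how a "peak memory usage" line like the ones in the logs could be produced. The actual run-tf.py script is not shown in this report, so the helper names `peak_rss_kb` and `report` are hypothetical, and the `bytearray` allocation is only a stand-in for loading a model. It relies on the standard-library `resource.getrusage` call, whose `ru_maxrss` field is reported in kilobytes on Linux and in bytes on macOS.

```python
import resource
import sys


def peak_rss_kb():
    """Return the peak resident set size of this process in kilobytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        peak //= 1024
    return peak


def report(label):
    """Print the current peak and return it (hypothetical helper)."""
    kb = peak_rss_kb()
    print("{}: peak memory usage {} kB ({:.0f} MiB)".format(label, kb, kb / 1024))
    return kb


before = report("start")
buf = bytearray(64 * 1024 * 1024)  # stand-in for loading a model
after = report("after allocation")
```

Because `ru_maxrss` is a high-water mark, it never decreases. A value that stays flat across repeated runs, as in the logs, only shows that no later run exceeded the earlier peak.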

Raspberry Pi official wheel:

(224, 224, 3) panda.jpg
peak memory usage (bytes on OS X, kilobytes on Linux) 128920
Loading frozen model: resnet50_frozen.pb ....
WARNING:tensorflow:From ./run-tf.py:85: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
<class 'tensorflow.core.framework.graph_pb2.GraphDef'>
peak memory usage (bytes on OS X, kilobytes on Linux) 642516
peak memory usage (bytes on OS X, kilobytes on Linux) 642516
input_tensor_names: ['aimport/input_1:0']
output_tensor_names: {'aimport/fc1000/Softmax:0'}
Tensor("aimport/input_1:0", shape=(?, 224, 224, 3), dtype=float32)
input_shape: (?, 224, 224, 3)
Tensor("aimport/fc1000/Softmax:0", shape=(?, 1000), dtype=float32)
sess.run...
sess.run done
duration 6,654 ms
peak memory usage (bytes on OS X, kilobytes on Linux) 642516
sess.run...
sess.run done
duration 1,117 ms
peak memory usage (bytes on OS X, kilobytes on Linux) 642516
sess.run...
sess.run done
duration 1,103 ms
peak memory usage (bytes on OS X, kilobytes on Linux) 642516
1
(1, 1000)
panda.jpg - 388, 0.9995660185813904, giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
peak memory usage (bytes on OS X, kilobytes on Linux) 642516

lhelontra wheel

(224, 224, 3) panda.jpg
peak memory usage (bytes on OS X, kilobytes on Linux) 179216
Loading frozen model: resnet50_frozen.pb ....
WARNING:tensorflow:From ./run-tf.py:85: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
<class 'tensorflow.core.framework.graph_pb2.GraphDef'>
peak memory usage (bytes on OS X, kilobytes on Linux) 727996
input_tensor_names: ['aimport/input_1:0']
output_tensor_names: {'aimport/fc1000/Softmax:0'}
Tensor("aimport/input_1:0", shape=(?, 224, 224, 3), dtype=float32)
input_shape: (?, 224, 224, 3)
Tensor("aimport/fc1000/Softmax:0", shape=(?, 1000), dtype=float32)
peak memory usage (bytes on OS X, kilobytes on Linux) 727996
sess.run...
sess.run done
duration 4,225 ms
peak memory usage (bytes on OS X, kilobytes on Linux) 1814044
sess.run...
sess.run done
duration 135 ms
peak memory usage (bytes on OS X, kilobytes on Linux) 1814044
sess.run...
sess.run done
duration 139 ms
peak memory usage (bytes on OS X, kilobytes on Linux) 1814044
1
(1, 1000)
panda.jpg - 388, 0.9995660185813904, giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
peak memory usage (bytes on OS X, kilobytes on Linux) 1814044
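
To make the comparison explicit, the peak values from the two logs above can be broken down into the cost of loading the model and the cost of the first `sess.run`. This is only arithmetic on the logged kilobyte figures; the dictionary keys and the `summarize` helper are names made up for this sketch.

```python
# Peak RSS values in kilobytes, copied from the two logs above.
logs = {
    "Raspberry Pi wheel": {"start": 128920, "loaded": 642516, "after_run": 642516},
    "lhelontra wheel": {"start": 179216, "loaded": 727996, "after_run": 1814044},
}


def summarize(name, peaks):
    """Print and return (load cost, first-run cost) in kilobytes."""
    load_cost = peaks["loaded"] - peaks["start"]
    run_cost = peaks["after_run"] - peaks["loaded"]
    print("{}: load +{:.0f} MiB, first run +{:.0f} MiB, final {:.0f} MiB".format(
        name, load_cost / 1024, run_cost / 1024, peaks["after_run"] / 1024))
    return load_cost, run_cost


for name, peaks in logs.items():
    summarize(name, peaks)

ratio = logs["lhelontra wheel"]["after_run"] / logs["Raspberry Pi wheel"]["after_run"]
print("final peak ratio: {:.2f}x".format(ratio))
```

In these two logs the load cost is similar for both wheels. Almost all of the gap comes from the first `sess.run` with the lhelontra wheel.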
