merrymercy / tvm-mali

Optimizing Mobile Deep Learning on ARM GPU with TVM
http://tvmlang.org/2018/01/16/opt-mali-gpu.html
MIT License

about get_workload #3

Closed: janboeye closed this issue 6 years ago

janboeye commented 6 years ago

hi, @merrymercy

Will get_workload load pretrained weights? Where can I find pretrained fp16 MobileNet weights?

Thanks

merrymercy commented 6 years ago

get_workload does not load pretrained weights. We can get pretrained weights from the Gluon model zoo by following this tutorial: http://nnvm.tvmlang.org/tutorials/deploy_model_on_mali_gpu.html#sphx-glr-tutorials-deploy-model-on-mali-gpu-py

The tutorial does not mention the type conversion, so I am attaching my modified version here. We can convert the fp32 weights to fp16 directly, but I don't know how large the accuracy loss is. It looks fine in my simple test.
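The key change is a single line that casts every parameter tensor to fp16 before compilation (the same line appears in the full script below):

# cast every weight tensor in the params dict from fp32 to fp16
params = {k: tvm.nd.array(v.asnumpy().astype('float16')) for k, v in params.items()}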

output

The predicted probabilities of the top-5 classes using fp16 and fp32:

fp16:
top-5 class:  ['tiger cat', 'Egyptian cat', 'tabby, tabby cat', 'kit fox, Vulpes macrotis', 'red fox, Vulpes vulpes']
top-5 probability:  [0.2905  0.141   0.1061  0.0387  0.02692]

fp32: 
top-5 class:  ['tiger cat', 'Egyptian cat', 'tabby, tabby cat', 'kit fox, Vulpes macrotis', 'red fox, Vulpes vulpes']
top-5 probability:  [0.28581887 0.14418182 0.10869949 0.03717604 0.02565336]

code

"""
Deploy the Pretrained Model on ARM Mali GPU
=======================================================
**Author**: `Lianmin Zheng <https://lmzheng.net/>`_, `Ziheng Jiang <https://ziheng.org/>`_

This is an example of using NNVM to compile a MobileNet model and
deploy it on a Firefly-RK3399 with an ARM Mali GPU.  We will use the
Mali-T860 MP4 GPU on this board to accelerate the inference.

This tutorial is based on the `tutorial <http://nnvm.tvmlang.org/tutorials/deploy_model_on_rasp.html>`_
for deploying on Raspberry Pi by `Ziheng Jiang <https://ziheng.org/>`_.
Great thanks to the original author; I only modified a few lines.

To begin with, we import nnvm (for compilation) and TVM (for deployment).
"""
import tvm
import nnvm.compiler
import nnvm.testing
from tvm.contrib import util, rpc
from tvm.contrib import graph_runtime as runtime

dtype = 'float16'

######################################################################
# Build TVM Runtime on Device
# ---------------------------
#
# There are some prerequisites: we need to build the TVM runtime and set
# up an RPC server on the remote device.
#
# To get started, clone the TVM repo from GitHub. It is important to
# clone the submodules along with it, using the --recursive option
# (assuming you are in your home directory):
#
#   .. code-block:: bash
#
#     git clone --recursive https://github.com/dmlc/tvm
#
# .. note::
#
#   Usually the device has limited resources and we only need to build the
#   runtime. The idea is that we use the TVM compiler on the local server
#   to compile the program, upload the compiled program to the device, and
#   run the device function remotely.
#
#   .. code-block:: bash
#
#     make runtime
#
# After the runtime builds successfully, we need to set environment
# variables, either in the :code:`~/.bashrc` file of our own account or in
# :code:`/etc/profile` for system-wide variables. Edit :code:`~/.bashrc`
# with :code:`vi ~/.bashrc` and add the lines below (assuming your TVM
# directory is :code:`~/tvm`):
#
#   .. code-block:: bash
#
#    export TVM_HOME=~/tvm
#    export PATH=$PATH:$TVM_HOME/lib
#    export PYTHONPATH=$PYTHONPATH:$TVM_HOME/python
#
# To apply the updated :code:`~/.bashrc`, execute :code:`source ~/.bashrc`.

######################################################################
# Set Up RPC Server on Device
# ---------------------------
# To set up a TVM RPC server on your ARM device (our remote device),
# we have prepared a one-line script so you only need to run this
# command after following the installation guide to install TVM on
# your device:
#
#   .. code-block:: bash
#
#     python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090
#
# After executing the command above, if you see the lines below, the RPC
# server has started successfully on your device.
#
#    .. code-block:: bash
#
#      Loading runtime library /home/YOURNAME/code/tvm/lib/libtvm_runtime.so... exec only
#      INFO:root:RPCServer: bind to 0.0.0.0:9090
#

######################################################################
# For demonstration, we simply start an RPC server on the same machine
# if :code:`use_mali` is False. If you have set up the remote environment,
# change the three lines below: set :code:`use_mali` to True, and replace
# :code:`host` and :code:`port` with your device's host address and port
# number.

use_mali = True
host = '10.42.0.96'
port = 9090

if not use_mali:
    # run server locally
    host = 'localhost'
    port = 9092
    server = rpc.Server(host=host, port=port, use_popen=True)

######################################################################
# Prepare the Pretrained Model
# ----------------------------
# Back on the host machine, we first need to download an MXNet Gluon
# MobileNet model from the model zoo, which is pretrained on ImageNet. You
# can find more details about this part in `Compile MXNet Models`.

from mxnet.gluon.model_zoo.vision import get_model
from mxnet.gluon.utils import download
from PIL import Image
import numpy as np

# only one line to get the model
block = get_model('mobilenet1.0', pretrained=True)

######################################################################
# In order to test our model, here we download an image of a cat and
# transform its format.
img_name = 'cat.jpg'
download('https://github.com/dmlc/mxnet.js/blob/master/data/cat.png?raw=true', img_name)
image = Image.open(img_name).resize((224, 224))

def transform_image(image):
    image = np.array(image) - np.array([123., 117., 104.])
    image /= np.array([58.395, 57.12, 57.375])
    image = image.transpose((2, 0, 1))
    image = image[np.newaxis, :]
    return image

x = transform_image(image)

######################################################################
# The synset is used to transform the label from an ImageNet class number
# to a word humans can understand.
synset_url = ''.join(['https://gist.githubusercontent.com/zhreshold/',
                      '4d0b62f3d01426887599d4f7ede23ee5/raw/',
                      '596b27d23537e5a1b5751d2b0481ef172f58b539/',
                      'imagenet1000_clsid_to_human.txt'])
synset_name = 'synset.txt'
download(synset_url, synset_name)
with open(synset_name) as f:
    synset = eval(f.read())

######################################################################
# Now we would like to port the Gluon model to a portable computational
# graph. It takes only a few lines.

# We support MXNet static graphs (symbol) and HybridBlock in mxnet.gluon
net, params = nnvm.frontend.from_mxnet(block)
# we want probabilities, so add a softmax operator
net = nnvm.sym.softmax(net)

######################################################################
# Here are some basic data workload configurations.
batch_size = 1
num_classes = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_classes)

######################################################################
# Compile The Graph
# -----------------
# To compile the graph, we call the :any:`nnvm.compiler.build` function
# with the graph configuration and parameters. Since we use OpenCL for
# GPU computing, TVM will generate both OpenCL kernel code and ARM CPU
# host code. The CPU host code is used for calling the OpenCL kernels.
# In order to generate correct CPU code, we need to specify the target
# triple for the host ARM device by setting the parameter :code:`target_host`.

######################################################################
# If we run the example locally for demonstration, we can simply set it
# to :code:`llvm`. To run it on the ARM device, you need to specify its
# instruction set. Here is the option I use for my Firefly-RK3399.

if use_mali:
    target_host = "llvm -target=aarch64-linux-gnu -mattr=+neon"
else:
    target_host = "llvm"

# convert the weights to the desired dtype (fp32 -> fp16)
params = {k: tvm.nd.array(v.asnumpy().astype(dtype)) for k, v in params.items()}

# set the target to `tvm.target.mali()` instead of 'opencl' to enable
# target-specific optimizations
graph, lib, _ = nnvm.compiler.build(net, target=tvm.target.mali(),
        shape={"data": data_shape}, target_host=target_host, dtype=dtype)

# After `nnvm.compiler.build`, you get three return values: the graph,
# the library, and the new parameters, since we apply some optimizations
# that change the parameters while keeping the model's results the same.

# Save the library at local temporary directory.
tmp = util.tempdir()
lib_fname = tmp.relpath('net.tar')
lib.export_library(lib_fname)

######################################################################
# Deploy the Model Remotely by RPC
# --------------------------------
# With RPC, you can deploy the model remotely from your host machine
# to the remote device.

# connect to the server
remote = rpc.connect(host, port)

# upload the library to remote device and load it
remote.upload(lib_fname)
rlib = remote.load_module('net.tar')

ctx = remote.cl(0)
# upload the parameters
rparams = {k: tvm.nd.array(v, ctx) for k, v in params.items()}

# create the remote runtime module
module = runtime.create(graph, rlib, ctx)
# set parameter
module.set_input(**rparams)
# set input data
module.set_input('data', tvm.nd.array(x.astype(dtype)))
# run
module.run()
# get output
out = module.get_output(0, tvm.nd.empty(out_shape, dtype=dtype, ctx=ctx))
# get the top-5 results
out = out.asnumpy()[0]
print("top-5 class: ",  [synset[x] for x in np.argsort(-out)[:5]])
print("top-5 probability: ", out[np.argsort(-out)][:5])

if not use_mali:
    # terminate the local server
    server.terminate()
janboeye commented 6 years ago

@merrymercy BTW, could we save the converted fp16 params to a file so they can be loaded next time? Thanks

merrymercy commented 6 years ago

Are you using Python? You can save a numpy array to a file.
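For example, a minimal sketch using numpy's savez/load (the file name is illustrative):

import numpy as np
import tvm

# save: a single .npz archive holding every converted fp16 weight
np.savez('params_fp16.npz', **{k: v.asnumpy() for k, v in params.items()})

# load: rebuild the params dict for the next run
loaded = np.load('params_fp16.npz')
params = {k: tvm.nd.array(loaded[k]) for k in loaded.files}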

janboeye commented 6 years ago

@merrymercy No, I want to deploy these fp16 params on the Android platform. Is that doable?

merrymercy commented 6 years ago

I am not familiar with Android, but the weights are just floating-point numbers. You can save them to a file and load them in whatever way you like. This issue https://github.com/dmlc/tvm/issues/973 may help you.
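If I remember correctly, nnvm also provides a serializer for the whole params dict whose output the C++ graph runtime can consume directly, which avoids needing Python on the device. A sketch, assuming the save_param_dict/load_param_dict pair in nnvm.compiler:

import nnvm.compiler

# serialize the params dict into a single binary blob
with open('deploy.params', 'wb') as f:
    f.write(nnvm.compiler.save_param_dict(params))

# read it back on the Python side; on the device, the graph runtime's
# load_params call accepts the same blob
with open('deploy.params', 'rb') as f:
    params = nnvm.compiler.load_param_dict(bytearray(f.read()))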

kaishijeng commented 6 years ago

@merrymercy

I have used your trick below to convert params from float32 to float16:

params = {k: tvm.nd.array(v.asnumpy().astype(dtype)) for k, v in params.items()}

It works OK with MobileNet. However, when I use the same method to convert a tiny-YOLO model to float16, the conversion is OK, but I get the following error during nnvm.compiler.build, as shown below. The original float32 model works fine. Any idea?

Traceback (most recent call last):
  File "./run_pc_mali_16.py", line 48, in <module>
    graph, lib, params = nnvm.compiler.build(sym, tvm.target.mali(), {input_name: data_shape}, params=params, dtype=dtype, target_host=target_host)
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/compiler/build_module.py", line 251, in build
    graph = graph.apply("GraphFusePartition").apply("GraphFuseCompile")
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/graph.py", line 234, in apply
    check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/_base.py", line 72, in check_call
    raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: [21:20:40] HalideIR/src/ir/IROperator.h:826: Check failed: true_value.type() == false_value.type()
The second and third arguments to a select do not have a matching type:
  tensor(ax0, ax1, ax2, ax3) has type float16
  (float32(tensor(ax0, ax1, ax2, ax3))*0.100000f) has type float32

Thanks,

merrymercy commented 6 years ago

Could you share your script? Maybe some ops defined in topi have bugs in handling different dtypes.

kaishijeng commented 6 years ago

See the attached script. The tiny-YOLOv2 ONNX model is too big to upload; not sure you can debug it without the ONNX model. tmp.zip

kaishijeng commented 6 years ago

The ONNX model can be downloaded from the following link: https://github.com/tkat0/chainer-nnvm-example/blob/master/models/YOLOv2_tiny/YOLOv2_tiny.onnx

kaishijeng commented 6 years ago

@merrymercy

Have you had a chance to reproduce this issue?

Thanks,

merrymercy commented 6 years ago

I am busy these days. I might try your script later.

kaishijeng commented 6 years ago

Thanks,

kaishijeng commented 6 years ago

@merrymercy

Any chance to look into this issue?

Thanks,

ttyang1018 commented 5 years ago

@merrymercy I tried your script on my own model.

But I get:

Traceback (most recent call last):
  File "compile.py", line 138, in <module>
    graph, lib, params = nnvm.compiler.build(sym, target, shape_dict, params=nnvm_params, target_host=target_host, dtype=dtype)
  File "/github/tvm/nnvm/python/nnvm/compiler/build_module.py", line 305, in build
    graph = graph.apply("GraphCompile")
  File "/github/tvm/nnvm/python/nnvm/graph.py", line 234, in apply
    check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
  File "/github/tvm/nnvm/python/nnvm/_base.py", line 75, in check_call
    raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: [19:40:54] /github/tvm/src/lang/ir_operator.cc:71: Cannot match type float16 vs float32

I don't understand why this would happen. I also checked my nnvm_symbol.json, and no float32 data type is specified in it.
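A quick way to confirm that every converted parameter really is fp16 (a minimal sketch; params is the converted dict from the script above):

# print any parameter whose dtype is not float16
for k, v in params.items():
    if v.dtype != 'float16':
        print(k, v.dtype)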

Could you help me check it out?

merrymercy commented 5 years ago

Maybe some operators are hard-coded to float32 in their compute definitions. I am busy now and cannot help you debug it.
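If that is the cause, the usual fix is to cast scalar constants in the op's compute rule to the input tensor's dtype instead of letting them default to float32. The *0.100000f in the earlier error looks like a leaky-relu alpha, so here is a sketch of the pattern using the old tvm/topi compute API (illustrative, not a verified patch):

import tvm

def leaky_relu_dtype_safe(x, alpha=0.1):
    def _compute(*indices):
        value = x(*indices)
        # cast the Python float to the tensor's dtype (fp16 here) so both
        # branches of the select expression have matching types
        calpha = tvm.const(alpha, value.dtype)
        return tvm.select(value > 0, value, value * calpha)
    return tvm.compute(x.shape, _compute, name='leaky_relu')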