wkcn / MobulaOP

A Simple & Flexible Cross Framework Operators Toolkit
MIT License

Implementation ideas of creating Operator #91

Closed: a550461053 closed this issue 4 years ago

a550461053 commented 4 years ago

May I ask how creating an operator is implemented in MobulaOP:

  1. using the TVM PackedFunc to register functions, or
  2. using the MXAPI approach?

Also, why not use NNVM to register operators, as described at https://mxnet.apache.org/api/faq/new_op ? And are you thinking about the performance?
wkcn commented 4 years ago

Hi @a550461053 , MobulaOP uses the MXNet Python API to create operators (https://github.com/wkcn/MobulaOP/blob/master/mobula/glue/mx.py#L133) and uses TVMBridge to register asynchronous functions.

  1. TVM PackedFunc: It is simple to use TVMBridge to register asynchronous functions. To address the ABI compatibility issue, I moved tvm_bridge.h into the 3rdparty directory and call the API MXEnginePushSyncND. This method does not require rebuilding MXNet.

  2. NNVM API: Registering operators through NNVM requires rebuilding MXNet.

  3. Performance: The overhead of MobulaOP lies in the Python code and in the implementation of mx.sym.CustomOp, which uses multiple threads to execute each registered op. Since MobulaOP enables asynchronous computation, the time spent in Python code is hidden by the computation time (see the sketch below).
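For reference, here is a minimal, hypothetical sketch of the Python CustomOp mechanism mentioned above: an element-wise add registered through mx.operator.CustomOp (the imperative counterpart of mx.sym.Custom) and invoked via mx.nd.Custom. This is not MobulaOP's actual code; the op name my_add is made up for illustration.

```python
import mxnet as mx

class MyAddOp(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        # Write the elementwise sum into the output, honoring the req mode.
        self.assign(out_data[0], req[0], in_data[0] + in_data[1])

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # d(a + b)/da = d(a + b)/db = 1, so both gradients are out_grad.
        self.assign(in_grad[0], req[0], out_grad[0])
        self.assign(in_grad[1], req[1], out_grad[0])

@mx.operator.register('my_add')  # hypothetical op name
class MyAddProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(MyAddProp, self).__init__(need_top_grad=True)

    def list_arguments(self):
        return ['a', 'b']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        # Output shape equals the input shape; no auxiliary states.
        return in_shape, [in_shape[0]], []

    def create_operator(self, ctx, in_shapes, in_dtypes):
        return MyAddOp()

a = mx.nd.ones((2, 3))
b = mx.nd.ones((2, 3))
print(mx.nd.Custom(a, b, op_type='my_add'))
```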

There is now a better approach to registering custom ops: https://github.com/apache/incubator-mxnet/tree/master/example/extensions/lib_custom_op
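For illustration, loading such a compiled custom-op library from Python looks roughly like this. The library name and path are assumptions based on the linked gemm example, and mx.library.load requires a sufficiently recent MXNet build (see the version discussion below):

```python
import mxnet as mx

# Path and library name are illustrative; the linked example builds a
# shared library (e.g. libgemm_lib.so) that registers my_gemm on load.
mx.library.load('./libgemm_lib.so')

a = mx.nd.random.uniform(shape=(2, 3))
b = mx.nd.random.uniform(shape=(3, 2))
print(mx.nd.my_gemm(a, b))  # the op becomes available after loading
```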

a550461053 commented 4 years ago

Thank you very much. I updated my MXNet to 1.6.0b20191102 and tried the approach at https://github.com/apache/incubator-mxnet/tree/master/example/extensions/lib_custom_op , but it returns this error:

```
MXNet version 10500 supported
--------start ndarray compute---------
Traceback (most recent call last):
  File "test_gemm.py", line 41, in <module>
    print(mx.nd.my_gemm(a,b))
AttributeError: module 'mxnet.ndarray' has no attribute 'my_gemm'
```

I also see objdump: /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so: File format not recognized. I will file an issue with MXNet. But when I use MobulaOP, it works fine. Can you explain why this happens, and what the difference is between MobulaOP and MXNet's new approach of registering a custom op without rebuilding MXNet?

wkcn commented 4 years ago
  1. MXNet's new approach to registering a custom op: The new approach was introduced on Dec. 6, 2019, so MXNet 1.6.0b20191102 does not support it. It works with MXNet versions >= 1.6.0b20191207 (see the version guard sketched below).

  2. objdump error: Sorry, I couldn't reproduce the issue.

  3. The difference between MobulaOP and MXNet's new approach to registering a custom op: MobulaOP was written before MXNet's new approach existed. MobulaOP uses the MXNet C API MXEnginePushSyncND and the Python API mx.sym.CustomOp to register operators. The overhead is in MobulaOP's Python code and in MXNet's Python CustomOp; in addition, MXNet's Python CustomOp uses multiple threads to execute each op. The benefit is that, thanks to mx.sym.CustomOp, it is easier to write code and to call other MXNet built-in ops. MXNet's new approach provides a C API to register custom operators, and the whole procedure is written in C++, so it is suitable for deployment and faster than MobulaOP.

These two approaches do not require rebuilding MXNet, and neither has the ABI compatibility problem.
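A quick guard against the AttributeError above, assuming (this is an assumption, not confirmed by the thread) that mx.library was added together with lib_custom_op support, so its absence indicates a build that is too old:

```python
import mxnet as mx

# Assumption: mx.library ships together with lib_custom_op support,
# so builds older than ~1.6.0b20191207 lack it entirely.
if not hasattr(mx, 'library'):
    raise RuntimeError('MXNet %s lacks lib_custom_op support; '
                       'upgrade to >= 1.6.0b20191207' % mx.__version__)
mx.library.load('./libgemm_lib.so')  # illustrative path, as above
```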

a550461053 commented 4 years ago

Thanks. So MobulaOP defines all kernel functions in C++, the Python code adds negligible overhead thanks to MXEnginePushSyncND, and the only real overhead is MXNet's Python CustomOp. Is that right?

wkcn commented 4 years ago

Yes. The overhead includes MXNet's Python CustomOp plus some preprocessing in MobulaOP (e.g., checking the input types and looking up the custom function).
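One way to check this in practice: since MXNet executes asynchronously, the time to enqueue work from Python can be measured separately from the time the kernels take. Below is a generic sketch using a built-in op; the same measurement applies to an op registered through MobulaOP.

```python
import time
import mxnet as mx

a = mx.nd.random.uniform(shape=(2048, 2048))
b = mx.nd.random.uniform(shape=(2048, 2048))
mx.nd.waitall()  # finish setup work before timing

# Enqueueing ops returns immediately; this measures the Python-side cost.
start = time.time()
outs = [mx.nd.dot(a, b) for _ in range(50)]
push = time.time() - start

mx.nd.waitall()  # block until all queued kernels have actually run
total = time.time() - start
print('push: %.4f s, total: %.4f s' % (push, total))
# If push is much smaller than total, the Python overhead is hidden
# by the computation time, as described earlier in this thread.
```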

a550461053 commented 4 years ago

Thank you very much!