pfnet-research / menoh

Menoh: fast DNN inference library with multiple programming language support
MIT License
279 stars · 34 forks

SIMD support in Menoh #85

Open rajh619 opened 6 years ago

rajh619 commented 6 years ago

Hi, I was trying the Menoh VGG16 example. Does Menoh utilize SIMD instruction sets (like SSE4 or AVX2) to speed up inference? If not, is there an option to enable SIMD on the CPU in Menoh?

Thanks

okdshin commented 6 years ago

Hi. Thank you for trying it out. Menoh uses MKL-DNN, and MKL-DNN supports SIMD optimization. It is enabled automatically; you don't need to set any option to enable SIMD. Could you tell us about your use case? We may be able to help. Does it run slower than you expected?
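As a side note, one quick way to see which SIMD extensions the host CPU exposes on Linux is to parse the `flags` line of `/proc/cpuinfo`. (Recent MKL-DNN versions can also report which JIT kernels they dispatch when the `MKLDNN_VERBOSE=1` environment variable is set.) The function below is an illustrative sketch, not part of the Menoh API:

```python
def simd_flags(cpuinfo_text):
    """Return the SIMD-related flags found in a /proc/cpuinfo dump."""
    interesting = {"sse4_1", "sse4_2", "avx", "avx2", "avx512f"}
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return sorted(interesting & flags)
    return []

# Example with a shortened cpuinfo snippet (real files are much longer):
sample = "model name: Example CPU\nflags\t: fpu sse sse4_1 sse4_2 avx avx2\n"
print(simd_flags(sample))  # ['avx', 'avx2', 'sse4_1', 'sse4_2']
```

On a real machine you would pass `open("/proc/cpuinfo").read()` instead of the sample string.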

rajh619 commented 6 years ago

Hi, I was comparing performance with the TVM compiler. I tested a ResNet ONNX model in both Menoh and TVM, and I observed that the execution time in NNVM (CPU, AVX2) is lower than in Menoh (I don't know which instruction set it uses on the CPU). I have a few questions: 1) What is the execution model of Menoh? I can find only API documentation. Is there any design/functional document that explains how Menoh works?

2) Are there any options for fine-tuning/optimizing model execution? (For example, in TVM there are many options to optimize graphs, choose the CPU or CPU-AVX2 instruction set for execution, etc.) Is there anything similar in Menoh?

Thanks.

okdshin commented 6 years ago

Thank you for your important questions. I'll answer them.

(1) Currently Menoh is an experimental framework and its implementation is incomplete. There is a design document for Menoh, but sorry, it is not yet included in the documentation in this repository. So let me explain the design briefly here.

Menoh can be split into three parts: a graph manipulation part, a construction part, and an execution part.

In the graph manipulation part, users can modify graphs loaded from ONNX (or, potentially, other formats): deleting nodes, adding nodes and parameters, merging different models, or more aggressive operations.

In the construction part, multiple backends (each with a specialized mechanism to execute some operators faster; currently MKL-DNN is the only one) parse partial graphs and generate a procedure list. We are now planning to make the backends customizable in various ways, so an NNVM/TVM backend could fit in here.

In the execution part, users execute the procedures and take the outputs.
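The three parts above can be sketched with a toy interpreter. Everything here is hypothetical (the graph format, kernels, and function names are invented for illustration; Menoh's real data structures differ):

```python
# A graph is a list of (op, input_names, output_name) nodes.
graph = [
    ("Conv",    ["data", "w0"],  "conv0"),
    ("Relu",    ["conv0"],       "relu0"),
    ("Dropout", ["relu0"],       "drop0"),   # a no-op at inference time
    ("Gemm",    ["drop0", "w1"], "out"),
]

# -- Graph manipulation: delete a node and rewire its consumers.
def delete_node(graph, name):
    """Delete the node producing `name`; rewire consumers to its first input."""
    node = next(n for n in graph if n[2] == name)
    src = node[1][0]
    return [
        (op, [src if i == name else i for i in inputs], out)
        for op, inputs, out in graph
        if out != name
    ]

# -- Construction: the backend only understands Conv/Relu/Gemm, so Dropout
#    must be removed in the manipulation phase before construction succeeds.
KERNELS = {
    "Conv": lambda xs: f"conv({', '.join(xs)})",
    "Relu": lambda xs: f"relu({xs[0]})",
    "Gemm": lambda xs: f"gemm({', '.join(xs)})",
}

def construct(graph):
    """Parse the graph and generate a procedure list."""
    return [(KERNELS[op], inputs, out) for op, inputs, out in graph]

# -- Execution: run the procedures in order and collect named outputs.
def run(procs):
    env = {}
    for kernel, inputs, out in procs:
        args = [env.get(i, i) for i in inputs]  # resolve already-computed names
        env[out] = kernel(args)
    return env

g = delete_node(graph, "drop0")   # graph manipulation
procs = construct(g)              # construction
print(run(procs)["out"])          # gemm(relu(conv(data, w0)), w1)
```

Here the "kernels" just build strings to make the data flow visible; a real backend would dispatch to optimized MKL-DNN primitives instead.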

Also, we are seeking a better design for utilizing DNN models outside laboratories.

(2) Currently Menoh has only very simple computation-graph optimizations (e.g. trimming useless nodes). However, we are now planning to introduce graph optimization methods into the graph manipulation part, in cooperation with the Chainer development team.
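"Trimming useless nodes" can be illustrated as a backward reachability pass over a toy graph: keep only nodes whose outputs feed, directly or transitively, into the requested model outputs. The graph format and names below are hypothetical, not Menoh's actual representation:

```python
def trim(graph, outputs):
    """Keep only nodes that (transitively) contribute to `outputs`.

    `graph` is a list of (op, input_names, output_name) triples,
    topologically sorted.
    """
    needed = set(outputs)
    kept = []
    for op, inputs, out in reversed(graph):   # walk backward from the outputs
        if out in needed:
            kept.append((op, inputs, out))
            needed.update(inputs)
    return list(reversed(kept))

graph = [
    ("Conv", ["data", "w0"],  "conv0"),
    ("Relu", ["conv0"],       "relu0"),
    ("Gemm", ["conv0", "w1"], "dead"),   # never reaches the model output
    ("Gemm", ["relu0", "w2"], "out"),
]
print([n[2] for n in trim(graph, ["out"])])  # ['conv0', 'relu0', 'out']
```

The node producing `dead` is dropped because nothing downstream of the requested output consumes it.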

rajh619 commented 6 years ago

Hi ,

Thank you for your explanation. I understand it is in an experimental phase.

Based on your explanation, I can see similar graph optimization techniques in NNVM and accelerated execution in TVM (based on multiple backends such as CPUs, OpenCL/CUDA GPUs, FPGAs, etc.).

I would like to know: how is Menoh different from NNVM/TVM?

Thanks

okdshin commented 6 years ago

Thank you for another important question. NNVM/TVM is a good framework, and it partially shares goals with Menoh. However, the two also have different perspectives on DNN inference.

Let me explain the biggest difference between NNVM/TVM and Menoh.

NNVM/TVM compiles trained models to dynamic libraries; applications then load those libraries and execute the operations. Model construction and execution are split into separate stages, and the execution stage is a black box to users, so users cannot modify compiled models.

On the other hand, Menoh doesn't compile; it interprets computation graphs and executes them directly. This is generally slower, but the design is simpler than NNVM/TVM's, and users can modify any part smoothly in code. When we need speed, we can also wrap NNVM/TVM with Menoh and use it through the Menoh C API and its many language bindings. In other words, Menoh considers customizability and usability to be as important as raw speed.

rajh619 commented 6 years ago

Thank you for your detailed information !

rajh619 commented 5 years ago

Hi @okdshin, I am trying to understand the motivation behind Menoh's development. Why use Menoh if NNVM can do everything except computation-graph customization? Could you show me an example use case where we modify/customize a pretrained ONNX model on the fly during execution?

Thanks

okdshin commented 5 years ago

Hi @rajh619. Thanks for the question. Honestly speaking, modifying a trained model ahead of time (graph manipulation) is not needed that often. However, it is troublesome for users who merely want to use trained models distributed by other users. In addition, I am now developing a model-construction fallback system: when the first backend (e.g. ARMNN) fails to interpret an operator, the second backend (e.g. Generic, a naive C++ implementation) tries to interpret it. As a result, a model that utilizes multiple backends is constructed. The current ARMNN backend does not have such a fallback system, but we plan to integrate it. That fallback system will enable Menoh to interpret operators unsupported by some backends, and also operators customized by users. That feature is not available by using ARMNN alone.
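The fallback idea described above can be sketched as a per-operator backend assignment: for each operator, pick the first backend in priority order that claims to support it. The class and function names below are hypothetical stand-ins, not Menoh's real API:

```python
class ArmnnLike:
    """Stand-in for a fast but partial backend such as ARMNN."""
    name = "armnn"
    SUPPORTED = {"Conv", "Relu"}

    def supports(self, op):
        return op in self.SUPPORTED

class GenericLike:
    """Stand-in for a naive C++-style backend that handles every operator."""
    name = "generic"

    def supports(self, op):
        return True

def assign_backends(ops, backends):
    """For each operator, pick the first backend that can interpret it."""
    plan = []
    for op in ops:
        backend = next(b for b in backends if b.supports(op))
        plan.append((op, backend.name))
    return plan

plan = assign_backends(["Conv", "Relu", "CustomOp"], [ArmnnLike(), GenericLike()])
print(plan)  # [('Conv', 'armnn'), ('Relu', 'armnn'), ('CustomOp', 'generic')]
```

`CustomOp` falls through to the generic backend, so the whole model can still be constructed even though the fast backend doesn't know that operator.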