Closed eric-haibin-lin closed 4 years ago
Yes. MobulaOP calls 'wait_to_read' on inputs and 'wait_to_write' on outputs. It can execute in parallel on multiple GPUs.
https://github.com/wkcn/MobulaOP/blob/master/mobula/func.py#L92
@eric-haibin-lin Sorry that I misunderstood it.
MobulaOP calls two MXNet C API functions, MXNDArrayWaitToRead and MXNDArrayWaitToWrite, which breaks asynchronous execution in MXNet and reduces performance.
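To illustrate why blocking on every array hurts throughput, here is a toy sketch in plain Python. It does not use MXNet at all: a thread pool stands in for the engine, `Future.result()` stands in for a wait_to_read-style blocking call, and `slow_op`, `run_blocking`, and `run_pipelined` are hypothetical names made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_op(x, delay=0.01):
    # Stand-in for a kernel that takes some time to finish.
    time.sleep(delay)
    return x + 1


def run_blocking(pool, values):
    # Wait for each result before submitting the next op.
    # This mimics calling a blocking wait around every operator.
    results = []
    for v in values:
        future = pool.submit(slow_op, v)
        results.append(future.result())  # blocks the calling thread
    return results


def run_pipelined(pool, values):
    # Submit everything first, then wait once at the end.
    # Independent ops are free to overlap in the worker threads.
    futures = [pool.submit(slow_op, v) for v in values]
    return [f.result() for f in futures]


if __name__ == "__main__":
    data = list(range(8))
    with ThreadPoolExecutor(max_workers=4) as pool:
        start = time.perf_counter()
        run_blocking(pool, data)
        blocking_time = time.perf_counter() - start

        start = time.perf_counter()
        run_pipelined(pool, data)
        pipelined_time = time.perf_counter() - start

    print(f"blocking:  {blocking_time:.3f}s")
    print(f"pipelined: {pipelined_time:.3f}s")
```

Both functions return the same values; only the point at which the caller blocks differs, which is the part that matters for the performance drop described above.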
Recently, I have been planning to switch from synchronous to asynchronous execution by using MXTVMBridge; however, there is an ABI compatibility problem. [issue], [discussion]
Will MXNet provide an API to access engine->PushSync, so that an asynchronous function can be implemented outside MXNet?
Thanks!
Hi! MobulaOP currently uses 'MXTVMBridge' to support asynchronous execution.
However, there is an ABI compatibility problem since the class WrappedFunc includes 'std::function', which is implemented differently among different compilers.
I use the GCC 4 header file 'functional' in MobulaOP to address the problem. However, this raises a license problem because the header file is under the GPL license.
I will remove the header file later.
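One common way around this kind of ABI mismatch is to pass callbacks across the library boundary as a plain C function pointer plus a void* context, since the C calling convention is fixed by the platform rather than by the C++ standard library. The sketch below uses Python's ctypes only to show that shape; the signature `int fn(void* ctx, int x)` and the names `CCallback`, `add_offset`, and `invoke` are assumptions for illustration, not the actual WrappedFunc or TVM interface.

```python
import ctypes

# Hypothetical C-style callback type: int fn(void* ctx, int x).
CCallback = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_int)


def add_offset(ctx, x):
    # ctx arrives as a raw address (a Python int); reinterpret it as int*.
    offset_ptr = ctypes.cast(ctx, ctypes.POINTER(ctypes.c_int))
    return x + offset_ptr.contents.value


# State captured by the callback lives behind the opaque context pointer,
# which plays the role that a closure plays inside std::function.
offset = ctypes.c_int(10)
callback = CCallback(add_offset)  # keep a reference so it is not collected
context = ctypes.cast(ctypes.pointer(offset), ctypes.c_void_p)


def invoke(fn, ctx, x):
    # The caller only needs the function pointer and the opaque context.
    return fn(ctx, x)


if __name__ == "__main__":
    print(invoke(callback, context, 5))
```

The caller never sees how the state is stored, so the two sides can be built with different compilers as long as they agree on the C signature.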
Sorry, I've been busy with a few other things. I haven't read the tvm discussion thread but will do later today. Does the ABI issue indicate that mobula op and mxnet must be compiled using the same version of GCC?
@eric-haibin-lin Thank you! It is not necessary to compile MobulaOP and MXNet with the same version of GCC. We only need to keep the same implementation of 'std::function'.
It will be better once CPackedFunc is provided. The problem is discussed at https://discuss.tvm.ai/t/the-abi-compatibility-of-packedfunc/1601/15
Hi @eric-haibin-lin, I have added the TVM bridge to MobulaOP, and the ABI compatibility problem has been addressed. MobulaOP now enables asynchronous execution for MXNet by default : )
Close it : ) MobulaOP supports the asynchronous execution for MXNet (nightly build) on Windows and Linux.
This is great work. Is this tested with multi GPUs and can be executed in parallel?