wkcn / MobulaOP

A Simple & Flexible Cross Framework Operators Toolkit
MIT License

[Question] Using types other than float32? #31

Closed YutingZhang closed 5 years ago

YutingZhang commented 5 years ago

Is there a way to specify the data type of the outputs (other than always using float32)?

And, in general, does MobulaOP support mixed types when implementing a kernel?

Thanks!

wkcn commented 5 years ago

Yes. You can specify the data types and mix them by writing the kernel as a C++ template, and there is no limit on the number of template type parameters. :)

MobulaOP infers the template types from the types of the inputs.

For example, you can change the data types of the variables 'a' and 'b' like this:

a = mx.nd.array([1, 2, 3], dtype='int')
b = mx.nd.array([4, 5, 6], dtype='int')
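
For reference, here is a minimal end-to-end sketch with the AdditionOP example from this repository: the mobula.func.addition_op_forward call is the same one that appears in the traceback later in this thread, while the mobula.op.load('AdditionOP') line is an assumed way of importing the example.

import mxnet as mx
import mobula

# Assumed import of the AdditionOP example folder.
mobula.op.load('AdditionOP')

# int32 inputs: the kernel's template type T is inferred from these dtypes,
# so the same C++ kernel also serves float32, float64, etc.
a = mx.nd.array([1, 2, 3], dtype='int32')
b = mx.nd.array([4, 5, 6], dtype='int32')
c = mx.nd.empty(a.shape, dtype='int32')

mobula.func.addition_op_forward(a.size, a, b, c)
print(c.asnumpy())  # expected: [5 7 9]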
YutingZhang commented 5 years ago

@wkcn That's awesome! I found one potential problem when more than one type is used in the template, like this:

template <typename T1, typename T2>
MOBULA_KERNEL some_op_kernel(const int n, const T1* a, const T2* b, T1* c) {
    ...
}

When calling it with the same type for T1 and T2, it works fine. But when I call it with T1=float32, T2=int32, it gives me an error like:

terminate called after throwing an instance of 'dmlc::Error'
  what():  [13:26:53] src/operator/custom/custom.cc:347: Check failed: reinterpret_cast<CustomOpFBFunc>( params.info->callbacks[kCustomOpForward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast<int>(ctx.is_train), params.info->contexts[kCustomOpForward])
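
For context, the mixed-type call in question would look roughly like the sketch below; the Python-side name mobula.func.some_op (the kernel name with the _kernel suffix stripped) is an assumption based on how the AdditionOP kernel is invoked later in this thread.

import mxnet as mx
import mobula

# T1 is inferred from 'a' and the output 'c', T2 from 'b'.
a = mx.nd.array([1, 2, 3], dtype='float32')  # -> T1 = float
b = mx.nd.array([4, 5, 6], dtype='int32')    # -> T2 = int
c = mx.nd.empty(a.shape, dtype='float32')    # output, same type as T1

# Assumed exposed name of some_op_kernel.
mobula.func.some_op(a.size, a, b, c)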
wkcn commented 5 years ago

Thanks for your report! It seems that the Python code raises an error when forwarding. I will check it.

wkcn commented 5 years ago

Sorry, I couldn't reproduce the issue. Could you please provide a minimal reproducible example?

Thank you!

YutingZhang commented 5 years ago

@wkcn You are right, this is not a bug. However, there is actually another bug: when I tested the example, I triggered that other bug instead, which is what confused me ...

This bug is weird. If you run MobulaOP code in /tmp (PyCharm remote debugging does this), it can give errors like

Error in CustomOp.forward: Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/operator.py", line 987, in forward_entry
    aux=tensors[4])
  File "/mnt/efs/_libraries/MobulaOP/mobula/glue/mx.py", line 66, in forward
    out = self._forward(*in_data)
  File "./AdditionOP/AdditionOP.py", line 8, in forward
    mobula.func.addition_op_forward(a.size, a, b, c)
  File "/mnt/efs/_libraries/MobulaOP/mobula/func.py", line 141, in __call__
    dev_id=dev_id)
  File "/mnt/efs/_libraries/MobulaOP/mobula/func.py", line 55, in __call__
    func = self.loader(self, arg_types, ctx, **self.loader_kwargs)
  File "/mnt/efs/_libraries/MobulaOP/mobula/op/loader.py", line 420, in op_loader
    cpp_info.load_dll(dll_fname)
  File "/mnt/efs/_libraries/MobulaOP/mobula/op/loader.py", line 193, in load_dll
    self.dll = ctypes.CDLL(dll_fname)
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: ./AdditionOP/build/AdditionOP_cpu_0.so: failed to map segment from shared object

terminate called after throwing an instance of 'dmlc::Error'
  what():  [20:31:51] src/operator/custom/custom.cc:347: Check failed: reinterpret_cast<CustomOpFBFunc>( params.info->callbacks[kCustomOpForward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast<int>(ctx.is_train), params.info->contexts[kCustomOpForward])

Stack trace returned 8 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3ee02a) [0x7fbc5738d02a]
[bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3ee651) [0x7fbc5738d651]
[bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x750db9) [0x7fbc576efdb9]
[bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x768bc0) [0x7fbc57707bc0]
[bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x7517b8) [0x7fbc576f07b8]
[bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p36/bin/../lib/libstdc++.so.6(+0xb8678) [0x7fbc9419e678]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fbc9b2486ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fbc9af7e41d]

This error can be replicated with any MobulaOP code, such as examples/dynamic_import_op: just copy the folder to /tmp, cd into the new folder, and run it. It then gives you the error above.

My environment is an AWS p3.xlarge instance with Ubuntu 16.04, the Deep Learning AMI, and the mxnet_p36 conda environment.

wkcn commented 5 years ago

@YutingZhang Thank you! I will check it.

wkcn commented 5 years ago

MobulaOP builds a dynamic link library (.so) in the custom operator's directory. Could you try sudo mount /tmp -o remount,exec? It seems /tmp is mounted without execute permission, so the built .so cannot be loaded from there.

YutingZhang commented 5 years ago

Exactly! Thank you! By the way, is it possible to specify the build folder for the .so files? (not very critical anyway)

wkcn commented 5 years ago

Thanks for your suggestion :) I will add the feature later. I need to think about how it should be specified, e.g. a globally shared path versus a local path for an individual project.

YutingZhang commented 5 years ago

Thank you @wkcn !!