taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Generating a C++ kernel for distribution #920

Open Mohamed-Sakr opened 4 years ago

Mohamed-Sakr commented 4 years ago

Hi,

I'm a C++ dev writing plugins for Cinema 4D. I want to use Taichi to create the building blocks for a new simulation engine, similar to the Taichi Elements Blender repo.

The problem is that Python can't be distributed commercially to users.

I know that the Python layer is translated into a final kernel later inside the code.

So here is what I have in mind as the final picture (these are the steps):

1- Something similar to how the Cycles render engine in Blender works: it has a kernel (written or generated) that is saved to a folder with known arguments and gets called at runtime, giving the developer more control.
2- Suppose the dev wrote a full simulation engine in Python; now they want a final compiled C++ binary, a .dll/.so with a known interface for the generated functions.
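To make the "known interface" in step 2 concrete, here is a minimal sketch of what a C-ABI facade exported from such a .dll/.so might look like. Every name here (SimHandle, sim_create, sim_step, sim_position, sim_destroy) is hypothetical and not part of Taichi; the loop in sim_step only stands in for the place where generated kernels would be launched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulation state; the host application only ever sees a pointer.
struct SimHandle {
  std::vector<float> positions;  // one value per particle, for illustration
  float dt;                      // time step
};

extern "C" {

// Create a simulation with `particle_count` particles, all starting at 0.
SimHandle *sim_create(std::size_t particle_count, float dt) {
  return new SimHandle{std::vector<float>(particle_count, 0.0f), dt};
}

// Advance one step. A real build would launch the generated kernels here;
// this placeholder just moves every particle by dt.
void sim_step(SimHandle *sim) {
  for (float &p : sim->positions) p += sim->dt;
}

// Read back one particle position.
float sim_position(const SimHandle *sim, std::size_t i) {
  assert(i < sim->positions.size());
  return sim->positions[i];
}

// Release everything owned by the handle.
void sim_destroy(SimHandle *sim) { delete sim; }

}  // extern "C"
```

With unmangled names and an opaque handle, a plugin host could load the library at runtime and call these functions without knowing how the kernels were produced.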

I'm not sure how doable this is; please reply with your thoughts.

cheers, Mohamed Sakr

archibate commented 4 years ago

Cool idea! Thanks for proposing this. Since Taichi uses LLVM as one of its backends, it should be possible to generate a .dll/.so file from the LLVM IR. Also note that there is already an issue about this: #439

Mohamed-Sakr commented 4 years ago

Thanks! I hadn't noticed thread #439. Is there an expected time for this to be available? I could help with this if needed; I just need to know which flags you use when compiling LLVM, and which LLVM version. (I tried the precompiled one from the Blender libs, but its flags are different.)

BTW, I'm the main dev of the Cycles 4D plugin bridge.

cheers, Mohamed Sakr

yuanming-hu commented 4 years ago

Quick answer for now: this would be a very useful feature and a lot of people are asking for it. However, our community members are currently busy polishing what we already have (adding documentation, refactoring, making small usability improvements, etc.), so we can't promise a near-term timeline for this.

If the goal is to remove the dependency on Python, then it shouldn't be too hard to implement. We just need to strip out the Python part and ship pre-processed frontend ASTs and the Taichi runtime (libtaichi.so). If you would like to join us and work on this, we are more than happy to welcome you and provide any necessary help :-)

Developer installation guide (with LLVM details): https://taichi.readthedocs.io/en/latest/dev_install.html

Also note that shipping only compiled kernels won't go very far, since you need the Taichi runtime (e.g. CPU thread pool, CUDA module loader) for more complex kernels. You may want to include the JIT part of Taichi for CPU/GPU compatibility if you don't want to write an NVPTX loader on your own... So I suggest releasing the Taichi runtime + JIT engine + some Taichi kernel representation together.

archibate commented 4 years ago

#1332 could be a systematic solution for this.

Note: The experimental C backend can already create a .so file, which can later be linked into user programs.

Mohamed-Sakr commented 4 years ago

Thanks, this is great! Unfortunately, I got so busy with a few projects that I couldn't proceed with compiling kernels.

archibate commented 4 years ago

Hi! I'd like to share our latest progress in #1629 with you: we can now export a Taichi program into a single C source file! Here's the workflow:

  1. Run TI_ARCH=cc TI_ACTION_RECORD=mpm88.yml python examples/mpm88.py. Close the GUI window once particles are shown correctly. This will save all the kernels in mpm88.py to mpm88.yml:
- action: "compile_kernel"
  kernel_name: "init_c6_0"
  kernel_source: "void Tk_init_c6_0(struct Ti_Context *ti_ctx) {\n  for (Ti_i32 tmp0 = 0; tmp0 < 8192...\n"
- action: "launch_kernel"
  kernel_name: "init_c6_0"
...
  2. Run ti cc_compose mpm88.yml mpm88.c. This will compose the kernels and runtime in mpm88.yml into a single C file, mpm88.c:
...

Ti_i8 Ti_gtmp[1048576];
union Ti_BitCast Ti_args[8];
Ti_i32 Ti_earg[8 * 8];

struct Ti_Context Ti_ctx = {  // statically-allocated context for convenience!
  &Ti_root, Ti_gtmp, Ti_args, Ti_earg,
};

void Tk_init_c6_0(struct Ti_Context *ti_ctx) {
  for (Ti_i32 tmp0 = 0; tmp0 < 8192; tmp0 += 1) {
    Ti_i32 tmp1 = tmp0;
    Ti_f32 tmp2 = Ti_rand_f32();
    Ti_f32 tmp3 = Ti_rand_f32();
    Ti_f32 tmp4 = 0.4;
    Ti_f32 tmp5 = tmp2 * tmp4;

    ...
  3. Then, link this file together with your C/C++ project.

To call init_c6_0 from C++, for example:

extern "C" {
extern struct Ti_Context Ti_ctx;  // C linkage, so the name matches the symbol in mpm88.c
void Tk_init_c6_0(struct Ti_Context *ti_ctx);
}
...
Tk_init_c6_0(&Ti_ctx);

Or, if you need multiple Taichi contexts within one program:

class MyRenderer {
  ...
  struct Ti_Context per_renderer_taichi_context;
  ...
};

MyRenderer::MyRenderer() {
  per_renderer_taichi_context.root = malloc(...);
  ...
  Tk_init_c6_0(&per_renderer_taichi_context);
}
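Here is a self-contained sketch of why passing the context by pointer allows several independent instances. DemoContext, Tk_demo_step and DemoRenderer are hypothetical stand-ins, not the generated Ti_Context or a real exported kernel; only the root field name is taken from the snippet above, and the real struct has more members.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the generated context struct.
struct DemoContext {
  void *root;  // per-instance field storage
};

constexpr int kDemoCount = 4;  // number of i32 slots in this toy "root"

// Stand-in for a generated kernel: it only touches memory reachable
// through the context it is given, never a global.
extern "C" void Tk_demo_step(DemoContext *ctx) {
  auto *data = static_cast<std::int32_t *>(ctx->root);
  for (int i = 0; i < kDemoCount; ++i) data[i] += 1;
}

// Each renderer owns its storage and its context, so instances never share state.
class DemoRenderer {
 public:
  DemoRenderer() : storage_(kDemoCount, 0) { ctx_.root = storage_.data(); }
  // Copying would leave ctx_.root pointing at another object's storage.
  DemoRenderer(const DemoRenderer &) = delete;
  DemoRenderer &operator=(const DemoRenderer &) = delete;

  void step() { Tk_demo_step(&ctx_); }

  std::int32_t at(int i) const {
    assert(i >= 0 && i < kDemoCount);
    return storage_[i];
  }

 private:
  std::vector<std::int32_t> storage_;  // released automatically, no manual free
  DemoContext ctx_{};
};
```

Using std::vector instead of a raw malloc ties the lifetime of the root buffer to the renderer; whether that suits the real Ti_Context depends on how the generated code expects root to be laid out.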

Full documentation on this feature will be added after the C backend is officially released.

Mohamed-Sakr commented 4 years ago

@archibate Thanks! This is quite interesting; I will try to give it a shot soon.