When generating an AOT module with Taichi, I noticed that the size of the generated module.tcm file depends on whether the kernel was executed before archiving. The difference also affects runtime performance when the module is loaded and launched from C++/C#.
Minimal Sample Code to Reproduce
import taichi as ti


def compile_aot(run=False):
    ti.init(arch=ti.vulkan)
    if ti.lang.impl.current_cfg().arch != ti.vulkan:
        raise RuntimeError("Vulkan is not available.")

    @ti.kernel
    def paint(pixels: ti.types.ndarray(dtype=ti.f32, ndim=2), n: ti.u32, t: ti.f32):
        for i, j in pixels:  # Parallelized over all pixels
            c = ti.Vector([-0.8, ti.cos(t) * 0.2])
            z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
            iterations = 0
            while z.norm() < 20 and iterations < 50:
                z = ti.Vector([z[0]**2 - z[1]**2, z[1] * z[0] * 2]) + c
                iterations += 1
            pixels[i, j] = 1 - iterations * 0.02

    n = 1024
    t = 0
    pixels = ti.ndarray(shape=(n * 2, n), dtype=ti.f32)

    # Optionally execute the kernel in a GUI loop before archiving
    if run:
        gui = ti.GUI('Julia Set', (n * 2, n))
        while gui.running:
            t += 1
            paint(pixels, n, t * 0.03)
            pixel = pixels.to_numpy()
            gui.set_image(pixel)
            gui.show()

    # Archive the kernel into an AOT module
    mod = ti.aot.Module(ti.vulkan)
    mod.add_kernel(paint, template_args={'pixels': pixels})
    mod.archive("build/module.tcm")
    print("Module archived to 'build/module.tcm'")


if __name__ == '__main__':
    compile_aot(run=False)
Observations
When run=False, the generated module.tcm file is 7 KB.
When run=True, the generated module.tcm file is 8 KB.
Runtime performance when the module is called from C++/C# also differs between the two cases: the module archived after the kernel was executed runs faster.
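To narrow down where the extra kilobyte comes from, the two archives can be compared entry by entry. The sketch below is only a starting point and rests on an assumption: that module.tcm is a ZIP container (the script raises an error if it is not). The file names module_norun.tcm and module_run.tcm are hypothetical; they stand for one archive produced with run=False and one produced with run=True.

```python
import zipfile
from pathlib import Path


def archive_entries(path):
    """Map each entry name in a ZIP-style archive to (uncompressed size, CRC)."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive")
    with zipfile.ZipFile(path) as zf:
        return {info.filename: (info.file_size, info.CRC) for info in zf.infolist()}


def diff_archives(path_a, path_b):
    """Return entries that are missing from one archive or differ in size/CRC."""
    a, b = archive_entries(path_a), archive_entries(path_b)
    diff = {}
    for name in sorted(a.keys() | b.keys()):
        if a.get(name) != b.get(name):
            # None marks an entry that exists in only one of the archives
            diff[name] = (a.get(name), b.get(name))
    return diff


if __name__ == "__main__":
    # Hypothetical file names: archive once with run=False and once with run=True.
    paths = [Path("build/module_norun.tcm"), Path("build/module_run.tcm")]
    if all(p.exists() for p in paths):
        for name, (norun, ran) in diff_archives(*paths).items():
            print(f"{name}: run=False {norun} vs run=True {ran}")
```

Entries that show up only in one archive, or whose size differs, would indicate which part of the module (for example a shader binary or the metadata) changes when the kernel is executed first.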
Questions
What causes the difference in the size of the AOT module depending on whether the kernel function is executed before archiving?
Why does this difference affect runtime performance when the module is called from C++/C#?
System Information
Taichi version: 1.8.0
OS: Windows 11
Thank you for your help in understanding this issue.