taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Taichi gets slower after about 500 loops #8513

Open hairuiC opened 2 months ago

hairuiC commented 2 months ago

I have a problem: I need to render more than ten thousand images with Taichi. For the first 500 loops it is really fast, about 3e-5 s per image. After that it gets slow, about 0.01 to 0.02 s per loop. The image size is 1024 * 1024 and the spp is 32. I'm new to Taichi and can't figure out the reason. I have tried my program on two different GPUs, a 2080 Ti and a 3090, and both show the same behavior.

Has anybody run into this problem, and can somebody give me some advice?

```python
import random

import taichi as ti
import numpy as np
import argparse
from time import time
from ray_tracing_models import calibration
import math
import torch
import gc
from ray_tracing_models import (Ray, Camera, Hittable_list, PI, random_in_unit_sphere,
                                refract, reflect, reflectance, random_unit_vector,
                                Plane, local2world)

ti.init(arch=ti.gpu, device_memory_GB=11, device_memory_fraction=1.0)

image_width = 2048
image_height = 2048
canvas = ti.Vector.field(1, dtype=ti.f32, shape=(image_width, image_height))
random_num = 10000
rand_x_z = ti.field(ti.f64, shape=random_num)

sigma = 0.0
_rand_x_z = np.random.normal(loc=0., scale=sigma, size=random_num)
rand_x_z.from_numpy(_rand_x_z)
x_board = [3022, 8002]
y_board = [1172, 9002]
x_rand = np.random.uniform(x_board[0], x_board[1], random_num)
y_rand = np.random.uniform(y_board[0], y_board[1], random_num)
samples_per_pixel = 64
ray_num = samples_per_pixel * image_width * image_height
print(ray_num)
max_depth = 2


@ti.kernel
def render():
    for ray_no in range(ray_num):
        pixel_n = int(ray_no % (image_width * image_height))
        row_num = int(pixel_n / image_width)
        col_num = int(pixel_n % image_width)
        color = ti.Vector([0.0])
        u = (row_num + ti.random()) / image_width
        v = (col_num + ti.random()) / image_height
        ray = camera.get_ray(u, v)
        color = ray_color(ray, ray_no)
        color /= samples_per_pixel

        ti.atomic_add(canvas[row_num, col_num], color)


@ti.func
def ray_color(ray, ray_no):
    color_buffer = ti.Vector([0.0])
    brightness = ti.Vector([1.0])
    scattered_origin = ray.origin
    scattered_direction = ray.direction
    p_RR = 1.0
    for n in range(max_depth):
        if ti.random() > p_RR:
            break
        is_hit, hit_point, hit_point_normal, front_face, material, color = scene.hit(Ray(scattered_origin, scattered_direction))
        if is_hit:
            if material == 0 and front_face:

                color_buffer = color * brightness
                break
            else:
                TBN = local2world(hit_point_normal)
                temp_normal = ti.math.normalize(ti.Vector([rand_x_z[int((ray_no) % random_num)], 1.0, rand_x_z[int((ray_no) % random_num)]]))

                new_normal = TBN @ temp_normal
                scattered_direction = ti.math.reflect(scattered_direction, new_normal)
                scattered_origin = hit_point
                brightness *= color

        else:
            color_buffer = ti.Vector([0.8])
    return color_buffer


if __name__ == "__main__":
    for i in range(10000):

        # rand_x = x_rand[k]

        # rand_y = y_rand[k]
        # i = 100
        random_point_pos = ti.Vector([x_rand[i], y_rand[i]])
        sigma_k = sigma
        render_part = ti.math.vec4([random_point_pos[0] - 100,
                                    random_point_pos[1] - 100,
                                    random_point_pos[0] + 100,
                                    random_point_pos[1] + 100])
        camera = Camera()
        lumitexel = []
        scene = Hittable_list()
        scene.from_obj("mirror.obj", mtl=1)

        for j in range(10240):
            a = time()

            scene.from_obj("single_light/light_{}.obj".format(j), mtl=0)
            canvas.fill(0)

            render()

            scene.clear()
            lumitexel.append(canvas[int(x_rand[i]), int(y_rand[i])])
            b = time()
            print((b - a), j)
        print(np.max(lumitexel))
```
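One thing that may be worth ruling out is whether the fast timings measure the render at all. GPU kernel launches are generally queued asynchronously, so a wall-clock reading taken without a synchronization point can reflect only the launch cost. The sketch below is a hypothetical helper (`timed` is not part of Taichi) that calls an optional `sync` function before each clock read. It also works out the throughput that the reported 3e-5 s would imply for the settings in the script. Whether this explains the slowdown in this issue is an assumption, not something the thread confirms.

```python
import time


def timed(fn, sync=None):
    """Run fn() once and return (result, seconds).

    If `sync` is given, it is called before each clock read so that
    asynchronously queued work has finished before the time is taken.
    """
    if sync is not None:
        sync()  # drain work queued by earlier iterations
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for the work that fn() itself queued
    return result, time.perf_counter() - start


# Rays per frame for the settings in the script (64 spp, 2048 x 2048).
rays_per_frame = 64 * 2048 * 2048

# Throughput implied by the reported 3e-5 s per frame.
implied_rays_per_second = rays_per_frame / 3e-5
```

With Taichi, the helper could be called as `timed(render, sync=ti.sync)`. The implied throughput comes out above 10^12 rays per second, which suggests the early numbers may not cover the full render.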

bobcao3 commented 1 week ago

Is the loop creating or declaring more fields? Fields are global and are not garbage-collected.
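A minimal sketch of one way to check this, assuming the allocations go through a function that can be wrapped. `count_calls` and `make_buffer` are hypothetical names used only for illustration; `make_buffer` stands in for whatever allocates inside the loop.

```python
import functools


def count_calls(fn):
    """Wrap fn so the number of calls is recorded on wrapper.calls."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


# Stand-in for an allocator (hypothetical, for illustration only).
@count_calls
def make_buffer(n):
    return [0.0] * n


for step in range(3):
    make_buffer(4)  # an allocation hidden inside the loop body
    print(step, make_buffer.calls)  # the count grows by one every iteration
```

In the real script one could try `ti.field = count_calls(ti.field)` before the loop and print `ti.field.calls` each iteration. This only catches code that calls `ti.field` through the module attribute; fields created another way, for example via `ti.Vector.field`, would need their own wrapper.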

bobcao3 commented 1 week ago

Also, in recent versions of Taichi you shouldn't need to pass `device_memory_GB=11, device_memory_fraction=1.0` unless you are using sparse SNodes.