taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

atomic_max with negative inputs gives 0.0 as output for CPU/CUDA backend (only) #7735

Closed oliver-batchelor closed 1 year ago

oliver-batchelor commented 1 year ago

The code below prints -1.0 with the ti.vulkan backend (as expected), but 0.0 with the ti.cuda or ti.cpu backend.

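import taichi as ti

ti.init(arch=ti.cuda)  # also reproduces with ti.cpu; ti.vulkan gives the expected result


@ti.kernel
def max_kernel() -> ti.f32:
    x = -1000.0
    for i in range(1, 20):
        # The largest candidate is -1.0 (i = 1), so the expected result is -1.0.
        ti.atomic_max(x, -ti.f32(i))

    return x


print(f"{max_kernel():.2f}")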

Additional comments

ti diagnose main [Taichi] version 1.5.0, llvm 15.0.4, commit 7b885c28, linux, python 3.8.16


Taichi Programming Language


Docs: https://docs.taichi-lang.org/ GitHub: https://github.com/taichi-dev/taichi/ Forum: https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
system: linux
executable: /local/mambaforge/envs/conan_ros/bin/python
platform: Linux-5.14.0-1059-oem-x86_64-with-glibc2.17
architecture: 64bit ELF
uname: uname_result(system='Linux', node='cs23004lv', release='5.14.0-1059-oem', version='#67-Ubuntu SMP Mon Mar 13 14:22:10 UTC 2023', machine='x86_64', processor='x86_64')
locale: en_NZ.UTF-8
PATH: /local/mambaforge/envs/conan_ros/share/rubygems/bin:/local/mambaforge/envs/conan_ros/bin:/local/mambaforge/condabin:/csse/users/owb14/.local/bin:/usr/local/cuda/bin:/opt/spinnaker/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/csse/misc/android-sdk-linux/tools:/csse/misc/android-sdk-linux/platform-tools:/csse/users/owb14/.dotnet/tools:/usr/local/gradle-7.3.1/bin:/usr/local/jazzer:/usr/local/maven-3.8.4/bin:/usr/share/java/javafx-sdk-17.0.1:/usr/local/sqlmap-1.6.8:/csse/users/owb14/.local/bin:/csse/users/owb14/bin:/csse/misc/bin
PYTHONPATH: ['/local/mambaforge/envs/conan_ros/bin', '/local/mambaforge/envs/conan_ros/lib/python38.zip', '/local/mambaforge/envs/conan_ros/lib/python3.8', '/local/mambaforge/envs/conan_ros/lib/python3.8/lib-dynload', '/local/mambaforge/envs/conan_ros/lib/python3.8/site-packages']

No LSB modules are available.
Distributor ID: Linuxmint
Description: Linux Mint 20.3
Release: 20.3
Codename: una

import: <module 'taichi' from '/local/mambaforge/envs/conan_ros/lib/python3.8/site-packages/taichi/__init__.py'>

cc: False
cpu: True
metal: False
opengl: True
cuda: True
vulkan: True

OpenGL version 4.6.0 NVIDIA 530.30.02 is supported
GL_ARB_compute_shader: OK
GL_ARB_gpu_shader_int64: OK
GL_NV_shader_atomic_float: OK
GL_NV_shader_atomic_float64: OK
GL_NV_shader_atomic_int64: OK

Tue Apr 4 17:10:21 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090         On | 00000000:21:00.0 Off |                  N/A |
| 30%   48C    P0              115W / 350W|    577MiB / 24576MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      5419      G   /usr/lib/xorg/Xorg                          179MiB |
|    0   N/A  N/A      6586      G   /usr/bin/gnome-shell                        190MiB |
|    0   N/A  N/A      7046      G   ...ures=SpareRendererForSitePerProcess       15MiB |
|    0   N/A  N/A      7103      G   ...,WinRetrieveSuggestionsOnlyOnDemand       67MiB |
|    0   N/A  N/A     11214      G   ...sion,SpareRendererForSitePerProcess       61MiB |
|    0   N/A  N/A     14511      G   ...875352,131072 --enable-features=JXL       59MiB |
+---------------------------------------------------------------------------------------+

[Taichi] version 1.5.0, llvm 15.0.4, commit 7b885c28, linux, python 3.8.16

[Taichi] version 1.5.0, llvm 15.0.4, commit 7b885c28, linux, python 3.8.16
[Taichi] Starting on arch=x64

[Taichi] version 1.5.0, llvm 15.0.4, commit 7b885c28, linux, python 3.8.16
[Taichi] Starting on arch=opengl

[Taichi] version 1.5.0, llvm 15.0.4, commit 7b885c28, linux, python 3.8.16
[Taichi] Starting on arch=cuda

[Taichi] version 1.5.0, llvm 15.0.4, commit 7b885c28, linux, python 3.8.16


Taichi Programming Language


Docs: https://docs.taichi-lang.org/ GitHub: https://github.com/taichi-dev/taichi/ Forum: https://forum.taichi.graphics/

                               TAICHI EXAMPLES                                    

────────────────────────────────────────────────────────────────────────────────────
0: ad_gravity 25: laplace 50: physarum
1: circle_packing_image 26: laplace_equation 51: poisson_disk_sampling
2: comet 27: mandelbrot_zoom 52: print_offset
3: cornell_box 28: marching_squares 53: rasterizer
4: diff_sph 29: mass_spring_3d_ggui 54: regression
5: euler 30: mass_spring_game 55: sdf_renderer
6: explicit_activation 31: mass_spring_game_ggui 56: simple_derivative
7: export_mesh 32: mciso_advanced 57: simple_texture
8: export_ply 33: mgpcg 58: simple_uv
9: export_videos 34: mgpcg_advanced 59: snow_phaseField
10: fem128 35: minimal 60: stable_fluid
11: fem128_ggui 36: minimization 61: stable_fluid_ggui
12: fem99 37: mpm128 62: stable_fluid_graph
13: fractal 38: mpm128_ggui 63: taichi_bitmasked
14: fractal3d_ggui 39: mpm3d 64: taichi_dynamic
15: fullscreen 40: mpm3d_ggui 65: taichi_logo
16: game_of_life 41: mpm88 66: taichi_ngp
17: gui_image_io 42: mpm88_graph 67: taichi_sparse
18: gui_widgets 43: mpm99 68: texture_graph
19: implicit_fem 44: mpm_lagrangian_forces 69: tutorial
20: implicit_mass_spring 45: nbody 70: two_stream_instability
21: initial_value_problem 46: odop_solar 71: vortex_rings
22: jacobian 47: oit_renderer 72: waterwave
23: karman_vortex_street 48: patterns
24: keyboard 49: pbf2d
────────────────────────────────────────────────────────────────────────────────────
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0

Running time: 0.35s

Consider attaching this log when maintainers ask about system information.

Running time: 7.91s

lin-hitonami commented 1 year ago

This is because our get_min_value function is not implemented correctly. For f32 it returns std::numeric_limits<float32>::min(), but that is the smallest positive normalized f32 value (about 1.18e-38), not the most negative one, so a maximum seeded with it can never come out negative.
https://github.com/taichi-dev/taichi/blob/5679ee282e196d2c18f7eaf6fadf56fb999fc4df/taichi/ir/type_utils.h#L162-L186

We should use std::numeric_limits<T>::lowest() instead.
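
To illustrate the difference, here is a minimal pure-Python sketch. It is not Taichi's implementation: it assumes the backend seeds the max with the value that get_min_value returns, and the names F32_MIN_POSITIVE_NORMAL, F32_LOWEST and max_reduce are made up for this example. The two constants are the IEEE 754 binary32 values that std::numeric_limits<float>::min() and ::lowest() correspond to.

```python
import sys
from functools import reduce

# IEEE 754 binary32 limits written out exactly (hypothetical names for this sketch).
F32_MIN_POSITIVE_NORMAL = 2.0 ** -126               # analogue of numeric_limits<float>::min()
F32_LOWEST = -(2.0 - 2.0 ** -23) * 2.0 ** 127       # analogue of numeric_limits<float>::lowest()

# Python's own float (a double) has the same naming trap: float_info.min is positive.
assert sys.float_info.min > 0.0


def max_reduce(values, identity):
    """Fold max over values, starting from the given identity element."""
    return reduce(max, values, identity)


# Same inputs as the kernel in the report: x = -1000.0, then -1.0 .. -19.0.
inputs = [-1000.0] + [-float(i) for i in range(1, 20)]

buggy = max_reduce(inputs, F32_MIN_POSITIVE_NORMAL)  # identity is a tiny positive number
fixed = max_reduce(inputs, F32_LOWEST)               # identity is the most negative finite f32

print(f"{buggy:.2f}")  # 0.00
print(f"{fixed:.2f}")  # -1.00
```

With the positive identity, every negative input loses the comparison, so the result is the identity itself, which prints as 0.00. With the lowest finite value as identity, the result is the true maximum of the inputs.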