taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0
25.56k stars 2.29k forks

Segmentation fault when using GPU in supercomputer #6727

Open mushroomfire opened 2 years ago

mushroomfire commented 2 years ago

Environment: Paratera supercomputer

Submit script:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 5
#SBATCH -p gpu 
#SBATCH --gres=gpu:1
#SBATCH --no-requeue

nvidia-smi

python test.py

Here is the test.py:

import taichi as ti
ti.init(ti.cuda)

The output file is shown below:

Thu Nov 24 21:48:39 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   23C    P0    42W / 300W |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
/tmp/slurmd/job604193/slurm_script: line 10: 143192 Segmentation fault      python test.py

I don't know how to solve this segmentation fault. If you need more detailed information, please let me know. Thanks a lot.

ailzhang commented 2 years ago

Hey @mushroomfire, OOC does this repro if you run it directly on a V100 without Slurm? Thanks!

mushroomfire commented 2 years ago

Hey @ailzhang, here are the results when I run the script directly in the shell with python test.py:

[Taichi] version 1.2.2, llvm 10.0.0, commit 608e4b57, linux, python 3.8.0
[Taichi] Starting on arch=cuda
Segmentation fault
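
For anyone trying to narrow down a crash like this, one option is to run the script in a child process with Python's built-in faulthandler enabled. The parent can then report which signal killed the child and capture whatever Python-level traceback was dumped. The following is only a minimal sketch, assuming a POSIX system; `run_and_report` is a hypothetical helper name, not part of Taichi, and it will not by itself reveal the cause inside the CUDA backend.

```python
import signal
import subprocess
import sys


def run_and_report(args):
    """Run `python -X faulthandler <args>` in a child process.

    Hypothetical helper for illustration only (not a Taichi API).
    Returns (status, stderr), where status is the name of the signal
    that killed the child (e.g. "SIGSEGV") or "exit <code>".
    """
    # -X faulthandler makes CPython dump the Python traceback to stderr
    # when the interpreter receives a fatal signal such as SIGSEGV.
    cmd = [sys.executable, "-X", "faulthandler", *args]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        # On POSIX, a negative return code means "killed by signal N".
        status = signal.Signals(-proc.returncode).name
    else:
        status = f"exit {proc.returncode}"
    return status, proc.stderr


# Example usage (assumes test.py is in the current directory):
#   status, stderr = run_and_report(["test.py"])
#   print(status)
#   print(stderr)
```

If the status comes back as SIGSEGV, the captured stderr should show which Python line was executing at the time, which may help confirm whether the crash happens inside ti.init.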