nihui / realsr-ncnn-vulkan

RealSR super resolution implemented with ncnn library
MIT License

core dump on GPU H100 #49

Closed by bronzafa 1 year ago

bronzafa commented 1 year ago

I'm trying to run realsr-ncnn-vulkan on an H100 GPU and it crashes with a core dump.

Hardware/software config:

2 x Intel Xeon Platinum 8468
GRAPHICS: llvmpipe
BAR1 / Visible vRAM: 131072 MiB
OpenGL: 4.5 Mesa 20.3.3 (LLVM 11.0.0 256 bits)
Display Driver: NVIDIA 530.30.02
Screen: 640x480
MEMORY: 16 x 64 GB 4800MT/s
OPERATING SYSTEM: Red Hat Enterprise Linux 8.4
Kernel: 4.18.0-305.25.1.el8_4.x86_64 (x86_64)
Desktop: GNOME Shell 3.32.2
Display Server: X Server 1.20.10
Compiler: GCC 8.4.1 20200928 + Clang 11.0.1 + CUDA 12.1

I first used Phoronix Test Suite v10.8.4, which installs pts/realsr-ncnn-1.0.0, and got the following error:

realsr-ncnn-vulkan-20200818-linux]$ ./realsr-ncnn-vulkan -i low-end-image-sample1.JPG -o out.png
more than 64 cpu detected, thread affinity may not work properly :(
double free or corruption (!prev)
Aborted (core dumped)

I then compiled the latest version from GitHub and got the following error:

$ sudo ./realsr-ncnn-vulkan -i /home/user/realsr-ncnn-vulkan/images/2.png -o output.png -s 4
Segmentation fault

I also tried the binaries from the stable release 20220728 and got the following error:

realsr-ncnn-vulkan-20220728-ubuntu]$ sudo ./realsr-ncnn-vulkan -i /home/user/realsr-ncnn-vulkan/images/0.png -o output.png -s 4
double free or corruption (!prev)
Aborted

I followed the same Phoronix installation process on a system with the same software but with A100 GPUs, and it worked fine.

ArchieMeng commented 1 year ago

Could you compile it with the Debug build type and run it under gdb to get a backtrace?
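For reference, a Debug build plus a gdb backtrace could look roughly like the sketch below. It assumes the CMake project lives in the checkout's src/ directory and that the git submodules are already initialised; the function names and the GDB variable are made up for illustration and are not part of the project.

```shell
#!/bin/sh
# Sketch only: helper functions for a Debug build and a gdb backtrace.
# build_debug, run_with_backtrace and GDB are hypothetical names.

GDB=${GDB:-gdb}   # debugger executable; override if gdb lives elsewhere

# Configure and build with debug symbols.
#   $1: directory containing CMakeLists.txt (assumed to be the checkout's src/)
#   $2: build directory to create
# Uses cmake -S/-B, which needs CMake 3.13 or newer.
build_debug() {
    cmake -S "$1" -B "$2" -DCMAKE_BUILD_TYPE=Debug &&
    cmake --build "$2"
}

# Run a program under gdb non-interactively and print a backtrace
# once it stops (for example on SIGSEGV or SIGABRT).
#   $@: the program followed by its arguments
run_with_backtrace() {
    "$GDB" -batch -ex run -ex bt --args "$@"
}

# Example, using the input path from the report above:
#   build_debug /home/user/realsr-ncnn-vulkan/src /home/user/realsr-ncnn-vulkan/build
#   run_with_backtrace /home/user/realsr-ncnn-vulkan/build/realsr-ncnn-vulkan \
#       -i /home/user/realsr-ncnn-vulkan/images/2.png -o output.png -s 4
```

The -batch mode makes gdb exit after running its commands, so the output can be redirected to a file and pasted into the issue.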

bronzafa commented 1 year ago

I updated RHEL to version 8.8 and the problem was fixed. It looks like there was something wrong with the Vulkan installation.
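One possible way to sanity-check a Vulkan installation in a case like this is to list the devices the Vulkan loader enumerates and confirm that the NVIDIA GPU appears rather than only a software renderer such as llvmpipe. The sketch below assumes the vulkaninfo tool (from the vulkan-tools package) is installed and that its output contains "deviceName = ..." lines; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: extract device names from vulkaninfo output read on stdin.
# vulkan_device_names is a hypothetical helper, not part of any package.
vulkan_device_names() {
    # Keep only "deviceName = <name>" lines and strip everything up to the name.
    sed -n 's/^[[:space:]]*deviceName[[:space:]]*=[[:space:]]*//p'
}

# Example (--summary is assumed to be available; older vulkaninfo builds
# may need to be run without it):
#   vulkaninfo --summary | vulkan_device_names
```

If only llvmpipe is listed, the application would be running on the CPU rasterizer instead of the GPU, which would point at the driver or loader setup rather than at realsr-ncnn-vulkan itself.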