nihui / waifu2x-ncnn-vulkan

waifu2x converter ncnn version, runs fast on intel / amd / nvidia / apple-silicon GPU with vulkan
MIT License

Scale factors of x4 or more not working. #138

ghost closed 3 years ago

ghost commented 3 years ago

As in #137, when the scale option is set to x4 or more, no image is output.

svelle commented 3 years ago

Yeah same here, tested this with a friend.

On my system it fails silently (Windows 10 with RTX 3080, Ryzen 5 3600X and 32 GB RAM). On his system it core dumps (Ubuntu 20.04 with Intel i7 9700U):

> ./waifu2x-ncnn-vulkan -i ~/Desktop/first_floor.png -o ~/Desktop/first_floor_enhanced.png -n 2 -s 4
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

[0 Intel(R) UHD Graphics 620 (KBL GT2)]  queueC=0[1]  queueG=0[1]  queueT=0[1]
[0 Intel(R) UHD Graphics 620 (KBL GT2)]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 Intel(R) UHD Graphics 620 (KBL GT2)]  fp16-p/s/a=1/1/1  int8-p/s/a=1/1/1
[0 Intel(R) UHD Graphics 620 (KBL GT2)]  subgroup=32  basic=1  vote=1  ballot=1  shuffle=1
double free or corruption (out)
Aborted (core dumped)

pcroland commented 3 years ago

It's silent too on my system.

odakaui commented 3 years ago

I am having a similar problem, however, I do get an output file.

On my Debian Buster system (AMD Ryzen 5 2600, RX570), I get the following:

~/downloads/waifu2x-ncnn-vulkan-20210210-ubuntu/waifu2x-ncnn-vulkan -i xxxx.jpg -o xxxx_4x.png -n2 -s4
[0 AMD RADV POLARIS10 (LLVM 7.0.1)]  queueC=1[8]  queueG=0[1]  queueT=0[1]
[0 AMD RADV POLARIS10 (LLVM 7.0.1)]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 AMD RADV POLARIS10 (LLVM 7.0.1)]  fp16-p/s/a=1/1/0  int8-p/s/a=1/0/0
[0 AMD RADV POLARIS10 (LLVM 7.0.1)]  subgroup=64  basic=1  vote=1  ballot=1  shuffle=1
[1]    29289 segmentation fault  ~/downloads/waifu2x-ncnn-vulkan-20210210-ubuntu/waifu2x-ncnn-vulkan -i  -o

thesandwichman294 commented 3 years ago

For me 4x works fine; it's at 8x that I get a "Segmentation fault (core dumped)", although an output file is still created.

Running Manjaro (using vfio gpu passthrough) on AMD Ryzen 7 2700X RX 470

$ waifu2x-ncnn-vulkan -i screenshot2.png -o waifu2x-n3-s8.png -f png -n 3 -s 8
[0 AMD RADV POLARIS10 (ACO)]  queueC=1[4]  queueG=0[1]  queueT=0[1]
[0 AMD RADV POLARIS10 (ACO)]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 AMD RADV POLARIS10 (ACO)]  fp16-p/s/a=1/1/0  int8-p/s/a=1/1/1
[0 AMD RADV POLARIS10 (ACO)]  subgroup=64  basic=1  vote=1  ballot=1  shuffle=1
Segmentation fault (core dumped)

svelle commented 3 years ago

Interestingly enough on my AMD R7 4750U with 48GB of RAM and Fedora 33 it works without any issues up to 8x scaling. Although I should probably try a more complicated image.

I'm relatively certain that it has something to do with memory.

abertay-university commented 3 years ago

I'm getting the following error on macOS 11:

waifu2x-ncnn-vulkan(5418,0x700010c83000) malloc: *** error for object 0x7fee68d31010: pointer being freed was not allocated
waifu2x-ncnn-vulkan(5418,0x700010c83000) malloc: *** set a breakpoint in malloc_error_break to debug
zsh: abort      ./waifu2x-ncnn-vulkan -i  -o  -s 4 -n 3

xmontex commented 3 years ago

try writing the whole word scale instead of -s

koumaza commented 3 years ago

@xmontex try writing the whole word scale instead of -s

But with that method, the scale option is not actually set, so the default value will be used...
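To illustrate the point above, here is a minimal sketch of why a bare word `scale` would fall back to the default. The `parse_scale` helper and the default of 2 are assumptions made for illustration; this is not the tool's actual argument parser.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Assumed default scale for this sketch; the real tool's default may differ.
constexpr int kDefaultScale = 2;

// Hypothetical helper: returns the value given as "-s N", or the default
// when no "-s" option is present. Attached forms such as "-s4" are not
// handled here, to keep the sketch short.
int parse_scale(const std::vector<std::string>& args) {
    int scale = kDefaultScale;
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "-s") {
            scale = std::atoi(args[i + 1].c_str());
            ++i; // skip the value that was just consumed
        }
    }
    return scale;
}
```

With this sketch, `parse_scale({"-s", "4"})` yields 4, while `parse_scale({"scale", "4"})` never sees an option and returns the default, so the crash at x4 would simply not be exercised.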

utybo commented 3 years ago

Compiling with /fsanitize=address on Windows gives me this:

=================================================================
==2096==ERROR: AddressSanitizer: attempting double-free on 0x1299c4db1800 in thread T12:
    #0 0x7ff756230595 in _aligned_free D:\agent\_work\10\s\src\vctools\crt\asan\llvm\compiler-rt\lib\asan\asan_malloc_win.cc:199
    #1 0x7ff75636626a in ncnn::fastFree C:\Users\matth\random\waifu2x-ncnn-vulkan\src\ncnn\src\allocator.h:91
    #2 0x7ff75604a194 in ncnn::Mat::release(void) C:\Users\matth\random\waifu2x-ncnn-vulkan\src\ncnn\src\mat.h:1326
    #3 0x7ff756049702 in ncnn::Mat::~Mat(void) C:\Users\matth\random\waifu2x-ncnn-vulkan\src\ncnn\src\mat.h:752
    #4 0x7ff75604ab1c in Task::~Task(void) (C:\Users\matth\random\waifu2x-ncnn-vulkan\waifu2x-ncnn-vulkan.exe+0x14000ab1c)
    #5 0x7ff756363ae8 in save(void *) C:\Users\matth\random\waifu2x-ncnn-vulkan\src\main.cpp:450
    #6 0x7ff75636635c in ncnn::start_wrapper C:\Users\matth\random\waifu2x-ncnn-vulkan\build\ncnn\src\platform.h:95
    #7 0x7ff7562aaafb in thread_start<unsigned int (__cdecl*)(void *),1> minkernel\crts\ucrt\src\appcrt\startup\thread.cpp:97
    #8 0x7ff756222be7 in __asan::AsanThread::ThreadStart(unsigned __int64,struct __sanitizer::atomic_uintptr_t *) D:\agent\_work\10\s\src\vctools\crt\asan\llvm\compiler-rt\lib\asan\asan_thread.cc:262
    #9 0x7fff7a457033  (C:\Windows\System32\KERNEL32.DLL+0x180017033)
    #10 0x7fff7c1e2650  (C:\Windows\SYSTEM32\ntdll.dll+0x180052650)

0x1299c4db1800 is located 0 bytes inside of 19219204-byte region [0x1299c4db1800,0x1299c6005b04)
freed by thread T12 here:
    #0 0x7ff756230e12 in free D:\agent\_work\10\s\src\vctools\crt\asan\llvm\compiler-rt\lib\asan\asan_malloc_win.cc:109
    #1 0x7ff756363867 in save(void *) C:\Users\matth\random\waifu2x-ncnn-vulkan\src\main.cpp:400
    #2 0x7ff75636635c in ncnn::start_wrapper C:\Users\matth\random\waifu2x-ncnn-vulkan\build\ncnn\src\platform.h:95
    #3 0x7ff7562aaafb in thread_start<unsigned int (__cdecl*)(void *),1> minkernel\crts\ucrt\src\appcrt\startup\thread.cpp:97
    #4 0x7ff756222be7 in __asan::AsanThread::ThreadStart(unsigned __int64,struct __sanitizer::atomic_uintptr_t *) D:\agent\_work\10\s\src\vctools\crt\asan\llvm\compiler-rt\lib\asan\asan_thread.cc:262
    #5 0x7fff7a457033  (C:\Windows\System32\KERNEL32.DLL+0x180017033)
    #6 0x7fff7c1e2650  (C:\Windows\SYSTEM32\ntdll.dll+0x180052650)

previously allocated by thread T10 here:
    #0 0x7ff7562306be in _aligned_malloc D:\agent\_work\10\s\src\vctools\crt\asan\llvm\compiler-rt\lib\asan\asan_malloc_win.cc:186
    #1 0x7ff756366297 in ncnn::fastMalloc C:\Users\matth\random\waifu2x-ncnn-vulkan\src\ncnn\src\allocator.h:68
    #2 0x7ff756049eae in ncnn::Mat::create(int,int,unsigned __int64,int,class ncnn::Allocator *) C:\Users\matth\random\waifu2x-ncnn-vulkan\src\ncnn\src\mat.h:1242
    #3 0x7ff756048e28 in ncnn::Mat::Mat(int,int,unsigned __int64,int,class ncnn::Allocator *) C:\Users\matth\random\waifu2x-ncnn-vulkan\src\ncnn\src\mat.h:698
    #4 0x7ff756363fe2 in proc(void *) C:\Users\matth\random\waifu2x-ncnn-vulkan\src\main.cpp:358
    #5 0x7ff75636635c in ncnn::start_wrapper C:\Users\matth\random\waifu2x-ncnn-vulkan\build\ncnn\src\platform.h:95
    #6 0x7ff7562aaafb in thread_start<unsigned int (__cdecl*)(void *),1> minkernel\crts\ucrt\src\appcrt\startup\thread.cpp:97
    #7 0x7ff756222be7 in __asan::AsanThread::ThreadStart(unsigned __int64,struct __sanitizer::atomic_uintptr_t *) D:\agent\_work\10\s\src\vctools\crt\asan\llvm\compiler-rt\lib\asan\asan_thread.cc:262
    #8 0x7fff7a457033  (C:\Windows\System32\KERNEL32.DLL+0x180017033)
    #9 0x7fff7c1e2650  (C:\Windows\SYSTEM32\ntdll.dll+0x180052650)

(... Thread creation details snipped ...)

SUMMARY: AddressSanitizer: double-free D:\agent\_work\10\s\src\vctools\crt\asan\llvm\compiler-rt\lib\asan\asan_malloc_win.cc:199 in _aligned_free
==2096==ABORTING

There's a double-free at some point, within the saving routine.

I believe this portion may be the culprit. I do not know why freeing things here is necessary: ASan does not report leaks if I comment out this section, and removing it resolves the double free. Since this is C++, wouldn't all of this be freed automatically by the destructors when the objects go out of scope?
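The trace above shows memory allocated through ncnn::Mat::create, freed once by a plain free in save, and then freed again by the Mat destructor. As a rough sketch of that ownership pattern (the `RefBuffer` class and `save_without_manual_free` are hypothetical stand-ins, not ncnn's Mat and not the project's actual fix), a reference-counted buffer already releases its memory exactly once when the last owner goes out of scope:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified stand-in for a reference-counted image buffer.
class RefBuffer {
public:
    explicit RefBuffer(std::size_t bytes)
        : data_(std::malloc(bytes)), refcount_(new int(1)) {}

    // Copies share the same memory and bump the reference count.
    RefBuffer(const RefBuffer& other)
        : data_(other.data_), refcount_(other.refcount_) { ++*refcount_; }

    RefBuffer& operator=(const RefBuffer&) = delete; // omitted for brevity

    ~RefBuffer() { release(); }

    void* data() const { return data_; }

    static int frees; // how many times underlying memory was freed

private:
    void release() {
        assert(*refcount_ > 0);
        if (--*refcount_ == 0) {
            std::free(data_); // the only place the memory is freed
            delete refcount_;
            ++frees;
        }
    }

    void* data_;
    int* refcount_;
};

int RefBuffer::frees = 0;

// Simulates a save step and returns how often the buffer was freed.
int save_without_manual_free() {
    RefBuffer::frees = 0;
    {
        RefBuffer out(1024);
        RefBuffer task_copy = out; // e.g. a task struct holding the same buffer
        (void)task_copy;
        // ... write out.data() to disk ...
        // No free(out.data()) here: that would release memory which the
        // destructors below release again, i.e. a double free.
    }
    return RefBuffer::frees;
}
```

Calling `save_without_manual_free()` reports a single free, which matches the observation that removing the manual free does not introduce a leak.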

Edit: small note, I enabled ASan by just placing the following at line 230 in src/CMakeLists.txt. That's probably not the best way to do it, but hey, it works.

target_compile_options(waifu2x-ncnn-vulkan PRIVATE "/fsanitize=address")

nihui commented 3 years ago

Fixed in https://github.com/nihui/waifu2x-ncnn-vulkan/commit/f332f3c256d3a7f9b7c18a23a05a4e402ea9fd18

ghost commented 3 years ago

Thanks for the fix. I tested a scale value of x8 and confirmed that the image is now output normally.