scanner-research / scanner

Efficient video analysis at scale
https://scanner-research.github.io/
Apache License 2.0

GPU example fails. #235

Open sth1997 opened 5 years ago

sth1997 commented 5 years ago

My CUDA version is 9.0 and my cuDNN version is 3.7.5. I can successfully run the Walkthrough.ipynb code on the CPU, on a single node or on multiple nodes. But if I set device=DeviceType.GPU for db.ops.Histogram and run it, on a single node or on multiple nodes, it fails. This is the output:

5%|██████████▊ | 1/19 [00:02<00:36, 2.01s/it, jobs=1, tasks=18, workers=1]
Segmentation fault

I checked the logs. There are no WARNING logs, just one INFO log:

Log file created at: 2018/12/06 16:30:00
Running on machine: gorgon4
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1206 16:30:00.866015 107280 ingest.cpp:936] Writing database metadata
I1206 16:30:00.869004 107280 ingest.cpp:940] Writing table megafile
I1206 16:30:00.889670 107194 worker.cpp:480] Creating worker
I1206 16:30:00.889878 107194 worker.cpp:497] Create master stub
I1206 16:30:00.889976 107194 worker.cpp:500] Finish master stub
I1206 16:30:00.890017 107194 worker.cpp:507] Worker created.
I1206 16:30:00.890188 107194 worker.cpp:666] Worker try to register with master
I1206 16:30:00.891312 107194 worker.cpp:693] Worker registered with master with id 0
I1206 16:30:00.902165 107327 worker.cpp:548] Worker 0 received NewJob
I1206 16:30:00.902737 107326 worker.cpp:722] Worker 0 loading Op library: /home/sth/.local/lib/python3.6/site-packages/scannerpy/lib/libscanner_stdlib.so
I1206 16:30:00.905745 107326 worker.cpp:1254] Initial pipeline instances per node: -1
I1206 16:30:00.905762 107326 worker.cpp:1280] Kernel Group 0 Pipeline instances per node: 1
I1206 16:30:00.905768 107326 worker.cpp:1294] Pipeline instances per node: 1

After that, I also tried using the GPU in examples/apps/quickstart/main.py. I set device=DeviceType.GPU for db.ops.Resize, and it also failed. This is the output:

0%| | 0/7 [00:02<?, ?it/s, jobs=1, tasks=7, workers=1]
Segmentation fault

I also checked the log. The INFO log is a little different from the previous one:

Log file created at: 2018/12/06 16:35:14
Running on machine: gorgon4
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1206 16:35:14.334425 107522 ingest.cpp:936] Writing database metadata

Have you ever encountered this problem? Let me know if you need more information. @apoms @willcrichton

willcrichton commented 5 years ago

Can you run it in gdb and give us the stack trace?

sth1997 commented 5 years ago

Sorry, I only know how to run a pure C/C++ project in gdb (and I know how to set CMAKE_BUILD_TYPE when compiling), but I have never used gdb to debug C/C++ functions called from Python. Could you please tell me how to run Scanner (e.g. main.py) in gdb? Thanks!

willcrichton commented 5 years ago

@sth1997 sorry for the late reply. You can run gdb on Python like this:

$ gdb python3
(gdb) r main.py
...
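
The same session can probably be captured non-interactively, which makes it easier to paste a full trace into an issue. The sketch below is not part of Scanner: `build_gdb_command` is a hypothetical helper that only assembles a gdb command line from the standard `-batch`, `-ex`, and `--args` options, assuming `gdb` and `python3` are on the PATH and that the script is called main.py as above.

```python
import shlex


def build_gdb_command(script, python="python3"):
    """Return an argv list that runs `script` under gdb in batch mode.

    Hypothetical helper for illustration; it only builds the command and
    does not start gdb itself.
    """
    return [
        "gdb",
        "-batch",                 # exit after the -ex commands have run
        "-ex", "run",             # start the program right away
        "-ex", "backtrace full",  # once it stops on SIGSEGV, dump the stack with locals
        "--args", python, script, # everything after --args is the program and its arguments
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command for the quickstart script.
    cmd = build_gdb_command("main.py")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Running the printed command in the directory containing main.py should stop at the segfault, print the backtrace of the crashing thread, and exit. Passing the list to `subprocess.run` would do the same from Python.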

sth1997 commented 5 years ago

Thread 149 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffecdb4f700 (LWP 20718)]
0x00007fff8d2dd666 in ?? () from /usr/lib/nvidia-387/libnvcuvid.so.1
(gdb) backtrace full
#0  0x00007fff8d2dd666 in ?? () from /usr/lib/nvidia-387/libnvcuvid.so.1
No symbol table info available.
#1  0x00007fff8d2deac4 in ?? () from /usr/lib/nvidia-387/libnvcuvid.so.1
No symbol table info available.
#2  0x00007fff8d2772f1 in ?? () from /usr/lib/nvidia-387/libnvcuvid.so.1
No symbol table info available.
#3  0x00007fff8ed09a97 in scanner::internal::NVIDIAVideoDecoder::feed (this=0x7ffe98001c30, 
    encoded_buffer=0x7ffed476443a <error: Cannot access memory at address 0x7ffed476443a>, encoded_size=3926, 
    discontinuity=<optimized out>) at /home/sth/video/scanner/scanner/scanner/video/nvidia/nvidia_video_decoder.cpp:225
        cupkt = {flags = 0, payload_size = 3926, payload = 0x7ffed476443a <error: Cannot access memory at address 0x7ffed476443a>, 
          timestamp = 0}
        dummy = 0x555555e3ef10
#4  0x00007fff8ed06988 in scanner::internal::DecoderAutomata::feeder (this=<optimized out>)
    at /home/sth/video/scanner/scanner/scanner/video/decoder_automata.cpp:310
        encoded_buffer_size = <optimized out>
        encoded_packet_size = 3926
        encoded_buffer = <optimized out>
        encoded_packet = <optimized out>
        seen_metadata = <optimized out>
        frames_fed = <optimized out>
#5  0x00007fffea104c5c in std::execute_native_thread_routine_compat (__p=<optimized out>)
    at /opt/conda/conda-bld/compilers_linux-64_1520532893746/work/.build/src/gcc-7.2.0/libstdc++-v3/src/c++11/thread.cc:110
        __t = <optimized out>
        __local = {<std::__shared_ptr<std::thread::_Impl_base, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<std::thread::_Impl_base, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7ffe98010bd0, _M_refcount = {
              _M_pi = 0x7ffe98010bc0}}, <No data fields>}
#6  0x00007ffff7bc16ba in start_thread (arg=0x7ffecdb4f700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7ffecdb4f700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140732349609728, 4193268998478770737, 0, 140732505506479, 140732349610432, 0, 
                -4193730285998951887, -4193250823535971791}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#7  0x00007ffff78f741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.