moskewcz / boda

Boda: A C++ Framework for Efficient Experiments in Computer Vision
Other

Segfault when running on android device #17

chertio closed this issue 7 years ago

chertio commented 7 years ago

Hi,

I am trying to learn how to use boda to run CNNs on Android devices. I cross-compiled boda with the following enabled: base, caffe_pb, opencl and turbojpeg. After the compilation finishes and I push boda to the device, I can run boda help, which shows all the usage information, but I get a segmentation fault when I try to execute ./boda rtc_test --rtc='(be=ocl)' --prog-fn='./ocl_test_dot.cl' (the .cl file is also pushed to the same directory).

What have I done wrong? Is this a test I can run with the compilation setting I have described?

Thank you very much.

moskewcz commented 7 years ago

hmm. well, i can't think why it should seg fault, if that's what you're asking. sounds like a bug i guess, but it could certainly be build related.

some ideas off the top of my head:

-- what device is it: android ver, SoC/GPU, opencl lib version(s)?
-- what general dev approach are you using: ver. of NDK, compiler, etc.?
-- can you run any other opencl programs on the device, particularly clinfo? (i forked-and-hacked a version here: https://github.com/moskewcz/clinfo/tree/ndk-build-hack but you might want to use head and your own build hack or the stock make system).
-- can you use gdb/gdbserver to get a stack trace for the failure? (i think there are some notes on doing this with boda/android somewhere buried in my notes in the repo)

mwm

chertio commented 7 years ago

Thanks for the quick reply.

I am using a Snapdragon 820 (Open Q 820 board) with Android 6.0.1, API level 23; the Adreno 530 GPU supports OpenCL 2.0.

To compile boda I used the g++ from NDK r10e.

I can run other OpenCL programs, it seems...

To get a stack trace, do I have to recompile with stacktrace_gnu enabled? When I do, I seem to get the error "execinfo.h: No such file or directory".

moskewcz commented 7 years ago

ok, thanks for the info on the platform. that's the main one i've used, modulo any details of specific build/update level i guess.

I can run other OpenCL programs, it seems...

hmm, i'm not so much interested in whether you can run them, but whether you can compile and run them, and if so, exactly what you can run and how you compiled it. but first, getting a stack trace would be the higher priority, i guess.

To get a stack trace, do I have to recompile with stacktrace_gnu enabled?

no, as i said, just use gdb/gdbserver, as one normally would to debug any program. basically, i figure it's silly to speculate too much without knowing where the fault is: dll loading, command line parsing, etc ...

the stacktrace_ features are for internal stack trace generation, which wouldn't help for a seg fault anyway.

also, at some point (maybe not right away if it's a bother), can you cut-n-paste the exact command or sequence of commands you ran and/or your full build log? i'm somewhat flying blind here without more info ...

for reference, here's what i get when i run a similar command from my setup_notes.txt file:

moskewcz@maaya:~/git_work/boda/run/tr3$ adb shell "cd /data/local/tmp; LD_LIBRARY_PATH=/data/local/lib /data/local/bin/boda rtc_test --rtc='(be=ocl)' --prog-fn='%(boda_test_dir)/ocl_test_dot.cl' && cat rtc_test.txt"
TIMERS:  CNT     TOT_DUR      AVG_DUR    TAG  
           1    278.995ms    278.995ms    ocl_compile
All is Well.

moskewcz commented 7 years ago

a thought: at some point, i ran into seg faults in IO code when i was trying to use crystax NDK (to get precompiled boost). see this bug:

https://tracker.crystax.net/issues/1108 (it's marked as 'closed' ... but it's not really fixed as per the last comment)

now, you say you're using NDK, not crystax, so that shouldn't be the exact issue, but i thought i'd mention it. on a related note, did you compile all dependencies as per my notes, or did you do something else? if you're using any non-ndk, non-platform libs, that could certainly be a problem? just guessing.

chertio commented 7 years ago

I built Boost using the newest NDK (version 14). The stack trace is attached:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000007fb7eb0830 in __cxxabiv1::__dynamic_cast (src_ptr=0x7fb760f000,
    src_type=0x5555b174a8 <typeinfo for boda::nesi>,
    dst_type=0x5555b21920 <typeinfo for boda::has_main_t>, src2dst=-1)
    at /usr/local/google/buildbot/src/android/ndk-r14-release/toolchain/gcc/gcc-4.9/libstdc++-v3/libsupc++/dyncast.cc:60
#2  0x0000005555883e74 in std::dynamic_pointer_cast<boda::has_main_t, boda::nesi> (__r=...)
    at /home/chertio/research/boda-rtc/dependent_libraries/ndk_tool_chain10/include/c++/4.9/bits/shared_ptr.h:455
#3  boda::set_p_has_main_t_from_p_nesi (v=0x7ffffff430, dv=0x7ffffff580)
    at ./gen/has_main.H.nesi_gen.cc:27
#4  0x0000005555888c74 in boda::nesi_struct_make_p (nia=0x7ffffff5e0,
    tinfo=0x55558a6084 <boda::lexp_name_val_map_t::insert_leaf(char const*, char const*, bool)+500>, o=0x7ffffff580)
    at ../src/nesi.cc:438
#5  0x0000005555884aa4 in boda::p_init (nia=0x7ffffff5e0,
    tinfo=<optimized out>, o=<optimized out>) at ../src/nesi.cc:140
#6  0x0000005555886dc8 in boda::nesi_init_and_check_unused_from_nia (
    nia=nia@entry=0x7ffffff5e0,
    ti=ti@entry=0x5555b34450 <boda::tinfo_p_has_main_t>,
    o=o@entry=0x7ffffff580) at ../src/nesi.cc:27
#7  0x00000055558842fc in boda::create_and_run_has_main_t (lexp=...)
    at ../src/has_main.cc:41

moskewcz commented 7 years ago

Hmm. Are you saying you used a different ndk to build boda vs. boost? I'm not sure that can work, since I think you'd end up with conflicting libstdc++ versions when linking/running boda. It's also possible it's some real bug exposed by using a newer ndk than I am using. Ideally I should try using ndk 14 myself ... But I'm a bit busy trying to graduate at just this moment!

chertio commented 7 years ago

Yeah, I think I'm just going to spend a bit of time rebuilding Boost with the older NDK. Hopefully it all works. Thanks.

moskewcz commented 7 years ago

If there are issues building boda under ndk 14 (i.e., you tried and failed), I'm certainly interested in hearing about it (in another issue) if it's not too much trouble.

chengshaoyi commented 7 years ago

So I tried building everything using NDK 10e and the same error occurs. The trace is:

#0  0x0000000000000000 in ?? ()
#1  0x0000007fb7eb0cb8 in __cxxabiv1::__dynamic_cast (src_ptr=0x7fb718f000,
    src_type=0x5555ad6b98 <typeinfo for boda::nesi>, dst_type=0x5555ae1010 <typeinfo for boda::has_main_t>, src2dst=-1)
    at /s/ndk-toolchain/src/gcc/gcc-4.9/libstdc++-v3/libsupc++/dyncast.cc:60
#2  0x000000555586f134 in std::dynamic_pointer_cast<boda::has_main_t, boda::nesi> (__r=...)
    at /home/chertio/research/boda_ndk10/ndk_tool_chain10/include/c++/4.9/bits/shared_ptr.h:455
#3  boda::set_p_has_main_t_from_p_nesi (v=0x7ffffff440, dv=0x7ffffff590) at ./gen/has_main.H.nesi_gen.cc:27
#4  0x0000005555873f34 in boda::nesi_struct_make_p (nia=0x7ffffff5f0,
    tinfo=0x5555891344 <boda::lexp_name_val_map_t::insert_leaf(char const*, char const*, bool)+500>, o=0x7ffffff590)
    at ../src/nesi.cc:438
#5  0x000000555586fd64 in boda::p_init (nia=0x7ffffff5f0, tinfo=<optimized out>, o=<optimized out>)
    at ../src/nesi.cc:140
#6  0x0000005555872088 in boda::nesi_init_and_check_unused_from_nia (nia=nia@entry=0x7ffffff5f0,
    ti=ti@entry=0x5555af2450 <boda::tinfo_p_has_main_t>, o=o@entry=0x7ffffff590) at ../src/nesi.cc:27
#7  0x000000555586f5bc in boda::create_and_run_has_main_t (lexp=...) at ../src/has_main.cc:41
#8  0x0000005555910140 in boda::boda_main_arg_proc (os=..., argc=<optimized out>, argv=<optimized out>)
    at ../src/boda.cc:95
#9  0x0000005555910c78 in boda::boda_main (argv=0x7ffffff878, argc=4) at ../src/boda.cc:106
#10 boda::boda_main_wrap (argc=4, argv=0x7ffffff878) at ../src/boda.cc:114
#11 0x00000055556bc914 in main (argc=<optimized out>, argv=<optimized out>) at ../src/boda.cc:123

Is dynamic_cast the problem here? Have you seen anything similar before?

Another thing: I added "-fuse-ld=gold" to LDFLAGS when compiling boda, because otherwise I get a whole bunch of undefined reference errors from libOpenCL.so (e.g. 'cb_get_sampler_info' undefined).

moskewcz commented 7 years ago

as i mentioned earlier, i think i need to see the full build log, at least for the boda compilation and link, to say much more; maybe with that i could replicate your issue.

it seems like a pretty simple case of dynamic_cast<> failing, so maybe somehow RTTI is disabled or the like? perhaps a simple example would also fail, which would help narrow things down. for example, you say that the printout of the list of CLIs works, so boda is at least somewhat running; you could hack up main to create some derived object, cast it to a base type, and try dyn-casting it back to derived to see if that works. if not, that points at a build issue. if so, it might still be a build issue, but one that requires multiple translation units for there to be a problem, as in the real failing case.

but of course, it's 100% possible it's some bug of mine -- at a high level there's plenty of tricky/suspect low-level code around that failing cast. it hasn't been historically buggy, but that's almost surprising. also, i think what i'm doing is legit(-enough) ... but that's always a fun question with C++ when it comes to the interactions of multiple inheritance, shared_ptr, dynamic_cast, and so on.

chertio commented 7 years ago

what I see when I do make for boda:

[editor's note: cut inlined log and moved to attached file - mwm] log.txt

Let me know if you see anything wrong. I'll try your suggestions, thanks.

moskewcz commented 7 years ago

hmm. if i use the gold linker, i also get a seg fault (dunno if it's the same one).

here, i:

i'm not sure why it would fail with the gold linker; might be my bug, might be something else ... if i get a chance, i'll gdb it to verify it's the same fault.

log attached: log.txt

moskewcz commented 7 years ago

i remembered how to use gdb. need to document the sysroot magic. anyway, yep, same fault:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000007fb7eb0cb8 in __cxxabiv1::__dynamic_cast (src_ptr=0x7fb760f000, src_type=0x5555b2aa78,
    dst_type=0x5555b34ef0 <typeinfo for boda::has_main_t>, src2dst=-1) at /s/ndk-toolchain/src/gcc/gcc-4.9/libstdc++-v3/libsupc++/dyncast.cc:60
#2  0x0000005555888944 in boda::set_p_has_main_t_from_p_nesi(void*, void*) ()
#3  0x000000555588d744 in boda::nesi_struct_make_p(boda::lexp_name_val_map_t*, boda::tinfo_t const*, void*) ()
#4  0x0000005555889574 in boda::p_init(boda::lexp_name_val_map_t*, boda::tinfo_t const*, void*) ()
#5  0x000000555588b898 in boda::nesi_init_and_check_unused_from_nia(boda::lexp_name_val_map_t*, boda::tinfo_t const*, void*) ()
#6  0x0000005555888dcc in boda::create_and_run_has_main_t(std::shared_ptr) ()
#7  0x0000005555929950 in boda::boda_main_arg_proc(std::ostream&, int, char**) ()
#8  0x000000555592a488 in boda::boda_main_wrap(int, char**) ()
#9  0x00000055556d6154 in main ()
(gdb)

chertio commented 7 years ago

Since we've got pretty much the same platform, do you mind if I just grab the working binary from you?

moskewcz commented 7 years ago

i guess not, although i dunno how useful that will be beyond checking that my binary does work on your target. if you build/link without gold, that doesn't fix things?

boda.gz

chertio commented 7 years ago

Thank you! I ran into some link issues when linking without gold, but after spending some time fixing those, it actually seems to work fine now...