jll63 / yomm2

Fast, orthogonal, open multi-methods. Solve the Expression Problem in C++17.
Boost Software License 1.0

error in runtime destructor #6

Closed shawncao closed 4 years ago

shawncao commented 4 years ago

Hi @jll63,

Thanks again for the lib; I'm still playing with it. I'm seeing this error in the runtime destructor. Do you know what may cause it? (It does not reproduce in a debug build on Mac, and I haven't debugged it on the Linux server yet. I assume something goes wrong with the runtime struct in certain cases.)

Error in `./NodeServer': free(): corrupted unsorted chunks: 0x000000000155b170
Aborted at 1563204608 (unix time) try "date -d @1563204608" if you are using GNU date
PC: @ 0x0 (unknown)
SIGABRT (@0x3bb20000227b) received by PID 8827 (TID 0x7f8a872cf080) from PID 8827; stack trace:
@ 0x7f8a86cb6330 (unknown)
@ 0x7f8a860dfc37 gsignal
@ 0x7f8a860e3028 abort
@ 0x7f8a8611c2a4 (unknown)
@ 0x7f8a8612882e (unknown)
@ 0x698710 __gnu_cxx::new_allocator<>::deallocate()
@ 0x695988 std::allocator_traits<>::deallocate()
@ 0x690eee std::_Vector_base<>::_M_deallocate()
@ 0x68b38f std::_Vector_base<>::~_Vector_base()
@ 0x687785 std::vector<>::~vector()
@ 0x687120 yorel::yomm2::detail::runtime::~runtime()
@ 0x681898 yorel::yomm2::detail::update_methods()
@ 0x6862f5 yorel::yomm2::update_methods()
@ 0x41ebdd RunServer()
@ 0x41abda main

Call stack in GDB:

(gdb) bt
#0 0x00007ffff6df1c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff6df5028 in __GI_abort () at abort.c:89
#2 0x00007ffff6e2e2a4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff6f40350 " Error in `%s': %s: 0x%s \n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff6e3a82e in malloc_printerr (ptr=, str=0x7ffff6f404a0 "free(): corrupted unsorted chunks", action=1) at malloc.c:4998
#4 _int_free (av=, p=, have_lock=0) at malloc.c:3842
#5 0x00000000006b269a in __gnu_cxx::new_allocator::deallocate(yorel::yomm2::detail::rt_method*, unsigned long) ()
#6 0x00000000006b011c in std::allocator_traits<std::allocator >::deallocate(std::allocator&, yorel::yomm2::detail::rt_method*, unsigned long) ()
#7 0x00000000006abcec in std::_Vector_base<yorel::yomm2::detail::rt_method, std::allocator >::_M_deallocate(yorel::yomm2::detail::rt_method*, unsigned long) ()
#8 0x00000000006a6703 in std::_Vector_base<yorel::yomm2::detail::rt_method, std::allocator >::~_Vector_base() ()
#9 0x00000000006a2f8b in std::vector<yorel::yomm2::detail::rt_method, std::allocator >::~vector() ()
#10 0x00000000006a29c6 in yorel::yomm2::detail::runtime::~runtime() ()
#11 0x000000000069d306 in yorel::yomm2::detail::update_methods(yorel::yomm2::detail::registry const&, yorel::yomm2::detail::dispatch_data&) ()
#12 0x00000000006a1d63 in yorel::yomm2::update_methods() ()
#13 0x0000000000413622 in main (argc=1, argv=0x7fffffffe808) at /home/shawncao/nebula/src/service/node/NodeServer.cpp:178

jll63 commented 4 years ago

Thanks.

Can you run your program with env variable YOMM2_ENABLE_TRACE=1 and post the output please?

Also it would help if I had a skeleton of all the classes and methods. I don't need all the code, just the class declarations (class body can be left empty) and the corresponding calls to register_class, declare_method and define_method (again method body is not needed).

shawncao commented 4 years ago

Thanks @jll63

This is the header file that defines all the register_class and define_method calls: https://github.com/shawncao/nebula/blob/master/src/execution/serde/RowCursorSerde.h

This is where update_methods gets called, basically the main entry: https://github.com/shawncao/nebula/blob/master/src/service/node/NodeServer.cpp#L159

An update on what I have found: if I move the update_methods call into the NodeServerImpl constructor, it works fine without crashing. Is it a namespace issue (since register_class/define_method are called inside namespace nebula::execution::serde)?

NodeServerImpl() {
  // We're using AOP lib yomm2 to inject batch serialization
  // Since we don't use dynamic library loading, we call this once at starting point.
  // TODO(cao) - crashes node server, need to figure out root cause before executing query
  yorel::yomm2::update_methods();
}

shawncao commented 4 years ago

Yeah, it seems that if I make this call inside our namespace rather than in main(), it works fine. You may already know why...

Small update to the second link above:

....
void updateOpenMethods() {
  yorel::yomm2::update_methods();
}

} // namespace service
} // namespace nebula

void RunServer() {
  // update_methods needs to be called inside our namespace, otherwise it will crash.
  nebula::service::updateOpenMethods();
  ...

jll63 commented 4 years ago

Well... this is very weird. The namespace should not matter in the least; in fact, update_methods is supposed to be called from main.

Can you try to put update_methods back where it was, but this time call it like this: ::yorel::yomm2::update_methods(); (note the :: at the beginning).
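For readers wondering what the leading `::` changes: it forces name lookup to start at the global namespace, which rules out a nested namespace shadowing `yorel`. The sketch below uses hypothetical stand-in namespaces and functions (not the real yomm2 library, and nothing in this thread shows that nebula has such a shadowing namespace); it only illustrates the lookup rule the suggestion is testing.

```cpp
#include <string>

// Hypothetical stand-in for the library's namespace (not real yomm2 code).
namespace yorel { namespace yomm2 {
inline std::string update_methods() { return "global yorel::yomm2"; }
} }

namespace app {

// Hypothetical nested namespace that happens to reuse the name `yorel`.
namespace yorel { namespace yomm2 {
inline std::string update_methods() { return "app::yorel::yomm2"; }
} }

// Lookup of `yorel` starts in the enclosing namespace `app`,
// so this call resolves to app::yorel::yomm2::update_methods.
inline std::string call_relative() { return yorel::yomm2::update_methods(); }

// The leading :: makes lookup start at the global namespace instead.
inline std::string call_absolute() { return ::yorel::yomm2::update_methods(); }

} // namespace app
```

If both spellings behave the same in the real program, shadowing can be excluded as a cause.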

Does your code have a lot of dependencies? I may try to build it tonight.

Also, please do try YOMM2_ENABLE_TRACE=1; the output is also fun to look at ;-)

shawncao commented 4 years ago

Sorry, this is the trace when it fails. I'm not sure the "namespace move" above really fixes the issue:

Register nebula::surface::RowCursor with &typeid 0xa00fd0
Register nebula::execution::core::BlockExecutor with &typeid 0xa01008
Register nebula::execution::core::SamplesExecutor with &typeid 0xa01050
Register nebula::memory::keyed::FlatRowCursor with &typeid 0xa010a8
Register nebula::surface::CompositeRowCursor with &typeid 0xa01138
Register nebula::surface::MockRowCursor with &typeid 0xa010e8
Register method asBuffer(yorel::yomm2::virtual<nebula::surface::RowCursor&>, nebula::type::Schema)
asBuffer(yorel::yomm2::virtual<nebula::surface::RowCursor&>, nebula::type::Schema): add spec (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
asBuffer(yorel::yomm2::virtual<nebula::surface::RowCursor&>, nebula::type::Schema): add spec (nebula::execution::core::BlockExecutor & b, nebula::type::Schema)
asBuffer(yorel::yomm2::virtual<nebula::surface::RowCursor&>, nebula::type::Schema): add spec (nebula::memory::keyed::FlatRowCursor & f, nebula::type::Schema)
Layering...
  nebula::surface::RowCursor nebula::execution::core::BlockExecutor nebula::execution::core::SamplesExecutor nebula::memory::keyed::FlatRowCursor nebula::surface::CompositeRowCursor nebula::surface::MockRowCursor
Allocating slots...
  nebula::surface::RowCursor...
    for asBuffer(yorel::yomm2::virtual<nebula::surface::RowCursor&>, nebula::type::Schema)#0: 0
      also in nebula::execution::core::BlockExecutor nebula::execution::core::SamplesExecutor nebula::memory::keyed::FlatRowCursor nebula::surface::CompositeRowCursor nebula::surface::MockRowCursor
Building dispatch table for asBuffer(yorel::yomm2::virtual<nebula::surface::RowCursor&>, nebula::type::Schema)
  make groups for param #0, class nebula::surface::RowCursor
    specs applicable to nebula::surface::MockRowCursor
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      bit mask = 001
    specs applicable to nebula::surface::CompositeRowCursor
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      bit mask = 001
    specs applicable to nebula::memory::keyed::FlatRowCursor
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      (nebula::memory::keyed::FlatRowCursor & f, nebula::type::Schema)
      bit mask = 101
    specs applicable to nebula::execution::core::SamplesExecutor
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      bit mask = 001
    specs applicable to nebula::surface::RowCursor
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      bit mask = 001
    specs applicable to nebula::execution::core::BlockExecutor
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      (nebula::execution::core::BlockExecutor & b, nebula::type::Schema)
      bit mask = 011
  groups for dim 0:
    group 0/0 mask 001
      nebula::surface::MockRowCursor
      nebula::surface::CompositeRowCursor
      nebula::execution::core::SamplesExecutor
      nebula::surface::RowCursor
    group 0/1 mask 011
      nebula::execution::core::BlockExecutor
    group 0/2 mask 101
      nebula::memory::keyed::FlatRowCursor
  assign specs
    group 0/0 mask 001
      nebula::surface::MockRowCursor
      nebula::surface::CompositeRowCursor
      nebula::execution::core::SamplesExecutor
      nebula::surface::RowCursor
    select best of:
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
    (nebula::surface::RowCursor & cursor, nebula::type::Schema schema): pf = 0x41cb70
    group 0/1 mask 011
      nebula::execution::core::BlockExecutor
    select best of:
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      (nebula::execution::core::BlockExecutor & b, nebula::type::Schema)
    (nebula::execution::core::BlockExecutor & b, nebula::type::Schema): pf = 0x41cb40
    group 0/2 mask 101
      nebula::memory::keyed::FlatRowCursor
    select best of:
      (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      (nebula::memory::keyed::FlatRowCursor & f, nebula::type::Schema)
    (nebula::memory::keyed::FlatRowCursor & f, nebula::type::Schema): pf = 0x41ec50
  assign next
    (nebula::surface::RowCursor & cursor, nebula::type::Schema schema):
      select best of:
      -> none
    (nebula::execution::core::BlockExecutor & b, nebula::type::Schema):
      select best of:
        (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      -> (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
    (nebula::memory::keyed::FlatRowCursor & f, nebula::type::Schema):
      select best of:
        (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
      -> (nebula::surface::RowCursor & cursor, nebula::type::Schema schema)
Finding hash factor for 6 ti*
  trying with M = 3, 8 buckets
  found 1523255767835814935 after 5 attempts and 0.02152 msecs
Initializing global vector at 0xfd97f0
  0 pointer to control table
  1 hash table
  9 control table
  17 asBuffer(yorel::yomm2::virtual<nebula::surface::RowCursor&>, nebula::type::Schema)
  17 mtbl for nebula::surface::RowCursor: 0xfd9878
  18 mtbl for nebula::execution::core::BlockExecutor: 0xfd9880
  19 mtbl for nebula::execution::core::SamplesExecutor: 0xfd9888
  20 mtbl for nebula::memory::keyed::FlatRowCursor: 0xfd9890
  21 mtbl for nebula::surface::CompositeRowCursor: 0xfd9898
  22 mtbl for nebula::surface::MockRowCursor: 0xfd98a0
  23 end
Optimizing
  asBuffer(yorel::yomm2::virtual<nebula::surface::RowCursor&>, nebula::type::Schema)
    nebula::surface::MockRowCursor.mtbl[0] = 0x41cb70 (function)
    nebula::surface::CompositeRowCursor.mtbl[0] = 0x41cb70 (function)
    nebula::memory::keyed::FlatRowCursor.mtbl[0] = 0x41ec50 (function)
    nebula::execution::core::SamplesExecutor.mtbl[0] = 0x41cb70 (function)
    nebula::surface::RowCursor.mtbl[0] = 0x41cb70 (function)
    nebula::execution::core::BlockExecutor.mtbl[0] = 0x41cb40 (function)
Finished
Aborted at 1563212180 (unix time) try "date -d @1563212180" if you are using GNU date
PC: @ 0x0 (unknown)
SIGABRT (@0x3bb200003e5f) received by PID 15967 (TID 0x7f1f77645080) from PID 15967; stack trace:
@ 0x7f1f7702c330 (unknown)
@ 0x7f1f76455c37 gsignal
@ 0x7f1f76459028 abort
@ 0x7f1f764922a4 (unknown)
@ 0x7f1f7649e82e (unknown)
@ 0x699cae (unknown)
@ 0x696f26 (unknown)
@ 0x69248c (unknown)
@ 0x68c92d (unknown)
@ 0x688d23 (unknown)
@ 0x6886be (unknown)
@ 0x682e78 (unknown)
@ 0x6878d5 (unknown)
@ 0x41f2fd (unknown)
@ 0x41a94a (unknown)
@ 0x7f1f76440f45 __libc_start_main
@ 0x41c0b8 (unknown)
@ 0x0 (unknown)
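One way to read the `bit mask` lines in the trace above: each definition (spec) of asBuffer appears to get one bit, in registration order, and a class's mask has a bit set for every spec whose parameter type is the class itself or one of its bases. The sketch below reproduces the three masks from the trace under that reading. It is an assumption drawn from the trace output, not yomm2's actual code; the names `kBase`, `kSpecs`, `derivesFrom` and `specMask` are hypothetical, and the hierarchy is assumed to be flat (every class deriving directly from RowCursor).

```cpp
#include <bitset>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed hierarchy: class name -> direct base ("" marks the root).
const std::map<std::string, std::string> kBase = {
    {"RowCursor", ""},
    {"BlockExecutor", "RowCursor"},
    {"SamplesExecutor", "RowCursor"},
    {"FlatRowCursor", "RowCursor"},
    {"CompositeRowCursor", "RowCursor"},
    {"MockRowCursor", "RowCursor"},
};

// Parameter class of each spec, in the order the trace adds them; index i maps to bit i.
const std::vector<std::string> kSpecs = {"RowCursor", "BlockExecutor", "FlatRowCursor"};

// True if `cls` is `ancestor` or derives from it (walks up the base chain).
bool derivesFrom(std::string cls, const std::string& ancestor) {
    while (!cls.empty()) {
        if (cls == ancestor) return true;
        cls = kBase.at(cls);
    }
    return false;
}

// Builds the mask of specs applicable to `cls`, printed with bit 0 rightmost.
std::string specMask(const std::string& cls) {
    std::bitset<3> mask;
    for (std::size_t i = 0; i < kSpecs.size(); ++i) {
        if (derivesFrom(cls, kSpecs[i])) mask.set(i);
    }
    return mask.to_string();
}
```

Under these assumptions, classes with equal masks fall into the same group, which matches the three groups (001, 011, 101) listed under "groups for dim 0".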

jll63 commented 4 years ago

I have to install some deps to navigate your code but I could not find RowCursor...

jll63 commented 4 years ago
$ cmake ..
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   program_options
--   regex
--   system
--   filesystem
--   context
--   thread
--   chrono
--   date_time
--   atomic
-- NEBULA_ROOT : /home/jleroy/dev/nebula
-- NEBULA_SRC : /home/jleroy/dev/nebula/src
Nebula Server: /home/jleroy/dev/nebula/build/NebulaServer
CMake Error at src/service/Service.cmake:76 (file):
  file problem touching file:
  /home/jleroy/dev/nebula/src/service/gen/nebula/node/node.grpc.fb.cc
Call Stack (most recent call first):
  CMakeLists.txt:149 (include)

-- Configuring incomplete, errors occurred!

Also:

$ ls -l /home/jleroy/dev/nebula/src/service/gen/nebula/node
ls: cannot access '/home/jleroy/dev/nebula/src/service/gen/nebula/node': No such file or directory

Some parts missing?

jll63 commented 4 years ago

OK I created a skeleton copy of your program, which gives me the exact same yomm2 trace. It doesn't crash. Everything looks as it should be. At this point I tend to think that it is not a yomm2 problem. I suggest that you try to return just after calling update_methods to see what happens. And maybe trace in the debugger.

I have the impression no threads (except main) exist yet when you call update_methods, correct?

shawncao commented 4 years ago

> OK I created a skeleton copy of your program, which gave me the exact same yomm2 trace. It doesn't crash. Everything looks as it should be. At this point I tend to think that it is not a yomm2 problem. I suggest that you try to return just after calling update_methods. And maybe trace in the debugger.
>
> I have the impression no threads (except main) exist yet when you call update_methods, correct?

Yes, no threads. It fails inside update_methods; I added the trace, and the first post has the call stack from the failure. Maybe it is related to compiler optimization? The same code, built twice, sometimes passes and sometimes fails, so my comments above on how to make it work are not true. I'll debug it and post back my findings.

jll63 commented 4 years ago

Any progress on this?

shawncao commented 4 years ago

> Any progress on this?

No, I haven't figured out why, but I updated the first post with the GDB crash stack. For now I'm using dynamic_cast instead to unblock myself; I would love to debug more if I get time.
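For context, a dynamic_cast workaround of the kind mentioned above could look roughly like the sketch below. The class names come from the trace earlier in the thread, but the class bodies, the flat hierarchy, and the string return value are simplified assumptions for illustration, not the actual nebula code.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the nebula classes named in the trace.
struct RowCursor { virtual ~RowCursor() = default; };
struct BlockExecutor : RowCursor {};
struct FlatRowCursor : RowCursor {};
struct MockRowCursor : RowCursor {};

// Chooses a serializer by probing the dynamic type, specific types first,
// and falls back to the generic RowCursor path.
// Returns a label instead of a real buffer to keep the sketch small.
std::string asBuffer(RowCursor& cursor) {
    if (dynamic_cast<BlockExecutor*>(&cursor)) return "block";
    if (dynamic_cast<FlatRowCursor*>(&cursor)) return "flat";
    return "generic";
}
```

The trade-off compared with open methods is that every new specialization has to be added to this one function by hand, and the order of the checks matters once the hierarchy gets deeper.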

jll63 commented 4 years ago

Got it. There was a bug in the last commit. Fixed now. Can you pull and check on your side?

shawncao commented 4 years ago

> Got it. There was a bug in the last commit. Fixed now. Can you pull and check on your side?

Hmm, interesting, it looks like it's working now with the latest pull. I'll keep it running for a few days and report back on whether this really fixes it.

jll63 commented 4 years ago

Hi, any more problems?

shawncao commented 4 years ago

The latest fix seems to be a root-cause fix; I haven't seen any other issues in the past week since pulling it. Thanks for the fix, @jll63!