servalproject / serval-dna

The Serval Project's core daemon that implements Distributed Numbering Architecture (DNA), MDP, VoMP, Rhizome, MeshMS, etc.
http://servalproject.org
Other

MeshMS fatal crash #108

Closed adur1990 closed 6 years ago

adur1990 commented 8 years ago

An unexpected return value from rhizome_add_manifest_to_store() triggers a SIGABRT with commit 7a9b6d5d722713bdaf8f2a67cea63222d728d87e of the asserts branch.

Traceback:

node3.log.zip

FATAL:[ 1338] 11:18:54.696 rhizome_fetch.c:486:rhizome_import_received_bundle()  rhizome_add_manifest_to_store() returned 9
FATAL:[ 1338] 11:18:54.696 main.c:60:crash_handler()  Caught signal SIGABRT (6) Aborted
FATAL:[ 1338] 11:18:54.696 main.c:61:crash_handler()  The following clue may help: no clue
FATAL:[ 1338] 11:18:54.696 performance_timing.c:227:dump_stack()  rhizome_write_complete
FATAL:[ 1338] 11:18:54.697 performance_timing.c:227:dump_stack()  rhizome_fetch_poll
FATAL:[ 1338] 11:18:54.697 performance_timing.c:227:dump_stack()  call_alarm
FATAL:[ 1338] 11:18:54.697 performance_timing.c:227:dump_stack()  fd_poll2
FATAL:[ 1338] 11:18:54.697 performance_timing.c:227:dump_stack()  server
FATAL:[ 1338] 11:18:54.697 performance_timing.c:227:dump_stack()  app_server_start
FATAL:[ 1338] 11:18:54.697 performance_timing.c:227:dump_stack()  cli_invoke
FATAL:[ 1338] 11:18:54.697 performance_timing.c:227:dump_stack()  parseCommandLine
FATAL:[ 1338] 11:18:54.697 main.c:63:crash_handler()  GDB BACKTRACE
FATAL:[ 1338] 11:18:54.815 GDB 
FATAL:[ 1338] 11:18:54.815 GDB warning: File "/lib/libthread_db.so.1" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
FATAL:[ 1338] 11:18:54.815 GDB 
FATAL:[ 1338] 11:18:54.815 GDB warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
FATAL:[ 1338] 11:18:54.815 GDB 
FATAL:[ 1338] 11:18:54.815 GDB warning: Unable to find dynamic linker breakpoint function.
FATAL:[ 1338] 11:18:54.815 GDB GDB will be unable to debug shared library initializers
FATAL:[ 1338] 11:18:54.815 GDB and track explicitly loaded dynamic code.
FATAL:[ 1338] 11:18:54.815 GDB 
FATAL:[ 1338] 11:18:54.815 GDB warning: File "/lib/libthread_db.so.1" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
FATAL:[ 1338] 11:18:54.815 GDB 
FATAL:[ 1338] 11:18:54.815 GDB warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
FATAL:[ 1338] 11:18:54.816 GDB 0xb7726d0e in ?? ()
FATAL:[ 1338] 11:18:54.816 GDB #0  0xb7726d0e in ?? ()
FATAL:[ 1338] 11:18:54.816 GDB #1  0x081262e7 in crash_handler (signal=6) at main.c:63
FATAL:[ 1338] 11:18:54.816 GDB #2  <signal handler called>
FATAL:[ 1338] 11:18:54.816 GDB #3  0xb77577e4 in ?? ()
FATAL:[ 1338] 11:18:54.816 GDB #4  0x081759ca in rhizome_write_complete (slot=slot@entry=0x8213080 <rhizome_fetch_queues+9312>) at rhizome_fetch.c:1251
FATAL:[ 1338] 11:18:54.816 GDB #5  0x08176528 in rhizome_write_content (slot=0x8213080 <rhizome_fetch_queues+9312>, buffer=0x82133f1 <rhizome_fetch_queues+10193> "M\222\244\324\202ij4y4?\240\223\221\243b'\vq\203\373\263\360<\022\253\256[\034x\214^\377\310\250\204TZ\372G\357/\231U\344>W\212\350\035.\224\n\332U\335u_\250\071\365\367\267F\316\365N\361?\360\230\006\363\273\355|\371\204\023_pgH\215\250\021\a\022&\236s\262\210\021\023\224\301T\353\257\bf\275!\223\236\365\221\276>", bytes=<optimized out>) at rhizome_fetch.c:1338
FATAL:[ 1338] 11:18:54.816 GDB #6  0x08177792 in rhizome_fetch_poll (alarm=0x8213080 <rhizome_fetch_queues+9312>) at rhizome_fetch.c:1498
FATAL:[ 1338] 11:18:54.816 GDB #7  0x080e9b24 in call_alarm (alarm=0x8213080 <rhizome_fetch_queues+9312>, revents=1) at fdqueue.c:314
FATAL:[ 1338] 11:18:54.816 GDB #8  0x080eb416 in fd_poll2 (waiting=0x0, wokeup=0x0) at fdqueue.c:433
FATAL:[ 1338] 11:18:54.816 GDB #9  0x08188b3b in server_loop () at server.c:361
FATAL:[ 1338] 11:18:54.816 GDB #10 server () at server.c:380
FATAL:[ 1338] 11:18:54.816 GDB #11 0x08189771 in app_server_start (parsed=0xbf8df314, context=0x0) at server.c:800
FATAL:[ 1338] 11:18:54.816 GDB #12 0x080bd747 in cli_invoke (parsed=0xbf8df314, context=0x0) at cli.c:330
FATAL:[ 1338] 11:18:54.816 GDB #13 0x08102bf2 in parseCommandLine (context=0x0, argv0=0xbf8dfee2 "servald", argc=1, args=0xbf8df578) at commandline.c:245
FATAL:[ 1338] 11:18:54.816 GDB #14 0x0804d979 in main (argc=2, argv=0xbf8df574) at main.c:49
FATAL:[ 1338] 11:18:54.816 gdb exited normally with status 0
lakeman commented 8 years ago

https://github.com/servalproject/serval-dna/blob/7a9b6d5d722713bdaf8f2a67cea63222d728d87e/rhizome_fetch.c#L486

This line does not handle the "database busy" status. I think the right fix here is to pass the rhizome_bundle_status back to the caller and let the caller deal with it.
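The fix suggested above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not serval-dna code: the enum, store_manifest(), import_bundle() and the bounded-retry policy are all hypothetical, and mapping status 9 to "busy" only follows the reading of the log given in this comment.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes. Only BUSY = 9 is tied to this issue: the log
 * reports "returned 9", which the comment above reads as "database busy".
 * The other names and values are illustrative assumptions. */
enum bundle_status {
  BUNDLE_STATUS_NEW = 0,
  BUNDLE_STATUS_SAME = 1,
  BUNDLE_STATUS_INVALID = 4,
  BUNDLE_STATUS_BUSY = 9
};

/* Hypothetical stand-in for rhizome_add_manifest_to_store(): it reports
 * BUSY while another process holds the database lock (simulated here by a
 * counter of remaining busy calls), then succeeds. */
static int simulated_busy_calls = 0;

static enum bundle_status store_manifest(void) {
  if (simulated_busy_calls > 0) {
    simulated_busy_calls--;
    return BUNDLE_STATUS_BUSY;
  }
  return BUNDLE_STATUS_NEW;
}

/* Instead of treating an unexpected status as fatal, pass it back. */
static enum bundle_status import_bundle(void) {
  return store_manifest();
}

/* One possible caller policy (hypothetical): retry a bounded number of
 * times on BUSY and return any other status to the caller unchanged. */
static enum bundle_status import_with_retry(int max_attempts) {
  assert(max_attempts > 0);
  enum bundle_status status = BUNDLE_STATUS_BUSY;
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    status = import_bundle();
    if (status != BUNDLE_STATUS_BUSY)
      break;
    fprintf(stderr, "import attempt %d: database busy\n", attempt);
  }
  return status;
}
```

A real caller inside the daemon would more likely reschedule the import for later than retry immediately, since an immediate retry is unlikely to succeed while another process still holds the lock; the loop only keeps the sketch short. The key point is that BUSY reaches a caller that can decide, rather than aborting the process.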

However, using the MeshMS RESTful API will avoid this problem completely.

gh0st42 commented 8 years ago

We were sending via the RESTful API; only the receiving nodes were periodically running "list conversations" to check status. Also, the crashing node was only acting as a relay.

We will try to port our periodic check scripts to the RESTful API, even though in my opinion no manual command should have such fatal effects :)

quixotique commented 6 years ago

Recent improvements in the documentation and testing of the RESTful APIs make it easier for scripts to call them with utilities like curl(1) instead of invoking the servald command line. As @lakeman points out, the RESTful APIs do not suffer from database lock errors.
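The following C sketch shows the kind of request a script would build instead of running the CLI. It assumes the daemon's RESTful interface listens on 127.0.0.1 port 4110 and exposes the MeshMS conversation list at /restful/meshms/SID/conversationlist.json; both the port and the path are assumptions to check against the Serval DNA REST API documentation for your build, and the helper names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed defaults (not confirmed in this thread): adjust to match the
 * servald configuration in use. */
#define SERVALD_HOST "127.0.0.1"
#define SERVALD_PORT 4110

/* A Serval subscriber ID (SID) is 32 bytes, written as 64 hex digits. */
static int is_valid_sid_hex(const char *sid) {
  if (strlen(sid) != 64)
    return 0;
  for (size_t i = 0; i < 64; i++) {
    if (!isxdigit((unsigned char)sid[i]))
      return 0;
  }
  return 1;
}

/* Hypothetical helper: build the URL a script would pass to curl(1) in
 * place of the "list conversations" CLI command. Returns the number of
 * characters written, or -1 if the SID is malformed or the buffer is too
 * small to hold the whole URL. */
static int meshms_conversationlist_url(char *buf, size_t len, const char *sid) {
  assert(buf != NULL && sid != NULL);
  if (!is_valid_sid_hex(sid))
    return -1;
  int n = snprintf(buf, len, "http://%s:%d/restful/meshms/%s/conversationlist.json",
                   SERVALD_HOST, SERVALD_PORT, sid);
  if (n < 0 || (size_t)n >= len)
    return -1;
  return n;
}
```

A script would then fetch that URL with curl(1), supplying the RESTful user name and password configured for the daemon (for example with curl's --user option). The request is served by the running daemon, so no second servald process opens the Rhizome database and competes for its lock.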

The ideal option would be to provide a servalc (Serval Client) command-line utility that acted as a REST HTTP client of the daemon, so scripts could use it and get their results back formatted just like the existing CLI. However, that is beyond the scope of this issue.

Closing.