obgm / libcoap

A CoAP (RFC 7252) implementation in C
Other
800 stars 424 forks source link

Assertion !lock->in_callback failed in coap_lock_lock_func() during application shutdown #1403

Closed anyc closed 5 months ago

anyc commented 5 months ago

Hello,

with the current develop branch, my server gets aborted sometimes due to a failed assert during application shutdown:

../git/src/coap_threadsafe.c:716: coap_lock_lock_func: Assertion `!lock->in_callback' failed.

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44      pthread_kill.c: No such file or directory.
(gdb) back
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x76c7a390 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x76c36e70 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x76c215f8 in __GI_abort () at abort.c:79
#4  0x76c30308 in __assert_fail_base (fmt=0x76d4e220 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x76f7fb24 "!lock->in_callback", 
    assertion@entry=0x76ff2020 "", file=0x76f7fa78 "../git/src/coap_threadsafe.c", file@entry=0x76d4e220 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    line=line@entry=716, function=0x76f7fb88 <__PRETTY_FUNCTION__.0> "coap_lock_lock_func", function@entry=0x76f7fa78 "../git/src/coap_threadsafe.c")
    at assert.c:92
#5  0x76c303ac in __GI___assert_fail (assertion=0x76ff2020 "", file=0x76d4e220 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", line=line@entry=716, 
    function=0x76f7fa78 "../git/src/coap_threadsafe.c") at assert.c:101
#6  0x76f6ceac in coap_lock_lock_func (lock=0x76ff2020, file=file@entry=0x76f7fa78 "../git/src/coap_threadsafe.c", line=1995963000, line@entry=452)
    at ../git/src/coap_threadsafe.c:716
#7  0x76f6e50c in coap_resource_notify_observers (r=0x46e3d0, query=query@entry=0x0) at ../git/src/coap_threadsafe.c:452
#8  0x00405b3c in close_session (no_session=0x4a9340) at main.c:996
#9  0x004060b4 in coap_event_handler (session=0x4a9110, event=COAP_EVENT_SERVER_SESSION_DEL) at main.c:1171
#10 0x76f4ee38 in coap_handle_event (context=0x442780, event=event@entry=COAP_EVENT_SERVER_SESSION_DEL, session=session@entry=0x4a9110)
    at ../git/src/coap_net.c:4147
#11 0x76f65d10 in coap_free_endpoint_locked (ep=0x4668f0) at ../git/src/coap_session.c:1861
#12 coap_free_endpoint_locked (ep=0x4668f0) at ../git/src/coap_session.c:1852
#13 0x76f4ae10 in coap_free_context_locked (context=0x442780) at ../git/src/coap_net.c:682
#14 0x76f4b018 in coap_free_context (context=<optimized out>) at ../git/src/coap_net.c:634
#15 0x00403d98 in main (argc=<optimized out>, argv=<optimized out>) at main.c:1771

If a session closes, a resource (that can be observed) changes its value and my SERVER_SESSION_DEL callback will call coap_resource_notify_observers(). If this happens during application shutdown, the assert fails. Is this something my application should avoid during shutdown or is this something libcoap itself should be able to handle?

Thank you!

mrdeep1 commented 5 months ago

Interesting issue. Glad that it has been raised. coap_free_context() has been called which means that the coap_context_t is now in an unstable state when any app call-backs are invoked.

One of the first things coap_free_context() does is to free off all the resources - and when an 'observed' resource is freed off, the final observe notification is sent out. However, you will still get triggered by an event and so your close_session() will continue to get called.

In your case, the call to coap_resource_notify_observers() should silently fail as the coap_context_tis unstable.

See #1404 for a fix. Please confirm that this works as expected.

anyc commented 5 months ago

It shuts down without an error now, yes. However, valgrind tells me:

==4947== Invalid read of size 1
==4947==    at 0x48DFAFC: coap_resource_notify_observers (coap_resource.c:1195)
==4947==    by 0x10DB3B: close_session (main.c:996)
==4947==    by 0x10E0B3: coap_event_handler (main.c:1171)
==4947==    by 0x48E2453: coap_free_endpoint (coap_session.c:1861)
==4947==    by 0x48CEE37: coap_free_context_locked (coap_net.c:682)
==4947==    by 0x48CEE37: coap_free_context_locked (coap_net.c:639)
==4947==  Address 0x51db078 is 0 bytes inside a block of size 100 free'd
==4947==    at 0x4867EAC: free (vg_replace_malloc.c:872)
==4947==    by 0x48E0623: coap_delete_all_resources (coap_resource.c:591)
==4947==    by 0x48CEDCF: coap_free_context_locked (coap_net.c:646)
==4947==    by 0x48CEDCF: coap_free_context_locked (coap_net.c:639)
==4947==  Block was alloc'd at
==4947==    at 0x4864B70: malloc (vg_replace_malloc.c:381)
==4947==    by 0x48DDC37: coap_resource_init (coap_resource.c:269)
mrdeep1 commented 5 months ago

Ok, gets interesting. coap_delete_all_resources() has been called, so any resource is no more. You are then calling coap_resource_notify_observers() with a resource parameter pointing to a piece of de-allocated memory.

I am thinking of going in the direction of no call-back handlers are called after coap_free_context() has been called. This does however mean that things like close_session() will no longer be called at this point - and so some custom clean-up will not occur.

It is possible to close down server sessions before calling coap_free_context() which will then triggers the appropriate call-back handles. It is a bit messy but this does the trick. Normally server sessions are left in a pool so they can be re-used and can't just be freed off.

    coap_session_set_type_client(session);
    coap_session_release(session);
anyc commented 5 months ago

Hm, could we just call coap_delete_all_resources() after coap_free_endpoint()?

mrdeep1 commented 5 months ago

Possibly, not sure what the law of unintended consequences will also throw up!

Bottom line is any 'thing' that was associated with coap_context_tafter that 'thing' gets released is problematic.

I am going in the direction that after coap-free_context() is called, no call-backs are invoked (which means just clearing out all the handler pointers early on in coap_free_context()) as they inherently can be using unsafe information. I will have a play with this.

mrdeep1 commented 5 months ago

Whenever coap_event_handler() is called with COAP_EVENT_SERVER_SESSION_DEL which invokes close_session() in your case, the session is not associated with any observing resources. So, coap_resource_notify_observers()makes no sense at this point and is not needed.

As a note, whenever an observe request is added to a session, the session reference count is bumped. When the observe is removed from the session, the session reference is decremented. COAP_EVENT_SERVER_SESSION_DEL can only occur when the reference count is 0. coap_free_context() frees off all the resources (if resource is being observed, a final notify is sent out) which decrements any associated sessions.

So, coap_free_context() should not disable any call-backs, enabling the call-backs to continue to do any custom cleanups.

mrdeep1 commented 5 months ago

@anyc With the removal of the coap_resource_notify_observers() call when handling the event COAP_EVENT_SERVER_SESSION_DEL (as it will never do anything for the current session as the reference count is 0 - hence not observing any resource), are there any other application shutdown issues associated with COAP_EVENT_SERVER_SESSION_DEL you are having with the latest develop branch, or can this Issue now be closed?

anyc commented 5 months ago

Sorry, I am on vacation this week. I am not sure if I understand your reasoning but in my case, notify_observers() does not only notify the peer of the to-be-closed session but possibly also other peers so I cannot remove it from my event handler. To explain a little bit, a peer can acquire a mutex that protects multiple resources. The resources only accept new data if the mutex is locked by the same session and the mutex state is accessible over its own resource. If the session is closed, the lock is automatically released which shall be signaled to other sessions.

mrdeep1 commented 5 months ago

Then I think the best thing to do is make all your app tracking resource pointers (the ones you are calling coap_resource_notify_observers() with) to NULL just before you call coap_free_context().

Then, on (the coap_free_context()) COAP_EVENT_SERVER_SESSION_DEL, there should not be any issues. However, coap_resource_notify_observers() currently does not like a NULL resource pointer.

EDIT: coap_free_context() frees off all the resources - if a resource is being observed, all subscribers will get notified.

anyc commented 5 months ago

As this happens during shutdown of my server, I just check now if there is a shutdown in progress and only if not, I send notifications. The process shuts down cleanly now. Thank you!