stonier / zeroconf_avahi_suite

Avahi implementation of zeroconf for ros.

entry_group_callback bug #2

Open ghost opened 11 years ago

ghost commented 11 years ago

I'm sporadically getting the following when using the library:

Zeroconf : should never reach here, please report a bug in zeroconf_avahi's entry_group_callback.

My setup is two machines running the same daemon. Each daemon registers a service with the same name, and listens for a non-local service to connect to.

This works great 99% of the time, but occasionally the library will print the above error and it will miss a new/lost service callback.

avahi-daemon 0.6.30, Ubuntu 12.04

Code that sporadically triggers the behavior:

#include <iostream>
#include <unistd.h>  // usleep
#include <zeroconf_avahi/zeroconf.hpp>
#include <zeroconf_msgs/PublishedService.h>
using namespace std;

void zeroconf_new_cb(zeroconf_msgs::DiscoveredService s) {
    if(! s.is_local && s.type == "_foo._tcp")
        cout << "new_cb" << endl;
}
void zeroconf_lost_cb(zeroconf_msgs::DiscoveredService s) {
    if(!s.is_local && s.type == "_foo._tcp")
        cout << "lost_cb" << endl;
}

int main(int argc, char** argv) {
    zeroconf_avahi::Zeroconf zeroconf;
    zeroconf_msgs::PublishedService service;

    service.name = "foo";
    service.type = "_foo._tcp";

    zeroconf.connect_signal_callbacks(zeroconf_new_cb, zeroconf_lost_cb);
    zeroconf.add_listener(service.type);
    zeroconf.add_service(service);

    while(1) {
        usleep(500000);
    }
}

stonier commented 11 years ago

Thanks for the detailed explanation and code Brian. I'll have a look at this on the weekend.

stonier commented 11 years ago

Hmm (self-pontificating here), I wonder if this is relevant - from the avahi sources:

AvahiEntryGroup* avahi_entry_group_new(
    AvahiClient* c,
    AvahiEntryGroupCallback callback /**< This callback is called whenever 
the state of this entry group changes. May not be NULL. Please note that 
this function is called for the first time from within the avahi_entry_group_new() context! 
Thus, in the callback you should not make use of global variables that are initialized only 
after your call to avahi_entry_group_new(). A common mistake is to store the 
AvahiEntryGroup pointer returned by avahi_entry_group_new() in a global variable and 
assume that this global variable already contains the valid pointer when the callback is 
called for the first time. A work-around for this is to always use the AvahiEntryGroup 
pointer passed to the callback function instead of the global pointer. */

stonier commented 11 years ago

I'm probably doing exactly what you should not do here.
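
The pitfall described in that doc comment can be reproduced without Avahi at all. The following is a minimal sketch, not zeroconf_avahi's code: `FakeGroup`, `fake_group_new()` and `demo_first_callback()` are hypothetical stand-ins invented for illustration. It only shows that a factory which invokes its callback before returning leaves a caller-side "global" pointer unset during that first call, while the pointer passed as the callback argument is already valid.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for an entry group; NOT the real AvahiEntryGroup.
struct FakeGroup {
    int id;
};

using GroupCallback = std::function<void(FakeGroup*)>;

// Mimics the documented behaviour: the callback fires once *before* the
// factory returns, so the caller cannot have stored the result yet.
std::unique_ptr<FakeGroup> fake_group_new(const GroupCallback& callback) {
    assert(static_cast<bool>(callback));  // "May not be NULL"
    std::unique_ptr<FakeGroup> group(new FakeGroup{42});
    callback(group.get());  // first invocation happens inside the factory
    return group;
}

// What the first callback invocation was able to observe.
struct Observed {
    bool stored_pointer_was_null;
    int id_seen_via_argument;
};

Observed demo_first_callback() {
    FakeGroup* stored = nullptr;  // plays the role of the global pointer
    Observed seen{false, -1};

    std::unique_ptr<FakeGroup> owner = fake_group_new([&](FakeGroup* g) {
        // The "global" is still unset here; only the argument is usable.
        seen.stored_pointer_was_null = (stored == nullptr);
        seen.id_seen_via_argument = g->id;
    });

    stored = owner.get();  // assigned only after the first callback has run
    (void)stored;
    return seen;
}
```

Calling `demo_first_callback()` reports that the stored pointer was still null during the first callback and that the group was nonetheless reachable through the callback argument, which is the workaround the doc comment recommends.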

stonier commented 11 years ago

Brian, I'm trying to reproduce that error here - is it showing up on startup, or after some time when the service is dynamically added or removed?

ghost commented 11 years ago

I thought I responded last month... maybe I forgot to submit? Anyway, the error occurs when one side has been running for a while and then the remote is started.

stonier commented 11 years ago

Can't reproduce this, even when adding sleeps in suspect locations or putting it on an automatic bash loop till failure. The logic checks out also (see below).

I'm wondering if there's a problem causing it from elsewhere. What version of avahi? What system? Are there name collisions happening when it occurs (i.e. multiple avahi publishers trying to publish the same name)? From above:

avahi-daemon 0.6.30, Ubuntu 12.04

This is the same as my current testing pc.

For the record, revisiting the note from the doxygen above, I think we're fine. That add_entry_group callback for entry_group_new() processing triggers the uncommitted response (I deliberately don't commit until after stuffing the global variable), which does no processing. So the logic should check out. If there is something going on that I'm unaware of, I'll need to reproduce it first before going off on wild goose chases.

ghost commented 11 years ago

I think you hit the nail on the head.

I was publishing the same name from both processes (same executable in fact). I modified the code to publish with different names and I can no longer confuse it.

I assumed that using the same name was fine because avahi would simply rename the second one to "name #2" but this appears to occasionally not work. I'm fine with using different names now that I know it is necessary.
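
For readers who want to sidestep the collision themselves, the "name #2" convention mentioned above can be sketched in a few lines. This is an illustrative stand-in only: `next_service_name()` is a hypothetical helper, not part of zeroconf_avahi or Avahi. Avahi ships its own helper for this, `avahi_alternative_service_name()`, which this sketch does not call and whose exact behaviour may differ. Whether zeroconf_avahi uses that helper on collision is not established in this thread.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: derive the next candidate name after a collision.
// "foo" becomes "foo #2"; "foo #2" becomes "foo #3"; anything that does
// not end in " #<digits>" simply gets " #2" appended.
std::string next_service_name(const std::string& name) {
    assert(!name.empty());

    const std::size_t marker = name.rfind(" #");
    if (marker != std::string::npos && marker + 2 < name.size()) {
        const std::string digits = name.substr(marker + 2);

        bool all_digits = true;
        for (unsigned char ch : digits) {
            if (!std::isdigit(ch)) {
                all_digits = false;
                break;
            }
        }

        // Cap the length so std::stoul cannot overflow on odd input.
        if (all_digits && digits.size() <= 6) {
            const unsigned long next = std::stoul(digits) + 1;
            return name.substr(0, marker + 2) + std::to_string(next);
        }
    }
    return name + " #2";
}
```

In the reproduction program above, one could assign a distinct `service.name` per machine up front (for example by appending the hostname), which is the approach that made the error disappear in this thread; the helper is only useful when names have to be chosen after a collision is detected.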