zeromq / czmq

High-level C binding for ØMQ
czmq.zeromq.org
Mozilla Public License 2.0
1.18k stars 525 forks source link

1.Fix the bug that after calling zsys_shutdown() for the first time, … #2274

Closed ZhuXxxx closed 6 months ago

ZhuXxxx commented 1 year ago

…s_mutex is not available even if zsys_init() is called again.

2.Fix the bug that caused the exit callback function table to be full after calling zsys_init() and zsys_shutdown() multiple times.

Pull Request Notice

Before sending a pull request make sure each commit solves one clear, minimal, plausible problem. Further each commit should have the following format:

Problem: X is broken

Solution: do Y and Z to fix X

Please avoid sending a pull request with recursive merge nodes, as they are impossible to fix once merged. Please rebase your branch on zeromq/czmq master instead of merging it.

git remote add upstream git@github.com:zeromq/czmq.git git fetch upstream git rebase upstream/master git push -f

In case you already merged instead of rebasing you can drop the merge commit.

git rebase -i HEAD~10

Now, find your merge commit and mark it as drop and save. Finally rebase!

If you are a new contributor please have a look at our contributing guidelines: CONTRIBUTING.md

sphaero commented 7 months ago

Sorry for the delay but I've never seen this problem. I'm willing to merge this but the CI logs have expired so I can't check whether the failures are due to this change.

It seems you are talking about a bug when zsys_shutdown is called before zsys_init is called?

ZhuXxxx commented 7 months ago

Sorry for the delay but I've never seen this problem. I'm willing to merge this but the CI logs have expired so I can't check whether the failures are due to this change.

It seems you are talking about a bug when zsys_shutdown is called before zsys_init is called?

In Unix, variable 's_mutex' in file 'zsys.c' is initialized only once during the whole process lifetime by 'pthread_once()', and it will be destroyed when calling function 'zsys_shutdown()', If you call sys_init , and then call sys_shutdown, and call sys_init for the second time, variable ‘s_mutex’ is not initialized again, so 's_mutex' is invalid because it has been destroyed in function sys_shutdown, it can not play a role in ensuring thread safety, now if you call sys_shutdown for the second time in a multi-threaded environment, and other threads are using z_sock, you will most likely observe a program crash。

Of cause, if sys_shutdown is called only once at the end of the process, it's not necessary to make change, but you provided a demo of calling sys_init and sys_shutdown multiple times in the file 'sys.c' function 'zsys_test()',so to make sure ‘s_mutex’ is initialized only once and will be destroyed only once,I made this modifications.

sphaero commented 6 months ago

So doing this causes a potential error you say?

zsys_init();
zsys_shutdown();
zsys_init();
zsys_shutdown();
zsys_init();

It must be an edge case. What should I do to make it crash?

I'm not sure what this fixes. Isn't it that your fix only guarantees that the mutex is destroyed only once the first time it is called?

ZhuXxxx commented 6 months ago

So doing this causes a potential error you say?

zsys_init();
zsys_shutdown();
zsys_init();
zsys_shutdown();
zsys_init();

It must be an edge case. What should I do to make it crash?

I'm not sure what this fixes. Isn't it that your fix only guarantees that the mutex is destroyed only once the first time it is called?

My first fix is replace 'atexit (zsys_shutdown);' with 'pthread_once(&register_atexit_shutdown, zsys_register_atexit_shutdown);', it guarantees ‘sys_shutdown()’ will only be registered once, if not fix it, calling sys_init and sys_shutdown multiple times(about 32) will cause the atexit function table to become full.

My second fix is add 'atexit (zsys_destroy_mutex);' to function 'zsys_initialize_mutex()', and remove 'ZMUTEX_DESTROY (s_mutex);' in Unix environment, it guarantees 's_mutex' is always valid during the whole process lifetime, it will be destroyed after 'main()', if not fix it, 's_mutex' will become invalid after calling 'sys_shutdown', even if the process will continue to run.

My business is too complex to show, but I will show you another deme to prove this bug is causing some problems:

static void func()
{   
    ::std::vector<void*> vec;
    while (0 == zsys_interrupted) {
        while (vec.size() < 100000 && 0 == zsys_interrupted) {
            void* const handle = zsys_socket(ZMQ_PULL, nullptr, 0);
            if (nullptr == handle) {
                continue;
            }
            vec.push_back(handle);
        }
        for (void* const handle : vec) {
            zsys_close(handle, nullptr, 0);
        }
        vec.clear();
    }
}
int main(const int argc, const char* const argv[])
{
    zsys_init();
//    zsys_shutdown();
//    zsys_init();
    //Create two threads to call zsys_sockect and  zsys_close frequently
    ::std::thread t0(func);
    t0.detach();
    ::std::thread t1(func); 
    t1.detach();                            
    ::std::this_thread::sleep_for(::std::chrono::milliseconds(1000));
    //End two other threads
    zsys_interrupted = 1;
    ::std::this_thread::sleep_for(::std::chrono::milliseconds(1000));                   
    //Shutdown
    zsys_shutdown();                                                
    return 0;
}

If you run this demo on Linux OS, it works fine.

But if you 'zsys_init(); zsys_shutdown(); zsys_init();', like this:

int main(const int argc, const char* const argv[])
{
    zsys_init();
    zsys_shutdown();
    zsys_init();
    //Create two threads to call zsys_sockect and  zsys_close frequently
    ::std::thread t0(func);
    t0.detach();
    ::std::thread t1(func);
    t1.detach();
    ::std::this_thread::sleep_for(::std::chrono::milliseconds(1000));
    //End two other threads
    zsys_interrupted = 1;
    ::std::this_thread::sleep_for(::std::chrono::milliseconds(1000));
    //Shutdown
    zsys_shutdown();   

    return 0;
}

You will see a error like 'E: 24-03-09 01:18:29 dangling sockets: cannot terminate ZMQ safely 12'.

Both zsys_socket and zsys_close have critical sections protected by s_mutex, so if s_mutex is invalid, some problems will occur.

There was no crash in this example, but the crash did happen in my business.(Complex scenes using zactor and zloop)

In the final analysis, it is all because s_mutex invalid forever after calling sys_shutdown at the first time, even if you called sys_init a second time.

sphaero commented 6 months ago

I guess you have given this thought. I'll merge it. If it comes back with trouble we can revert it.

Btw, if you are using this lib we can use some help in maintaining this. Mostly consists of merging PRs. If you want to help out have a look at C4 which describes how we work.