pmem / rpma

Remote Persistent Memory Access Library

MT-test failed, maybe a synchro issue in the MT-framework? #1079

Closed ldorau closed 2 years ago

ldorau commented 3 years ago

MT-test failed: https://app.circleci.com/pipelines/github/pmem/rpma/194/workflows/74f049b3-b477-4ccc-8d1d-82883a9fed66/jobs/199

Maybe a synchronization issue in the MT-framework?

ldorau commented 3 years ago

@janekmi

yangx-jy commented 3 years ago

@ldorau @janekmi

It seems that the MT-framework lacks process synchronization: fork() does not guarantee the execution order of the parent and child processes. We need to make sure rdma_listen() has finished in the child before rdma_connect() runs in the parent, using some mechanism (e.g. futex, semaphore, signal? Not sure which one is the best).

janekmi commented 3 years ago

You are right @yangx-jy. I think the easiest way is to provide interprocess synchronization based on semaphores. I have proposed a very rough proof of concept, but it requires some additional work to become an elegant solution.

For now, I recommend turning off all tests affected by this issue that have sneaked into our CI.

Ref: https://github.com/pmem/rpma/pull/1050
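For illustration only, here is a minimal sketch of the semaphore idea discussed above. It is not the code from the referenced PR and does not use the rpma MT-framework: `struct sync_state`, `run_listen_then_connect()` and the `listening` flag are hypothetical names, and the real rdma_listen() / rdma_connect() calls are replaced by placeholders. It assumes Linux with POSIX unnamed semaphores placed in an anonymous shared mapping.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and usleep() under strict -std modes */
#endif
#include <errno.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared state; not taken from the rpma MT-framework. */
struct sync_state {
	sem_t server_ready; /* posted by the child once it is "listening" */
	int listening;      /* stand-in for a completed rdma_listen() */
};

/* Returns 0 if the parent "connected" only after the child was ready. */
int
run_listen_then_connect(void)
{
	/* Anonymous shared mapping so both processes see the same semaphore. */
	struct sync_state *st = mmap(NULL, sizeof(*st),
			PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (st == MAP_FAILED)
		return -1;

	st->listening = 0;
	/* pshared = 1: the semaphore is shared between processes. */
	if (sem_init(&st->server_ready, 1, 0) != 0) {
		munmap(st, sizeof(*st));
		return -1;
	}

	pid_t pid = fork();
	if (pid < 0) {
		sem_destroy(&st->server_ready);
		munmap(st, sizeof(*st));
		return -1;
	}

	if (pid == 0) {
		/* Child (server side): placeholder for rdma_listen(). */
		usleep(100 * 1000); /* simulate a slow setup */
		st->listening = 1;
		sem_post(&st->server_ready);
		_exit(0);
	}

	/* Parent (client side): block until the child signals readiness. */
	int ret = 0;
	while (sem_wait(&st->server_ready) != 0) {
		if (errno != EINTR) {
			ret = -1;
			break;
		}
	}

	/* Placeholder for rdma_connect(): the server must be listening now. */
	if (ret == 0 && st->listening != 1)
		ret = -1;

	/* Reap the child and release the shared resources. */
	int status = 0;
	if (waitpid(pid, &status, 0) < 0 ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		ret = -1;

	sem_destroy(&st->server_ready);
	munmap(st, sizeof(*st));
	return ret;
}
```

Calling `run_listen_then_connect()` from a test's `main()` should return 0 regardless of which process the scheduler runs first after fork(). Note that this sketch would hang if the child died before calling sem_post(); a real framework would probably want sem_timedwait() or similar. On older glibc versions, linking with `-pthread` may be required.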

ldorau commented 2 years ago

Fixed by https://github.com/pmem/rpma/pull/1604 and https://github.com/pmem/rpma/pull/1635