openshmem-org / tests-sos

Sandia OpenSHMEM unit tests and performance testing suite

Ordering issue in waituntil.c #22

Closed lyu closed 2 years ago

lyu commented 2 years ago

In waituntil.c, PE 0 first sends {0, 0, ..., 0} to PE 1, then overwrites it with {1, 2, ..., 10}, while PE 1 waits for the last element of the array to become 10.

However, since puts carry no ordering guarantee, the two puts can be delivered out of order, leaving mixed results on PE 1, and the test hangs. We confirmed this on our A64FX cluster. Adding a shmem_fence() between the two puts, at line 90, solves the issue.

https://github.com/openshmem-org/tests-sos/blob/master/test/unit/waituntil.c#L90
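
To make the failure mode concrete, here is a toy model in plain C. It is not the code from waituntil.c and does not call OpenSHMEM. The helpers `deliver`, `run_model`, and `wait_condition_holds` are hypothetical names made up for this sketch, and the comments only indicate which OpenSHMEM calls each step is meant to stand in for. The unfenced branch shows just one delivery order that is permitted when nothing orders the two puts; a real network may produce others.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NELEMS 10

/* Hypothetical helper: models the arrival of (part of) a put at the target. */
static void deliver(long *target, const long *src, size_t n)
{
    assert(target != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        target[i] = src[i];
}

/* Models PE 0 issuing put(zeros) followed by put(values) to PE 1 and
 * writes the resulting contents of PE 1's array into target. */
static void run_model(bool fenced, long target[NELEMS])
{
    long zeros[NELEMS] = {0};
    long values[NELEMS];
    for (size_t i = 0; i < NELEMS; i++)
        values[i] = (long)i + 1;            /* {1, 2, ..., 10} */

    if (fenced) {
        /* Stands in for: shmem_long_put(target, zeros, NELEMS, 1);  */
        deliver(target, zeros, NELEMS);
        /* Stands in for: shmem_fence();  (orders puts to the same PE) */
        /* Stands in for: shmem_long_put(target, values, NELEMS, 1); */
        deliver(target, values, NELEMS);
    } else {
        /* Assumed reordering: the tail of the zero put arrives last. */
        deliver(target, zeros, NELEMS / 2);
        deliver(target, values, NELEMS);
        deliver(target + NELEMS / 2, zeros + NELEMS / 2,
                NELEMS - NELEMS / 2);
    }
}

/* The condition PE 1 blocks on, roughly:
 * shmem_long_wait_until(&target[NELEMS - 1], SHMEM_CMP_EQ, NELEMS); */
static bool wait_condition_holds(const long target[NELEMS])
{
    return target[NELEMS - 1] == NELEMS;
}
```

In this model the fenced run ends with {1, 2, ..., 10}, so the wait condition holds. The unfenced run ends with {1, 2, 3, 4, 5, 0, 0, 0, 0, 0}: the last element stays 0, which corresponds to PE 1 waiting forever.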

davidozog commented 2 years ago

Agreed! Thanks @lyu.

Would you please post a PR with the fence upstream to https://github.com/Sandia-OpenSHMEM/SOS?

We'll then merge this fix into tests-sos, along with all the other unit test updates, as part of the next release of SOS (v1.5.1), which is targeted for next month. We will likely want to add it to the v1.4.x branch(es) as well.

Also a side note: if you have time, it would be great if you could run the latest (master branch) SOS unit tests (with "make check") on your end before that release. We have updated/added many tests to use the v1.5 interfaces, so I'm curious if this causes any issues in your implementation.

lyu commented 2 years ago

@davidozog Thanks! I have opened the PR.

Sure! I will test as much as I can.