Output
Server Output:
Sending 10 tagged messages
Waiting for messages to complete
munmap_chunk(): invalid pointer
Aborted (core dumped)
Server Backtrace:
(gdb) bt
#0  0x00007ffff6496aff in raise () from /lib64/libc.so.6
#1  0x00007ffff6469ea5 in abort () from /lib64/libc.so.6
#2  0x00007ffff64d9097 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff64e04ec in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff64e079c in munmap_chunk () from /lib64/libc.so.6
#5  0x00007ffff7a88e0f in psm3_free_internal (ptr=0x735a80, curloc=0x7ffff7b12953 "prov/psm3/psm3/psm_ep.c:1163")
    at prov/psm3/psm3/psm_utils.c:3964
#6  0x00007ffff7a63d41 in psm3_ep_close (ep=0x636ac0, mode=0, timeout_in=2000000000) at prov/psm3/psm3/psm_ep.c:1163
#7  0x00007ffff7a29b31 in psmx3_trx_ctxt_free (trx_ctxt=0x62b3a0, usage_flags=3) at prov/psm3/src/psmx3_trx_ctxt.c:223
#8  0x00007ffff7a11cea in psmx3_ep_close (fid=0x7349b0) at prov/psm3/src/psmx3_ep.c:234
#9  0x0000000000403fb1 in fi_close (fid=<optimized out>)
    at /path_to_libfabric_install/include/rdma/fabric.h:632
#10 ft_close_fids () at common/shared.c:1792
#11 0x0000000000404a9a in ft_free_res () at common/shared.c:1862
#12 0x0000000000401b2a in main (argc=<optimized out>, argv=<optimized out>) at functional/rdm_tagged_peek.c:364
Client Output:
Peek for a bad msg
Peek w/ claim for a bad msg
Peek msg 1
Receive msg 1
Peek w/ claim msg 2
Receive claimed msg 2
Peek & discard msg 3
Checking to see if msg 3 was discarded
Peek w/ claim msg 4
Claim and discard msg 4
Receive msg 5
Receive msg 6
Receive msg 10
Receive msg 9
Receive msg 8
Receive msg 7
Environment:
Rocky Linux 8.7, MLNX 5.0
Additional context:
Unsetting FI_PROVIDER works around this bug.
The free() call that fails is the one that frees the hfi_nids struct at psm_ep.c:1163.
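Frame #5 shows the abort coming from psm3_free_internal(ptr, curloc), a wrapper around free() that also records the call site. One general way such a wrapper can end in "munmap_chunk(): invalid pointer" is when it adjusts the pointer (for example, stepping back over a bookkeeping header) before calling free(): the adjustment is only valid for pointers produced by the matching allocation wrapper. The C sketch below illustrates that pairing requirement under stated assumptions; tracked_malloc, tracked_free, struct alloc_hdr, and ALLOC_MAGIC are hypothetical names, and this is neither PSM3's actual implementation nor a confirmed root cause for this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header placed in front of each tracked block.
 * This layout is an illustrative assumption, not PSM3's actual struct.
 * _Alignas keeps the user pointer (hdr + 1) suitably aligned for any type. */
struct alloc_hdr {
    _Alignas(max_align_t) uint64_t magic; /* marks tracked_malloc() memory */
    size_t size;                          /* requested size, for accounting */
    const char *curloc;                   /* "file:line" of the allocation */
};

#define ALLOC_MAGIC 0x7472616b6d656d31ULL /* arbitrary marker value */

/* Allocate size bytes and return the address just past the header. */
static void *tracked_malloc(size_t size, const char *curloc)
{
    if (size > SIZE_MAX - sizeof(struct alloc_hdr))
        return NULL; /* header + size would overflow */
    struct alloc_hdr *hdr = malloc(sizeof(*hdr) + size);
    if (hdr == NULL)
        return NULL;
    hdr->magic = ALLOC_MAGIC;
    hdr->size = size;
    hdr->curloc = curloc;
    return hdr + 1;
}

/* Release a pointer obtained from tracked_malloc(). It steps back over the
 * header so that free() receives the exact address malloc() returned.
 * Without the magic check below, passing any other pointer would make free()
 * receive an address malloc() never returned, which glibc typically reports
 * as an "invalid pointer" abort. Returns 0 on success, -1 on a bad marker. */
static int tracked_free(void *ptr, const char *curloc)
{
    assert(curloc != NULL);
    if (ptr == NULL)
        return 0;
    struct alloc_hdr *hdr = (struct alloc_hdr *)ptr - 1;
    if (hdr->magic != ALLOC_MAGIC) {
        /* Best-effort check only: reading hdr is already undefined
         * behavior if ptr did not come from tracked_malloc(). */
        fprintf(stderr, "%s: %p was not allocated by tracked_malloc()\n",
                curloc, ptr);
        return -1;
    }
    free(hdr);
    return 0;
}
```

The magic value is only a debugging aid: it can catch some mismatched pointers cheaply, but it cannot make freeing a foreign pointer safe, so the real requirement is that every allocation and its free go through the same wrapper.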
Describe the bug:
fi_rdm_tagged_peek fails to clean up on the server side with "munmap_chunk(): invalid pointer" if FI_PROVIDER="psm3" is set.
To Reproduce:
server_cmd: FI_PROVIDER=psm3 fi_rdm_tagged_peek -p psm3 -E
client_cmd: FI_PROVIDER=psm3 fi_rdm_tagged_peek -p psm3 -E "server_address"
Expected behavior:
The test passes successfully.