ofi-cray / libfabric-cray

Open Fabric Interfaces
http://ofiwg.github.io/libfabric/

Scalable endpoint fails with -22 on GNI #1386

bcernohous closed 7 years ago

bcernohous commented 7 years ago

Verify proper functioning of scalable EP usage

I ran the attached fi_write_scalable_ep.c

fi_write_scalable_ep.zip

The sockets provider appears to be working.

But the gni provider fails with: fi_scalable_ep_bind ret:-22 error: Invalid argument

$ cc -g -o fi_write_scalable_ep fi_write_scalable_ep.c -I/.../libfabric-test/builds/latest-libfabric.../include/ -L/.../libfabric-test/builds/latest-libfabric.../lib/ -lfabric

The trace is:

libfabric:gni:ep_ctrl:gnix_nic_alloc():1308 [49231:1] Allocated NIC:0xb30000
libfabric:gni:ep_ctrl:_gnix_dgram_hndl_alloc():577 [49231:1]
libfabric:gni:ep_ctrl:gnix_nic_alloc():1308 [11309:1] Allocated NIC:0xb30000
libfabric:gni:ep_ctrl:_gnix_dgram_hndl_alloc():577 [11309:1]
libfabric:gni:ep_ctrl:__gnix_nic_prog_thread_fn():91 [11309:2]
libfabric:gni:av:gnix_av_open():837 [11309:1]
libfabric:gni:ep_ctrl:__gnix_nic_prog_thread_fn():91 [49231:2]
libfabric:gni:av:gnix_av_open():837 [49231:1]
libfabric:gni:ep_ctrl:gnix_sep_bind():375 [49231:1]
fi_write_scalable_ep: fi_write_scalable_ep.c:215: main: Assertion `ret == 0' failed.
libfabric:gni:ep_ctrl:_gnix_dgram_prog_thread_fn():79<trace> [49231:3]
libfabric:gni:ep_ctrl:_gnix_dgram_prog_thread_fn():79<trace> [11309:3]
libfabric:gni:ep_ctrl:gnix_sep_bind():375<trace> [11309:1]
fi_write_scalable_ep: fi_write_scalable_ep.c:215: main: Assertion `ret == 0' failed.
libfabric:gni:ep_ctrl:_gnix_dgram_poll():436 [49231:3]
libfabric:gni:ep_ctrl:_gnix_dgram_poll():436 [11309:3]
Application 4626945 is crashing. ATP analysis proceeding...

hppritcha commented 7 years ago

The GNI provider's bind method for SEPs isn't correct for address vectors: it assumes the rx/tx contexts are already allocated before the SEP is bound to an AV. I should have a fix on Monday.
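To illustrate the ordering problem described here, below is a toy model in C. It is not the GNI provider's code and not the actual fix. Every name in it (demo_sep, demo_av, demo_sep_bind_av, demo_tx_context, DEMO_MAX_CTX) is hypothetical. It only sketches one way a bind could avoid depending on whether the tx contexts exist yet: record the AV on the SEP, push it to any contexts that are already allocated, and let contexts allocated later inherit it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CTX 4   /* hypothetical limit, for illustration only */

struct demo_av { int id; };

struct demo_ctx {
    struct demo_av *av;   /* AV this context resolves addresses through */
    int in_use;
};

struct demo_sep {
    struct demo_av *av;                 /* AV bound to the SEP, if any */
    struct demo_ctx tx[DEMO_MAX_CTX];   /* tx contexts, allocated lazily */
};

static void demo_sep_init(struct demo_sep *sep)
{
    assert(sep != NULL);
    memset(sep, 0, sizeof(*sep));
}

/*
 * Bind an AV to the SEP. This does not require any context to exist:
 * the AV is stored on the SEP and copied to contexts allocated so far.
 * Returns 0 on success, -EINVAL on bad arguments.
 */
static int demo_sep_bind_av(struct demo_sep *sep, struct demo_av *av)
{
    int i;

    if (sep == NULL || av == NULL)
        return -EINVAL;

    sep->av = av;
    for (i = 0; i < DEMO_MAX_CTX; i++) {
        if (sep->tx[i].in_use)
            sep->tx[i].av = av;
    }
    return 0;
}

/*
 * Allocate tx context number index. A context created after the bind
 * inherits the AV stored on the SEP (NULL if none was bound yet).
 */
static int demo_tx_context(struct demo_sep *sep, int index)
{
    if (sep == NULL || index < 0 || index >= DEMO_MAX_CTX)
        return -EINVAL;
    if (sep->tx[index].in_use)
        return -EINVAL;

    sep->tx[index].in_use = 1;
    sep->tx[index].av = sep->av;
    return 0;
}
```

In this model, binding before or after allocating a context leaves the context pointing at the same AV, so the caller's ordering no longer matters. On Linux, EINVAL is 22, which corresponds to the -22 in the report.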

hppritcha commented 7 years ago

I have a fix for this, but I assume the test is still pretty buggy. I notice it frees hints and then, further on in the program, does a strcmp on a field in hints. @bcernohous
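The use-after-free pattern mentioned above can be avoided by finishing every read of hints before freeing it. The sketch below is a self-contained illustration. demo_hints and its prov_name field are hypothetical stand-ins, since the thread does not say which field the test compares, and none of this is taken from fi_write_scalable_ep.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the test's hints object; not the libfabric type. */
struct demo_hints {
    char *prov_name;
};

/* Allocate a hints object holding a private copy of name; NULL on failure. */
static struct demo_hints *demo_hints_alloc(const char *name)
{
    struct demo_hints *h = malloc(sizeof(*h));
    size_t len = strlen(name) + 1;

    if (h == NULL)
        return NULL;
    h->prov_name = malloc(len);
    if (h->prov_name == NULL) {
        free(h);
        return NULL;
    }
    memcpy(h->prov_name, name, len);
    return h;
}

static void demo_hints_free(struct demo_hints *h)
{
    if (h != NULL) {
        free(h->prov_name);
        free(h);
    }
}

/*
 * Compare first, free second. Takes ownership of hints.
 * Returns 1 when the provider name equals want, 0 otherwise.
 */
static int demo_check_and_release(struct demo_hints *hints, const char *want)
{
    int match;

    assert(hints != NULL && want != NULL);
    match = (strcmp(hints->prov_name, want) == 0);
    demo_hints_free(hints);   /* safe: nothing reads hints after this point */
    return match;
}
```

If the comparison result is needed much later in the program, keeping it in a plain int (or copying the string) before the free is usually simpler than extending the lifetime of the whole hints object.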