ofiwg / libfabric

Open Fabric Interfaces
http://libfabric.org/

fabtests: Disable fi_rdm_tagged_peek for cleanup failure for psm3 and ucx #10124

Closed zachdworkin closed 3 months ago

zachdworkin commented 3 months ago

fi_rdm_tagged_peek fails to clean up with "munmap_chunk(): invalid pointer" when trying to free hfi_nids at psm_ep.c:1161. The test passes when FI_PROVIDER is unset and fails when it is set to "psm3" or "PSM3". There is an open issue in ofiwg/libfabric to track this bug. Once it is resolved, we can re-enable this test.

Issue opened: #10123

fi_rdm_tagged_peek fails with "segmentation fault" while cleaning up the endpoint. The failure is a race condition, and there is no known way to reproduce it 100% of the time.

Issue opened: #10126

zachdworkin commented 3 months ago

bot:aws:retest

shijin-aws commented 3 months ago

@zachdworkin AWS CI is currently broken due to a dependency issue. I will fix it shortly.

zachdworkin commented 3 months ago

@shijin-aws Thanks for the heads-up! Can you please replay this PR when it's fixed?

shijin-aws commented 3 months ago

Yep, will do

zachdworkin commented 3 months ago

@shijin-aws Since these changes are to the .exclude files for fabtests, do we need to wait for AWS CI?

shijin-aws commented 3 months ago

Yeah I think you can feel free to merge it.

shijin-aws commented 3 months ago

AWS CI doesn't run psm3 or ucx tests.