openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org

Error in running HiBench Spark wordcount #9897

Open pubaibai opened 5 months ago

pubaibai commented 5 months ago

The project under test is sparkucx: https://github.com/openucx/sparkucx

Test environment:

- Hadoop cluster with 2 machines
- RDMA network card: Mellanox Technologies MT27800 Family [ConnectX-5]
- Hadoop version: 3.2.1
- Spark version: 3.0.0

Attached screenshots: IMG_20240522_095341, IMG_20240522_095401

With the sparkucx plugin enabled, we ran the wordcount sample from HiBench and found the following error in the Spark task log:

proto_select.c:232 Assertion init_params.rkey_config_key->ep_cfg_index == ep_cfg_index failed: rkey->ep_cfg_index=0 ep_cfg_index=4

Attached screenshot: IMG_20240522_094506

The full error stack and the RDMA network card information are attached.

yosefe commented 5 months ago

@pubaibai what is the UCX version?