pytorch / tensorpipe

A tensor-aware point-to-point communication primitive for machine learning

Select the ibv device that has an active port_state. #456

Open SolenoidWGT opened 2 years ago

SolenoidWGT commented 2 years ago

If the deviceList contains multiple ibv devices, we want to select a device that has a port whose port_state is active, instead of selecting the first device in the deviceList by default. This check is very useful: if we choose the first device without checking, the IB runtime will likely initialize successfully, but obscure errors are then reported at the ibv_post_send stage. At that point it is difficult to tell that the cause of the error is that we chose the wrong ibv device.
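The selection rule can be sketched roughly as follows. This is an illustration only, not the code in this PR: `PortState`, `DeviceInfo`, `Selection`, and `selectActiveDevice` are hypothetical stand-ins, and the port states are assumed to have been collected beforehand (in real code, presumably by calling `ibv_query_port` for each port of each device and comparing the returned state against `IBV_PORT_ACTIVE`).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not TensorPipe or
// libibverbs types.
enum class PortState { Down, Init, Armed, Active };

struct DeviceInfo {
  std::string name;
  std::vector<PortState> ports;  // ports[0] describes port number 1
};

struct Selection {
  std::size_t deviceIndex;  // index into the device list
  std::uint8_t portNum;     // 1-based, matching verbs port numbering
};

// Returns the first (device, port) pair whose port is active, or
// std::nullopt if no device in the list has an active port.
std::optional<Selection> selectActiveDevice(
    const std::vector<DeviceInfo>& devices) {
  for (std::size_t d = 0; d < devices.size(); ++d) {
    for (std::size_t p = 0; p < devices[d].ports.size(); ++p) {
      if (devices[d].ports[p] == PortState::Active) {
        return Selection{d, static_cast<std::uint8_t>(p + 1)};
      }
    }
  }
  return std::nullopt;
}
```

Returning an empty optional when nothing is active lets the caller fail early with a clear message, rather than discovering the problem later at the ibv_post_send stage.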

This PR fixes #455.