openxla / stablehlo

Backward compatible ML compute opset inspired by HLO/MHLO
Apache License 2.0

Make "Rendezvous" variadic #2443

Closed abhigunj closed 2 months ago

abhigunj commented 2 months ago

This PR allows each process in the ProcessGrid to contribute more than one tensor at ProcessGrid::rendezvous. It is the first set of changes and will be followed by interpreter updates to the collectives.

Note: This PR does not make the collectives interpreter variadic.
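To illustrate the shape of the change, here is a minimal Python sketch of a rendezvous result that stores a list of tensors per process instead of a single tensor. It is not the StableHLO C++ implementation: the names `RendezvousResult`, `insert`, `lookup`, and `operand`, the `(replica_id, partition_id)` key, and the list-of-floats tensor stand-in are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins, not the real StableHLO types.
ProcessId = Tuple[int, int]  # assumed (replica_id, partition_id)
Tensor = List[float]         # placeholder for a real tensor type


class RendezvousResult:
    """Maps each participating process to the tensors it contributed."""

    def __init__(self) -> None:
        self._entries: Dict[ProcessId, List[Tensor]] = {}

    def insert(self, process_id: ProcessId, tensors: List[Tensor]) -> None:
        # Variadic: a process registers a list of tensors, not a single one.
        self._entries[process_id] = list(tensors)

    def lookup(self, process_id: ProcessId) -> List[Tensor]:
        # An unknown process yields an empty list in this sketch.
        return self._entries.get(process_id, [])

    def operand(self, index: int) -> List[Tensor]:
        # Collect the index-th tensor of every process, in sorted process order.
        return [self._entries[pid][index] for pid in sorted(self._entries)]


result = RendezvousResult()
result.insert((0, 0), [[1.0, 2.0], [10.0]])
result.insert((1, 0), [[3.0, 4.0], [20.0]])
```

Under these assumptions, a variadic collective could call `result.operand(i)` once per operand and process each group independently, which is why ops that contribute a single tensor would see no change in behavior.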

Tested:

  1. No existing tests fail, which indicates no change in behavior for ops that use the rendezvous.
  2. The diff was tested with a new variadic interpreter for the all_reduce op, which will be uploaded as a separate PR soon (https://github.com/openxla/stablehlo/pull/2450).

RFC: https://github.com/openxla/stablehlo/pull/2099

sdasgup3 commented 2 months ago

Is there a way to test the rendezvous function in isolation?

abhigunj commented 2 months ago

> Is there a way to test the rendezvous function in isolation?

> 2. The diff is tested with new variadic interpreter for all_reduce op. Will upload the PR soon.

you mean other than ^ test?

sdasgup3 commented 2 months ago

> > Is there a way to test the rendezvous function in isolation?
>
> > 2. The diff is tested with new variadic interpreter for all_reduce op. Will upload the PR soon.
>
> you mean other than ^ test?

I was wondering whether it is possible to test the changes proposed in this PR with multiple operands feeding the rendezvous. It's fine if this can only be tested at the operation level (via the upcoming PR).
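For reference, an isolated test of this kind could look roughly like the Python sketch below, where several threads stand in for processes and each one feeds two operands into a toy rendezvous. This is only an illustration of the idea: the `rendezvous` and `run_process` helpers, the integer process ids, and the use of `threading.Barrier` are assumptions, not the StableHLO ProcessGrid API.

```python
import threading
from typing import Dict, List

NUM_PROCESSES = 3  # hypothetical number of participating processes

Operands = List[List[int]]
contributions: Dict[int, Operands] = {}
observed: Dict[int, Dict[int, Operands]] = {}
lock = threading.Lock()
barrier = threading.Barrier(NUM_PROCESSES)


def rendezvous(process_id: int, operands: Operands) -> Dict[int, Operands]:
    """Toy rendezvous: publish this process's operands, then wait for everyone."""
    with lock:
        contributions[process_id] = operands
    barrier.wait(timeout=5)  # returns once every process has contributed
    with lock:
        return dict(contributions)


def run_process(process_id: int) -> None:
    # Each process feeds two operands of different lengths.
    operands = [[process_id], [process_id * 10, process_id * 10 + 1]]
    snapshot = rendezvous(process_id, operands)
    with lock:
        observed[process_id] = snapshot


threads = [threading.Thread(target=run_process, args=(i,)) for i in range(NUM_PROCESSES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

expected = {0: [[0], [0, 1]], 1: [[1], [10, 11]], 2: [[2], [20, 21]]}
```

The check is then that every process observes the same complete set of contributions, i.e. `observed[i] == expected` for each process `i`, without involving any collective op.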