openxla / community

Stores documents and resources used by the OpenXLA developer community
Apache License 2.0

RFC: Shardonnay Partitioner #110

Closed andydavis1 closed 6 months ago

andydavis1 commented 7 months ago

This RFC introduces, for early feedback, the new Shardonnay partitioner for OpenXLA.

joker-eph commented 7 months ago

@andydavis1 (I can't find you on LLVM Discourse, do you have an account there?).

There is ongoing work on similar mesh modeling upstream. This related work is very interesting and relevant; would you be available to present it at an MLIR Open Meeting? (That does not commit you to anything beyond that. It just seems it could be very instructive for the MLIR community at large, and it could also influence the work upstream, and maybe vice versa.)

andydavis1 commented 7 months ago

Thanks Mehdi. I'm andydavis1 on LLVM Discourse. We are planning to be at the next OpenXLA community meeting, but could do an MLIR Open Meeting down the road if needed...

joker-eph commented 7 months ago

@andydavis1 : thanks! That would be great, whenever you’re ready to chat. Here is the upstream thread FYI: https://discourse.llvm.org/t/rfc-sharding-framework-design-for-device-mesh/73533/93

jpienaar commented 7 months ago

I was OOO during the recent community meeting. Were there any questions, comments, or clarifications from that meeting that should be added here?

tomnatan30 commented 7 months ago

I think the two main questions were:

  1. What are the advantages of GSPMD/PartIR? This is explained briefly at the top of this RFC.
  2. Timelines for turning it on by default. We don't have a clear answer on this yet.