pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

Help RoPE fusion #6820

Open · ckfgihub opened this issue 6 months ago

ckfgihub commented 6 months ago

❓ Questions and Help

I use the PyTorch / torch_xla / OpenXLA toolchain, and I want to fuse the RoPE (rotary position embedding) computation into a single custom operator so that the hardware can execute it directly. At which layer would it be best to do this? In an XLA pass? By defining a RoPE operator at the Python layer? Or does the existing framework already solve this?

JackCaoG commented 6 months ago

This depends on how you use PyTorch/XLA. If you use torch.export, it might be easier to add an FX graph pass similar to https://github.com/pytorch/xla/commit/7e0d3a5ae5d1aeb2a6c24653ad2e4d4357035040.
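
Below is a minimal sketch (not the pass from the linked commit) of what such an FX-level rewrite could look like: it registers a hypothetical `mylib::fused_rope` custom op and uses `torch.fx.subgraph_rewriter.replace_pattern` to swap the un-fused arithmetic for a single call. In a graph produced by `torch.export` the pattern usually has to be written against the decomposed ATen ops that actually appear, so treat the pattern function here as a placeholder for your real RoPE subgraph.

```python
# Sketch only: "mylib::fused_rope" is a hypothetical custom op, and the RoPE
# math below is simplified. Requires PyTorch >= 2.4 for torch.library.custom_op.
import torch
from torch.fx import subgraph_rewriter


@torch.library.custom_op("mylib::fused_rope", mutates_args=())
def fused_rope(x: torch.Tensor, x_rot: torch.Tensor,
               cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Reference (eager) implementation; a real backend would lower this call
    # to the fused hardware instruction instead.
    return x * cos + x_rot * sin


@fused_rope.register_fake
def _(x, x_rot, cos, sin):
    # Shape/dtype propagation for tracing and export.
    return torch.empty_like(x)


def rope_pattern(x, x_rot, cos, sin):
    # The un-fused RoPE arithmetic as it shows up in the FX graph.
    return x * cos + x_rot * sin


def rope_replacement(x, x_rot, cos, sin):
    # A single call that the backend can recognize and lower as one unit.
    return torch.ops.mylib.fused_rope(x, x_rot, cos, sin)


def fuse_rope(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    # Rewrite every occurrence of the pattern with the custom op.
    subgraph_rewriter.replace_pattern(gm, rope_pattern, rope_replacement)
    gm.recompile()
    return gm
```

You would then run `fuse_rope` over the `GraphModule` coming out of your export/compile step before it is handed to torch_xla for lowering.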

If you expect to go through the regular lazy execution workflow, the HLO is built when mark_step is called, so you might want to write an HLO pass that does graph matching.
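
For reference, a small sketch of inspecting the HLO that the lazy tensor trace builds before mark_step compiles it; this is the graph a custom HLO pattern-matching pass would have to recognize. `_XLAC._get_xla_tensors_hlo` is an internal debugging hook that may change between releases, and the RoPE arithmetic is a simplified stand-in.

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(1, 8, 16, 64, device=device)
x_rot = torch.randn_like(x)
cos = torch.randn(1, 1, 16, 64, device=device)
sin = torch.randn_like(cos)

# Un-fused RoPE-style arithmetic, recorded lazily as part of the trace.
out = x * cos + x_rot * sin

# Dump the HLO text for the pending computation; this is what an HLO
# pattern-matching pass would see and rewrite into a fused custom call.
print(torch_xla._XLAC._get_xla_tensors_hlo([out]))

# The HLO is compiled and executed here.
xm.mark_step()
```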