Open BowenBao opened 2 months ago
What is the advantage of doing it this way? The current process takes advantage of the fact that the model builder is aware of the target device and dtype and can infer the appropriate attention op from them.
Is this for experimentation purposes? If so, maybe we can expose an `extra_options` flag to override the default attention operator.
Hi @baijumeswani, the idea is to decouple the choice of attention op from device/dtype. Consider a custom EP that implements an attention op for a dtype not supported by the ORT CPU/CUDA EPs.
Currently the attention op is inferred from the combination of other configuration such as device and dtype. It would be more flexible for downstream users if it could be selected explicitly.