In current implementation of VLMs, the "_supports_sdpa" attribute checks and activates SDPA attention only for the language model. For example in Llava
It should also check and if available use SDPA attention for vision tower.
We can raise a warning for composite models if only one part support sdpa, but other does not, and activate SDPA for the supported part. That waythe user knows what is happening in the background.
System Info
In current implementation of VLMs, the "_supports_sdpa" attribute checks and activates SDPA attention only for the language model. For example in Llava
It should also check and if available use SDPA attention for vision tower.
We can raise a warning for composite models if only one part support sdpa, but other does not, and activate SDPA for the supported part. That waythe user knows what is happening in the background.
Verified models