What does this PR do?
This PR adds index selection for `output_hidden_states` by allowing a list of indices to be passed. In this initial version only `clip` is modified. Note that models that copy from `clip` currently have some docstrings changed due to `check_copies`; after an initial review I'll apply the changes to those models, and eventually these changes can be applied to all models.

Internally, `output_hidden_states` normalises negative indices.

The order of the indices supplied to `output_hidden_states` is maintained in the `hidden_states` output, and duplicate indices are supported. These are required for `vipllava` and for the `vipllava` tests, respectively.

The usage of `output_hidden_states` in `llava` and other versions is updated, with the exception of `llava_onevision`: this model uses `siglip`, so it will be updated when the changes from `clip` are applied to `siglip`. I looked at the checkpoints listed in the examples for the other `llava` versions and they all appear to use `clip`.

Note the changes in `vipllava` also required `reversed` due to the order of the selected indices `-2, -5, -8, -11` and `6`. `vipllava` needs some special handling, specifically because the test case uses `0, 0, 1, 1, 0` as the selected indices.

The result of these changes is increased memory efficiency, since only the selected hidden states need to be kept rather than every layer's.
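As a rough illustration of the selection semantics described above (negative indices normalised, order preserved, duplicates allowed), here is a minimal pure-Python sketch. The helper names `normalize_indices` and `select_hidden_states` are hypothetical and are not the functions added by this PR, and the count of 25 hidden states is an assumption made for the example; the actual implementation in `clip` may differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def normalize_indices(indices: Sequence[int], num_states: int) -> List[int]:
    """Map negative indices to non-negative ones, keeping order and duplicates.

    Hypothetical helper for illustration only.
    """
    normalized = []
    for idx in indices:
        pos = idx + num_states if idx < 0 else idx
        if not 0 <= pos < num_states:
            raise IndexError(f"index {idx} is out of range for {num_states} hidden states")
        normalized.append(pos)
    return normalized


def select_hidden_states(all_states: Sequence[T], indices: Sequence[int]) -> List[T]:
    """Return the requested states in exactly the order given (hypothetical helper)."""
    return [all_states[i] for i in normalize_indices(indices, len(all_states))]


# Assumed for the example: 25 hidden states, represented here by labels.
states = [f"h{i}" for i in range(25)]

# The index sets mentioned in this PR description.
print(select_hidden_states(states, [-2, -5, -8, -11, 6]))  # ['h23', 'h20', 'h17', 'h14', 'h6']
print(select_hidden_states(states, [0, 0, 1, 1, 0]))       # ['h0', 'h0', 'h1', 'h1', 'h0']
```

This sketch filters a full list after the fact purely to show the ordering and duplicate behaviour. To save memory, an implementation would presumably collect only the requested states during the forward pass instead.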
Fixes #33698
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR. cc @amyeroberts