Open saeid93 opened 3 months ago
However, this is just a hack and I think this should work out of the box. I'm happy to investigate further when I get a chance to first solve #1960 .
Thanks a lot again for this detailed analysis, and again I would be very happy to accept a PR to fix this. Regarding the question of how to fix this: I wonder if it would be easier to change the logic inside of _check_forward_args. Maybe we can relax the len(x) != len(adapter_names) check or even (re)move it entirely, if that's enough to fix the situation.
No problem!
About your question, as far as I understand, removing the check won't solve the problem, since there will still be a mismatch between the input size and the number of items in adapter_names. In the lora layer, the LoRA weights are then only applied to as many inputs as there are items in adapter_names (i.e. the input batch size), which is fine for most models. In MobileViT, however, the unfolding operation makes the 0th dimension of the inputs larger than the batch size, and since the length of adapter_names only matches the batch size, LoRA is not applied to all of the inputs.
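For illustration, here is a made-up example of the mismatch (the shapes are not the exact MobileViT ones, they just show the effect on the 0th dimension):

import torch

batch_size, patch_size = 4, 2
adapter_names = ["a", "b", "a", "c"]  # one entry per batch item -> length 4

# After MobileViT's unfolding, the 0th dimension of the hidden states that reach
# the attention/LoRA layers is batch_size * patch_size**2, not batch_size.
hidden_states = torch.randn(batch_size * patch_size**2, 16, 96)

print(len(adapter_names), hidden_states.shape[0])  # 4 vs. 16 -> the length check fails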
I also checked the accuracy against single-LoRA inference (without mixed adapters in the batch), and the numbers support the explanation above: with the check removed, the results do not match the accuracy of single-LoRA inference.
I see, I thought it would be possible to remove the check or at least make it optional. The user then needs to ensure that the correct adapter_names are passed so that they are lined up with the way that MobileViT unfolds the 0th dimension. The changes to MobileViTForImageClassification would probably still be necessary (though the mixed batch feature is intended for inference only, not sure if that simplifies things). But I'm probably missing something. Anyway, a PR to fix the situation would be welcome.
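To make "lined up" concrete, an illustrative example (assuming patch_size = 2, so each sample is expanded by patch_size**2 = 4):

adapter_names = ["a", "b"]  # one entry per sample in the original batch

# Repeat each name once per unfolded position so it matches the expanded 0th dim.
lined_up = [name for name in adapter_names for _ in range(4)]
print(lined_up)  # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']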
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hi @BenjaminBossan , my recent pull request will not solve this issue; that pull request solved https://github.com/huggingface/peft/issues/1960. This one is a MobileViT-specific problem, which I was planning to look into next. Please let me know if you think it is no longer necessary; I just wanted to point out that it is not addressed by #1990.
This issue was auto-closed by merging said PR because you wrote
I'm happy to go through models one by one and also fix #1967
and GH automatically parses "fix #XXXX" and closes the corresponding issue :)
Yes, let's leave this open. If you have time, it would be great if you could work on solving this issue here as well. We can further discuss potential solutions.
ah, I see :) Sure, I'll have a look when I get a chance.
Hi @BenjaminBossan,
I looked into this issue in more depth, but I'm still a bit unsure of the best way to implement a solution. I explored three different approaches, but each has its own challenges, which I've explained below. I would appreciate your opinion on these and any other solutions you might suggest.
As mentioned above, the problem is that the unfolding operation changes the dimensions of the input in MobileViT. As a result, we need to scale adapter_names in proportion to the patch size of the input:
# -------- Adjusting the size of the adapter_names input ----------
if model.base_model.model.base_model_prefix == "mobilevit":
    patch_size = model.config.patch_size
    multiply = patch_size ** 2
    # Repeat each per-sample adapter name once per unfolded position so that
    # len(adapter_names) matches the expanded 0th dimension inside MobileViT.
    resized_adapters_names = []
    for item in batch["adapter_names"]:
        multiplied = [item] * multiply
        resized_adapters_names += multiplied
    batch["adapter_names"] = resized_adapters_names
outputs = model(**batch)
Note that after the fixes in https://github.com/huggingface/peft/pull/1990, this solution will no longer work out of the box, since the MobileViT part expects the modified format above while the classifier part expects the original length for the adapter_names input.
Attempt to modify the code so that adapter_names is handled differently for the ViT part and the classifier part.
This solution aims to apply the same workaround I'm currently doing (subclassing MobileViT) but without subclassing, instead injecting the modified logic dynamically (similar to this approach) by using a pre-hook that can adjust adapter_names if the model type is mobilevit. The challenge I encountered was determining how much to scale adapter_names. We need access to the patch_size variable in a function like this one, which can then be added as a pre-hook. However, passing down the patch_size variable required substantial changes to the existing function signatures, which complicates this approach.
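Roughly what I had in mind for the pre-hook, as a minimal standalone sketch (this uses plain PyTorch forward pre-hooks rather than the actual PEFT hook machinery, and expand_adapter_names as well as the way patch_size is obtained are made-up names):

def expand_adapter_names(adapter_names, factor):
    # Hypothetical helper: repeat each per-sample adapter name `factor` times.
    return [name for name in adapter_names for _ in range(factor)]

def make_mobilevit_pre_hook(patch_size):
    factor = patch_size ** 2

    def pre_hook(module, args, kwargs):
        # At this point inside MobileViT the 0th dim of the input is
        # batch_size * patch_size**2, so adapter_names has to be expanded to match.
        adapter_names = kwargs.get("adapter_names")
        if adapter_names is not None:
            kwargs["adapter_names"] = expand_adapter_names(adapter_names, factor)
        return args, kwargs

    return pre_hook

# Usage sketch (assuming the hook is registered on the layers that receive adapter_names):
# handle = lora_layer.register_forward_pre_hook(
#     make_mobilevit_pre_hook(model.config.patch_size), with_kwargs=True
# )

The difficulty described above is exactly the make_mobilevit_pre_hook(model.config.patch_size) part: getting patch_size down to the place where the hooks are registered requires changing several function signatures.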
Rewrite how the PEFT library applies LoRA layers.
In this approach, I considered rewriting the SelfAttention layer of MobileViT to account for the size change when LoRA is applied, potentially by adding a dispatcher for MobileViT. However, this required significant changes to how LoRA layers are added, which could potentially disrupt other parts of the model.
Reimplement MobileViT with an inherited function in the PEFT library (similar to the workaround I used earlier, but with modifications to account for the fixes in #1990). The downside of this solution is that it involves adding special-case logic for a specific model type in the PEFT library, which feels overly hacky.
Please let me know if you have any suggestions for a better approach or any comments on the solutions discussed. I'm happy to proceed based on your recommendations.
Thanks for digging deeper into this issue and thinking of a few possible solutions. As you discussed, each of them has its own drawbacks, so it's not clear how to proceed.
Something that came to my mind is the following solution: let's say we have n entries in adapter_names but len(x) is k * n. Could we just "broadcast" adapter_names to repeat each sample k times? Since this would be done on a per-layer basis, this should hopefully not interfere with layers that don't need it. Of course, this is a bit "magic" and could potentially misfire when the two just happen to line up by accident, but maybe we can live with that.
LMK what you think of this solution.
Thank you for the suggestion! I'll take a closer look when I get the chance. One quick question that comes to mind: since adapter_names is a list and doesn't support broadcasting, are you suggesting that everything should be converted to NumPy arrays before attempting this solution?
No, I don't really mean broadcasting in the sense of numpy, hence why I wrote "broadcasting" :) What I mean is repeating the same items multiple times. Simplified code would be something like this:
adapter_names = ["a", "b", "a", "c"]
x = range(12) # 3 times the size of adapter_names
quot, remainder = divmod(len(adapter_names), len(x)) # 3, 0
if remainder != 0:
raise ...
adapter_names = sum([[i] * quot for i in adapter_names], [])
print(adapter_names) # ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'c', 'c', 'c']
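In a slightly more complete form, this could live in a small helper that is applied per layer (the function name is made up, nothing like this exists in PEFT yet):

def maybe_expand_adapter_names(adapter_names, x):
    # Repeat each per-sample adapter name so the list lines up with the 0th dim of x.
    quot, remainder = divmod(len(x), len(adapter_names))
    if remainder != 0:
        raise ValueError(
            f"The 0th dimension of the input ({len(x)}) must be a multiple of "
            f"len(adapter_names) ({len(adapter_names)})."
        )
    if quot == 1:
        # Layers that already see one input row per adapter name are left untouched.
        return adapter_names
    return [name for name in adapter_names for _ in range(quot)]

print(maybe_expand_adapter_names(["a", "b"], range(8)))  # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']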
Thank you for your clarification, I'll work on it when I get a chance.
Great, thanks. Of course I might be missing something and one of your proposals could make more sense.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
System Info
Python 3.11.9 transformers==4.40.2 peft==0.11.2
Who can help?
@BenjaminBossan
Information

Tasks

Reproduction
The MobileViT model is not compatible with using multiple adapters in the same batch. Running inference on batches with multiple adapters by passing adapter_names in the batch triggers the exception raised here:
https://github.com/huggingface/peft/blob/273acf059e0f1f8bff1a3889f901475e9eb3b7ee/src/peft/tuners/lora/layer.py#L308
The root cause is that, during the unfolding operation in the transformers MobileViT implementation, the first dimension of the input is changed from (batch_size, ...) to (batch_size * patch_size**2, ...). This makes it inconsistent with adapter_names, which has length batch_size and whose entries each refer to the adapter of the corresponding batch item.

Expected behavior
I solved this with a hack that modifies the adapter_names input size before sending it to the model and reverts it back to the original size for the classifier; it makes the number of entries proportional to the expansion caused by the unfolding operation.

Also, we already discussed that there is a separate bug, https://github.com/huggingface/peft/issues/1960, besides this MobileViT-specific problem. The script below contains the modifications needed for both https://github.com/huggingface/peft/issues/1960 and the problem described here.
However, this is just a hack and I think this should work out of the box. I'm happy to investigate further when I get a chance to first solve https://github.com/huggingface/peft/issues/1960 .