Open ZiadHelal opened 1 week ago
Hey @BenjaminBossan, could you take a look at this draft PR? I will remove the separate (quantization) config from `config.py` in the vera folder later on; I just want to know whether I'm going in the right direction or not.
Hey @BenjaminBossan,
I've finished the 8-bit quantization and it now works with all tests passing. However, 4-bit is a bit tricky due to how bnb packs the weights. I've added a work-around (which is not correct) in the forward method, and it can now train any model, but again I think it's not correct; it also fails the `test_vera_bnb_quantization_from_pretrained_safetensors` test in 4-bit.
I would very much appreciate it if you could point me in the right direction for the 4-bit implementation.
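For context on the packing issue: bnb stores 4-bit weights as a flat packed uint8 buffer, so the stored weight's shape no longer matches the logical dimensions, and it has to be dequantized with its quant state before anything like a delta weight can be computed against it. A rough, illustrative sketch of that step (not the code in this PR), assuming a `bitsandbytes` `Linear4bit` layer:

```python
import torch
import bitsandbytes as bnb


def dequantize_4bit_weight(layer: bnb.nn.Linear4bit) -> torch.Tensor:
    # layer.weight is a Params4bit holding packed uint8 data; its .shape does not
    # reflect the layer's original (out_features, in_features) dimensions.
    weight = layer.weight
    # quant_state carries the metadata needed to unpack back to the logical shape.
    return bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
```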
Thanks for the updates.
I've added a work-around (which is not correct) in the forward method, and it can now train any model, but again I think it's not correct
Could you give me a pointer what lines of code exactly you mean here?
it also fails the `test_vera_bnb_quantization_from_pretrained_safetensors` test in 4-bit.
For me the test fails too, albeit already during initialization, not in the forward pass.
Also pinging @vvvm23 and @dkopi for awareness.
Hey @BenjaminBossan, it now works on my side! thanks for your help.
Could you check if it works now with you?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks @ZiadHelal I can confirm that the tests are now passing on my machine. I think what would be great is if we could take one of the VeRA examples and verify that it works with 4bit and 8bit bnb. Of course, results won't be exactly the same, but we should expect roughly similar outcomes, probably slightly worse. This would be a nice confirmation that the implementation is correct. Is that something you would be willing to tackle?
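If it helps, here is a rough sketch of what I have in mind, based on the existing VeRA example (the model ID, target modules, and rank are placeholders, not a prescription):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import VeraConfig, get_peft_model

# Load the base model in 4-bit; switch to load_in_8bit=True to check the 8-bit path.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder; use the model from the existing VeRA example
    quantization_config=bnb_config,
    device_map="auto",
)

# Keep the VeRA settings identical to the non-quantized run so the results are comparable.
vera_config = VeraConfig(r=256, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, vera_config)
model.print_trainable_parameters()

# ...then run the same training loop as the original example and compare the metrics.
```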
Apart from that, please run `make style` on your PR, so that our linter is happy and tests can be run.
Hi @BenjaminBossan, sorry for my late reply!
I've run `make style` and the code should now be good for the linter. Regarding the tests, I've added several tests, primarily for CausalLM, but I'd be happy to add more for the audio and seq2seq models; if you have any other tests in mind, please let me know.
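To give an idea, the new tests are roughly of this shape (a simplified illustration rather than the exact code in the PR; the checkpoint and target modules are placeholders):

```python
import pytest
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import VeraConfig, get_peft_model

MODEL_ID = "facebook/opt-125m"  # placeholder; the real tests use a tiny test checkpoint


@pytest.mark.skipif(not torch.cuda.is_available(), reason="bitsandbytes requires a GPU")
@pytest.mark.parametrize("load_in_4bit", [True, False])
def test_vera_bnb_forward(load_in_4bit):
    quant_config = BitsAndBytesConfig(load_in_4bit=load_in_4bit, load_in_8bit=not load_in_4bit)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, quantization_config=quant_config, device_map="auto")
    model = get_peft_model(model, VeraConfig(r=8, target_modules=["q_proj", "v_proj"]))

    # A forward pass through the quantized VeRA model should produce finite logits.
    input_ids = torch.tensor([[1, 2, 3]], device=next(model.parameters()).device)
    logits = model(input_ids=input_ids).logits
    assert torch.isfinite(logits).all()
```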
Hmm, somehow the linter is still not happy, do you have the latest ruff version?
Anyway, I checked this out locally and here is the diff that I get:
```diff
modified src/peft/tuners/vera/bnb.py
@@ -134,9 +134,9 @@ if is_bnb_available():
                 torch.Tensor: The computed delta weight for the VeRA adapter.
 
             Note:
-                This method implements the VeRA-specific weight update. Unlike LoRA, VeRA uses shared
-                projection matrices (vera_A and vera_B) across all layers, along with per-layer
-                trainable parameters (lambda_d and lambda_b).
+                This method implements the VeRA-specific weight update. Unlike LoRA, VeRA uses shared projection
+                matrices (vera_A and vera_B) across all layers, along with per-layer trainable parameters (lambda_d and
+                lambda_b).
             """
             # Retrieve shared projection matrices
             vera_A = self.vera_A[adapter]
@@ -187,9 +187,9 @@ if is_bnb_available():
                 torch.Tensor: Output tensor after applying the VeRA adaptation.
 
             Note:
-                This method implements the VeRA-specific forward pass. It applies the shared
-                projections (vera_A and vera_B) along with the per-layer trainable parameters
-                (lambda_d and lambda_b) to compute the adapter output.
+                This method implements the VeRA-specific forward pass. It applies the shared projections (vera_A and
+                vera_B) along with the per-layer trainable parameters (lambda_d and lambda_b) to compute the adapter
+                output.
             """
             if self.disable_adapters:
                 if self.merged:
```
I've run the `make style` command again now. Hope it works!
Ouch, a bunch of tests are failing. Could you please investigate? Please LMK if you need help.
I've run the tests that are failing on my machine, specifically `test_initialization.py`, `test_feature_extraction_models.py`, `test_decoder_models.py`, and `test_custom_models.py`. It appears that the refactored lines you suggested for the `_find_dim` function in `model.py` in VeRA are causing these errors. I'm not sure how to proceed: either these lines are correct and we need to adjust the tests accordingly, or we implement a workaround for `_find_dim` that gets the dimensions of `vera_A` & `vera_B` based on whether quantization is used or not.
This approach is what I have in mind (I haven't implemented it yet, but it should work, I guess):
```python
def _find_dim(self, config) -> tuple[int, int]:
    """
    Finds the largest input and output dimensions across linear layers that have been wrapped with VeRA.

    This will be used for determining the size of the shared vera_A and vera_B matrices.
    """
    model_config = self.get_model_config(self.model)

    peft_config = self._prepare_adapter_config(config, model_config)
    peft_config = _maybe_include_all_linear_layers(peft_config, self.model)

    loaded_in_4bit = getattr(self.model, "is_loaded_in_4bit", False)

    largest_shape = None
    for key, module in self.model.named_modules():
        if not self._check_target_module_exists(peft_config, key):
            continue

        if loaded_in_4bit:
            if isinstance(module, nn.Linear):
                module_shape = module.in_features, module.out_features
            elif isinstance(module, Conv1D):
                module_shape = module.weight.ds_shape if hasattr(module.weight, "ds_shape") else module.weight.shape
            else:
                continue
        else:
            if isinstance(module, (nn.Linear, Conv1D)):
                module_shape = tuple(module.weight.shape)
                if isinstance(module, Conv1D):
                    module_shape = module_shape[::-1]
            else:
                continue

        if largest_shape is None:
            largest_shape = module_shape
            continue

        if module_shape != largest_shape:
            largest_shape = tuple(max(a, b) for a, b in zip(largest_shape, module_shape))

    if largest_shape is None:
        msg = "No layers types compatible with VeRA were found. Please check `peft_config.target_modules`."
        raise ValueError(msg)

    return largest_shape
```
I took a closer look and the code that I suggested had a simple error: I was returning the shapes in the wrong order. So the correct code should be:
```python
if isinstance(module, nn.Linear):
    module_shape = module.out_features, module.in_features
elif isinstance(module, Conv1D):
    module_shape = module.weight.ds_shape if hasattr(module.weight, "ds_shape") else module.weight.shape
    module_shape = module_shape[::-1]
```
As to your suggestion: Yes, possibly there needs to be some special handling for quantized weights. I haven't checked that yet.
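If it does turn out to be needed, something along these lines could work (an untested sketch, assuming bitsandbytes is installed: the bnb `Linear4bit`/`Linear8bitLt` layers keep `in_features`/`out_features` attributes even though their stored weight is packed, and `(out_features, in_features)` matches the order of `nn.Linear.weight.shape`):

```python
import bitsandbytes as bnb
import torch.nn as nn
from transformers.pytorch_utils import Conv1D


def _get_module_shape(module):
    # For bnb layers the stored weight is (re)packed, so module.weight.shape is not
    # reliable; fall back to the in_features/out_features attributes instead.
    if isinstance(module, (bnb.nn.Linear4bit, bnb.nn.Linear8bitLt, nn.Linear)):
        return module.out_features, module.in_features
    if isinstance(module, Conv1D):
        shape = module.weight.ds_shape if hasattr(module.weight, "ds_shape") else module.weight.shape
        # Conv1D stores its weight as (in_features, out_features), so reverse to match nn.Linear.
        return tuple(shape)[::-1]
    return None
```

Whether this ordering is the one `_find_dim` should return is something we'd still need to double-check against how `vera_A` and `vera_B` are allocated.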
I added your line suggestions and it now works for all the tests except (I think) one test in `test_decoder_models.py`, which is complaining about something not related to VeRA. Maybe it won't fail in the pipeline. Can you run the tests again and see if it relates to VeRA? If so, I will apply my approach of adding special handling for quantized weights.
one test in `test_decoder_models.py`, which is complaining about something not related to VeRA
Which one is it?
`test_generate_half_prec` is the one, but it fails with several precisions.
@BenjaminBossan, my bad, it should pass this test; sorry for the confusion. You can run the workflow now.
That would be strange, as VeRA tests are not run:
Anyway, I'll start the CI, let's see.
Are the 11 failing tests related to VeRA?
No, I don't think that they're related. This could possibly be caused by the latest transformers release, not sure. I'll investigate tomorrow.
Okay, thanks!
Small update, it is indeed unrelated, the tests started breaking due to a recent change in transformers. I'm trying to get to the bottom of it.
Ok got it, thanks for the update!
The fix is now merged. Once you merge with/rebase on main, the tests should pass.
Synced with main upstream.
This PR introduces support for 4-bit and 8-bit quantization in the VeRA method, leveraging `bitsandbytes`. Addresses #2070.
Changes made: