rockerBOO opened 6 months ago
You're right on that being the fix - would love a PR!
Added get_input_embeddings() to a clone of the Hugging Face repo but ran into the following, though I'm not sure it's related.
*edit: I'm not sure I'm doing it right either, in the clone or in how I'm using it, so it could just be how it is cached. I'm probably not using it properly.
```
Traceback (most recent call last):
  File "/home/rockerboo/code/caption-train/moondream.py", line 390, in <module>
    main(args)
  File "/home/rockerboo/code/caption-train/moondream.py", line 121, in main
    train(model, tokenizer, args)
  File "/home/rockerboo/code/caption-train/moondream.py", line 303, in train
    loss = compute_loss(batch, accelerator)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/code/caption-train/moondream.py", line 243, in compute_loss
    img_embs = model.vision_encoder.encoder(images)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.pyenv/versions/3.11.6/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.pyenv/versions/3.11.6/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.cache/huggingface/modules/transformers_modules/moondream2/vision_encoder.py", line 119, in forward
    return self.model["visual"](x)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.pyenv/versions/3.11.6/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.pyenv/versions/3.11.6/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.cache/huggingface/modules/transformers_modules/moondream2/vision_encoder.py", line 105, in forward
    x = self.patch_embed(x)
        ^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.pyenv/versions/3.11.6/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.pyenv/versions/3.11.6/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/.cache/huggingface/modules/transformers_modules/moondream2/vision_encoder.py", line 129, in forward
    b, c, hp1, wp2 = x.shape
    ^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 4, got 3)
```
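For what it's worth, the unpack failure at `b, c, hp1, wp2 = x.shape` means the encoder received a 3-D tensor where it expected a batched 4-D one. A workaround on my side, assuming `images` in my script is a single preprocessed image of shape (C, H, W) rather than a batch (just a guess at the cause, not a verified fix):

```python
# patch_embed unpacks its input as (B, C, H, W), so a lone image
# of shape (C, H, W) needs a batch dimension added first.
if images.dim() == 3:
    images = images.unsqueeze(0)  # (C, H, W) -> (1, C, H, W)

img_embs = model.vision_encoder.encoder(images)
```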
I'm getting this too, investigating.
Will have a fix for the ValueError: not enough values to unpack (expected 4, got 3) bug pushed shortly.
Getting an error when using gradient checkpointing with PEFT LoRA training. I'm basically using the same script as the Finetuning notebook in this repo, but adding LoRA via PEFT to it.
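For reference, my setup looks roughly like this (a minimal sketch; the target_modules names and LoRA hyperparameters are my assumptions, not the notebook's exact values):

```python
from peft import LoraConfig, get_peft_model

# Assumed LoRA settings; target_modules are placeholders and depend
# on the model's actual attention/projection layer names.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["Wqkv", "out_proj"],
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)

# With checkpointing, the frozen inputs need requires_grad for the
# backward pass; enable_input_require_grads() does this by calling
# model.get_input_embeddings(), which moondream2 doesn't implement yet.
model.gradient_checkpointing_enable()
model.enable_input_require_grads()  # raises NotImplementedError here
```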
With model.enable_input_require_grads() I get a similar NotImplementedError. I think the fix would be adding a get_input_embeddings() method that returns self.text_model.get_input_embeddings().
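As a sketch, the addition I have in mind is just this on the model class (assuming the language model lives on self.text_model, per the repo):

```python
# Proposed method on the Moondream model class:
def get_input_embeddings(self):
    return self.text_model.get_input_embeddings()
```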
I can give it a shot and make a PR here soon.
Thank you!