A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
Apache License 2.0
Enabling finetuning of vision encoder and projector #14
As the title says, this PR adds support for 1) full fine-tuning and 2) LoRA for the vision encoder, and full fine-tuning for the vision projector. Since the vision projector is lightweight compared with the vision encoder and the LLM, supporting only full fine-tuning for it should be fine.
The updates also fix #11, where previously, due to partial string matching, the linear layers within CLIP's ViT were also included as LoRA target modules.
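The fix for the target-module bug can be sketched as follows. The idea is to match on the full dotted module name rather than a bare substring, so that layers under the vision tower are never picked up by accident. Names such as `vision_tower` and the helper `find_lora_target_modules` are illustrative assumptions, not the exact identifiers used in this repo:

```python
import torch.nn as nn

def find_lora_target_modules(model, exclude_prefixes=("vision_tower",)):
    # Collect full (dotted) names of nn.Linear modules, skipping any
    # module whose qualified name starts with an excluded prefix.
    # Matching on the full name avoids the partial-string bug where a
    # pattern like "proj" also matched linear layers inside the ViT.
    # NOTE: "vision_tower" is a hypothetical prefix for illustration.
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and not name.startswith(exclude_prefixes)
    ]

# Toy model mimicking a multimodal layout: one linear layer in the
# vision encoder, one in the language model.
toy = nn.ModuleDict({
    "vision_tower": nn.ModuleDict({"fc": nn.Linear(4, 4)}),
    "language_model": nn.ModuleDict({"q_proj": nn.Linear(4, 4)}),
})
targets = find_lora_target_modules(toy)  # only the LLM layer survives
```

The returned names can then be passed as `target_modules` when building the LoRA config, keeping the vision encoder's layers out of the adapter unless they are opted in explicitly.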