Enable Multi-Layer Perceptron (MLP) selection for projector
First of all, thank you for creating such an amazing project!
This repository has become very useful for me.
Changes
I have modified the code to allow the projector to be a Multi-Layer Perceptron (MLP) when `model_type: git_llm` is selected.
Previously, when using `model_type: git_llm`, a single Linear layer was applied as the projector connecting the Vision model and the LLM. Inspired by LLaVA v1.5 [Liu+ '23, Improved Baselines with Visual Instruction Tuning], I have added code that makes it possible to vary the number of these Linear layers simply by adding an option (`mlp_adapter`) under `model_config` in `projects/OOO/OO.yml`.
The main details of the code for changing the projector to an MLP can be found in `heron/models/mlp_adapter.py`.
Furthermore, this code references the GitHub implementation of LLaVA v1.5 ( https://github.com/haotian-liu/LLaVA/blob/785f766fcddc86ffeaa62cd51cf7834a11c04e6d/llava/model/multimodal_projector/builder.py#L33 ).
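For reference, here is a minimal sketch of the builder pattern, following the LLaVA v1.5 builder linked above. The function and argument names (`build_mlp_adapter`, `vision_hidden_size`, `llm_hidden_size`) are illustrative, not necessarily the exact API in `heron/models/mlp_adapter.py`:

```python
import re

import torch.nn as nn


def build_mlp_adapter(mlp_adapter: str, vision_hidden_size: int, llm_hidden_size: int) -> nn.Module:
    """Build an MLP projector from a spec string such as 'mlp2x_gelu'.

    'mlpNx_gelu' yields N Linear layers with GELU activations in between,
    mapping vision features to the LLM embedding size.
    """
    match = re.match(r"^mlp(\d+)x_gelu$", mlp_adapter)
    if match is None:
        raise ValueError(f"Unknown mlp_adapter spec: {mlp_adapter}")
    depth = int(match.group(1))
    # The first layer maps the vision hidden size to the LLM hidden size.
    modules = [nn.Linear(vision_hidden_size, llm_hidden_size)]
    # Each additional layer adds a GELU followed by another Linear layer.
    for _ in range(1, depth):
        modules.append(nn.GELU())
        modules.append(nn.Linear(llm_hidden_size, llm_hidden_size))
    return nn.Sequential(*modules)
```

With this scheme, `mlp1x_gelu` degenerates to a single Linear layer, which matches the previous behavior.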
Also, to maintain backward compatibility, I've made sure the existing `projects/OOO/OO.yml` configs work the same way as before.
For example, if you use `projects/llama/exp001.yml` as it is, a single Linear layer will be applied as the projector, just as before. If you want to change the projector to an MLP, add the `mlp_adapter` item to `model_config` in `projects/llama/exp001.yml` and set it to `mlp2x_gelu`, as in the sketch below.
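A sketch of the change (the surrounding keys are illustrative; the actual contents of `projects/llama/exp001.yml` may differ):

```yaml
model_config:
  model_type: git_llm
  # Add this line to switch the projector to a 2-layer MLP with GELU.
  mlp_adapter: mlp2x_gelu
```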
In the above example, adding `mlp_adapter: mlp2x_gelu` under `model_config` makes the projector a 2-layer MLP; if you want 3 layers, simply changing it to `mlp_adapter: mlp3x_gelu` gives you a 3-layer MLP!