Closed by zyj1729 1 month ago
You could perhaps override this method in Caduceus?
To initialize from a pre-trained model, the best thing to do is pass a checkpoint / weights file that contains the state_dict with the model parameters you want to load.
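As a rough illustration of why a checkpoint only loads cleanly into a model of the same size, here is a minimal sketch that compares parameter shapes between a checkpoint and a target model. The helper name `split_by_shape` and the parameter names are hypothetical, and `SimpleNamespace` objects stand in for tensors so the snippet runs without PyTorch. It is not part of the Caduceus codebase.

```python
from types import SimpleNamespace


def split_by_shape(checkpoint_state, model_state):
    """Partition checkpoint entries into loadable ones and mismatches.

    Both arguments map parameter names to objects with a ``.shape``
    attribute, e.g. the dicts returned by ``state_dict()`` in PyTorch.
    """
    loadable, mismatched = {}, {}
    for name, tensor in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(tensor.shape):
            loadable[name] = tensor
        else:
            # Record (checkpoint shape, model shape); None means the key is absent.
            target_shape = None if target is None else tuple(target.shape)
            mismatched[name] = (tuple(tensor.shape), target_shape)
    return loadable, mismatched


# Toy stand-ins for tensors; the names and shapes are made up for illustration.
ckpt = {
    "embed.weight": SimpleNamespace(shape=(16, 128)),
    "norm.weight": SimpleNamespace(shape=(128,)),
}
model = {
    "embed.weight": SimpleNamespace(shape=(16, 256)),  # widened in the new model
    "norm.weight": SimpleNamespace(shape=(128,)),
}

loadable, mismatched = split_by_shape(ckpt, model)
print(sorted(loadable))
print(mismatched)
```

With real PyTorch modules, the matching subset could presumably be passed to `model.load_state_dict(loadable, strict=False)`, leaving every other parameter at its random initialization.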
I'm asking more about changing the number of bidirectional Mamba layers (the largest currently is 16) or increasing the dimensions. The weights can simply be randomly initialized, since I will train the model further. I tried increasing the dimensions of some of the Mamba layers. Here's the original model:
Here's what I changed it to:
But I got the following errors:
So I'm wondering whether there is a way to increase the size of Caduceus (its dimensions, and with them the number of parameters, and its number of layers) without causing errors?
You can change these in the model config: https://github.com/kuleshov-group/caduceus/blob/main/configs/model/caduceus.yaml.
It sounds like the parameters you want to change are n_layer and d_model in that file.
I think that's exactly what I want. It would be really helpful if you could provide minimal code to initialize a Caduceus model from scratch with specified parameters. Thanks!
from caduceus.configuration_caduceus import CaduceusConfig
from caduceus.modeling_caduceus import CaduceusForMaskedLM

config = CaduceusConfig(
    d_model=<TODO: Set your desired model dim>,
    n_layer=<TODO: Set your desired num_layers>,
    # TODO: Set the remaining config params, see for example: https://github.com/kuleshov-group/caduceus/blob/main/configs/model/caduceus.yaml
)
# Building the model from a config gives randomly initialized weights
model = CaduceusForMaskedLM(config)
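If it helps, here is a small sketch of one way to derive a larger configuration from a base set of values before handing it to the config class. The helper `override_config` is hypothetical, and the values in `base` are placeholders for illustration rather than the contents of caduceus.yaml. Only the key names `d_model` and `n_layer` come from the discussion above.

```python
def override_config(base, **overrides):
    """Return a copy of `base` with `overrides` applied, rejecting unknown keys."""
    unknown = sorted(set(overrides) - set(base))
    if unknown:
        raise KeyError(f"unknown config keys: {unknown}")
    merged = {**base, **overrides}
    # Basic sanity check on the two size-related parameters.
    for key in ("d_model", "n_layer"):
        value = merged[key]
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return merged


# Placeholder values for illustration only; copy the real ones from caduceus.yaml.
base = {"d_model": 128, "n_layer": 4}
larger = override_config(base, d_model=256, n_layer=8)
print(larger)
```

Assuming the config class accepts these as keyword arguments, as in the snippet above, the result could then be passed on as `CaduceusConfig(**larger)`. Rejecting unknown keys is a design choice so that a misspelled parameter name fails loudly instead of being silently ignored.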
Hi there, thanks for creating this great model! I'm trying to use the Caduceus architecture to fine-tune on my own task. I noticed that the results from pre-trained and randomly initialized models are not significantly different, so I want to try fine-tuning a Caduceus model with a larger dimension or more layers. I tried doubling the parameters but got some internal errors. Before tweaking the parameters further, I want to ask whether there is an easy way to initialize a larger Caduceus model, or simply to initialize a bidirectional Mamba architecture with customized parameters. It would be much appreciated if you could provide an example. Thanks in advance!