Closed: qiaoqiaoLF closed this issue 3 months ago
Is it possible that you overrode the model config to have hidden dim 768 after you initialized it?
Hi, yair-schiff. I believe I can reproduce the error using the following code:
```python
import torch
# Load model directly
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("kuleshov-group/caduceus-ps_seqlen-131k_d_model-256_n_layer-16", trust_remote_code=True)
input_ids = torch.randint(0, 6, (1, 131072), dtype=torch.long)
input_ids = input_ids.cuda()
model = model.cuda()
outputs = model(input_ids)
```
Thanks! Very helpful. I will take a look at this and keep you posted here.
Thank you for catching this bug! I pushed a fix to the repo as well as to the HF models.
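If a cached copy of the checkpoint and its remote code still reproduces the old error, forcing a re-download should pick up the pushed fix. A minimal sketch (note that `force_download` is a standard `from_pretrained` argument, not something specific to this repo):

```python
from transformers import AutoModelForSequenceClassification

# Re-download the updated remote code and weights instead of reusing the local cache.
model = AutoModelForSequenceClassification.from_pretrained(
    "kuleshov-group/caduceus-ps_seqlen-131k_d_model-256_n_layer-16",
    trust_remote_code=True,
    force_download=True,
)
```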
As mentioned in https://github.com/kuleshov-group/caduceus/blob/c50dca8cdeddb1fb02e0246f26356f2f22739b1f/caduceus/modeling_caduceus.py#L537, the hidden states have 2 * d_model channels for RCPS.
But here, https://github.com/kuleshov-group/caduceus/blob/c50dca8cdeddb1fb02e0246f26356f2f22739b1f/caduceus/modeling_caduceus.py#L546, it just stacks the first 1/4 of the channels and the remaining 3/4, leading to the following error: "RuntimeError: stack expects each tensor to be equal size, but got [1, 131072, 128] at entry 0 and [1, 131072, 384] at entry 1".
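For illustration, here is a minimal sketch of that shape mismatch, independent of the actual Caduceus code (`d_model = 256` and the split point are inferred from the error message above):

```python
import torch

d_model = 256
# For RCPS the hidden states carry 2 * d_model channels.
hidden_states = torch.randn(1, 131072, 2 * d_model)  # [1, 131072, 512]

# Buggy split: the boundary is d_model // 2, i.e. only a quarter of the
# actual channel dimension, so the two pieces have 128 and 384 channels.
fwd = hidden_states[..., : d_model // 2]              # [1, 131072, 128]
rc = hidden_states[..., d_model // 2:]                # [1, 131072, 384]
# torch.stack([fwd, rc], dim=-1) -> RuntimeError: stack expects each tensor to be equal size

# Intended split: two d_model-sized halves, which can be stacked.
fwd, rc = hidden_states.chunk(2, dim=-1)              # each [1, 131072, 256]
stacked = torch.stack([fwd, rc], dim=-1)              # [1, 131072, 256, 2]
print(stacked.shape)
```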