kuleshov-group / caduceus

Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Apache License 2.0

There seems to be a bug related to d_model in class "CaduceusForSequenceClassification" for the RCPS case #5

Closed qiaoqiaoLF closed 3 months ago

qiaoqiaoLF commented 3 months ago

As noted at https://github.com/kuleshov-group/caduceus/blob/c50dca8cdeddb1fb02e0246f26356f2f22739b1f/caduceus/modeling_caduceus.py#L537, hidden states have 2 * d_model channels in the RCPS case.

But here, https://github.com/kuleshov-group/caduceus/blob/c50dca8cdeddb1fb02e0246f26356f2f22739b1f/caduceus/modeling_caduceus.py#L546, the code stacks the first 1/4 of the channels against the remaining 3/4, which raises the following error: "RuntimeError: stack expects each tensor to be equal size, but got [1, 131072, 128] at entry 0 and [1, 131072, 384] at entry 1"
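To make the mismatch concrete, here is a minimal sketch of the reported shapes (the variable names and the shortened sequence length are illustrative, not the repo's exact code): splitting the 2 * d_model channels at d_model // 2 produces unequal tensors, while splitting at the midpoint d_model yields two equal halves that stack cleanly.

```python
import torch

# Illustrative dims: d_model matches the HF checkpoint (256); seq_len is
# shortened from 131072 for brevity.
batch, seq_len, d_model = 1, 8, 256

# For RCPS the backbone emits 2 * d_model channels per position.
hidden_states = torch.randn(batch, seq_len, 2 * d_model)  # 512 channels

# Buggy-style split point d_model // 2 carves off 128 channels vs. the
# remaining 384, so torch.stack rejects the unequal tensors.
fwd = hidden_states[..., : d_model // 2]  # [1, 8, 128]
rc = hidden_states[..., d_model // 2 :]   # [1, 8, 384]
try:
    torch.stack([fwd, rc.flip(dims=[1, 2])], dim=0)
except RuntimeError as e:
    print("stack failed:", e)

# Splitting at the midpoint d_model gives two equal d_model-channel halves.
fwd = hidden_states[..., :d_model]        # [1, 8, 256]
rc = hidden_states[..., d_model:]         # [1, 8, 256]
stacked = torch.stack([fwd, rc.flip(dims=[1, 2])], dim=0)
print(stacked.shape)  # torch.Size([2, 1, 8, 256])
```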

yair-schiff commented 3 months ago

Is it possible that you override the model config after you initialize it to have hidden dim 768?

qiaoqiaoLF commented 3 months ago

Is it possible that you override the model config after you initialize it to have hidden dim 768?

Hi, yair-schiff. I believe I can reproduce the error using the following code:

import torch
from transformers import AutoModelForSequenceClassification

# Load the RCPS classification model directly from the Hub
model = AutoModelForSequenceClassification.from_pretrained(
    "kuleshov-group/caduceus-ps_seqlen-131k_d_model-256_n_layer-16",
    trust_remote_code=True,
)
model = model.cuda()

input_ids = torch.randint(0, 6, (1, 131072), dtype=torch.long).cuda()
outputs = model(input_ids)

yair-schiff commented 3 months ago

Thanks, very helpful! I will take a look at this and keep you posted here.

yair-schiff commented 3 months ago

Thank you for catching this bug! I pushed a fix to the repo as well as to the HF models.