kaistAI / LangBridge

[ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision
https://aclanthology.org/2024.acl-long.405/

Different Encoder Models Consistently Give Errors Related to Decoders #16

Closed Kosei1227 closed 1 month ago

Kosei1227 commented 2 months ago

Hi, Thank you for your assistance so far. I greatly appreciate your help with this research project.

Our research team is considering different encoder-decoder models, starting with castorini/afriteva_v2_large, and ran the following training script.


source ~/miniconda3/etc/profile.d/conda.sh

conda activate langbridge

export CUDA_VISIBLE_DEVICES=1,3
NUM_GPU=2

# bash scripts/train_lb/metamath_afriteva.sh

ARGS="
--n_gpu $NUM_GPU
--strategy deepspeed_stage_2
--output_dir checkpoints/metamath-lb-9b
--run_name metamath-lb-afriteva
--seed 42
--train_set_path DKYoon/metamath-200k
--output_exists True
--enc_name_or_path castorini/afriteva_v2_large
--lm_name_or_path meta-math/MetaMath-7B-V1.0
--alignments linear
--enc_hidden_size 1024
--lm_hidden_size 4096
--max_length 128
--max_length_enc 1024
--freeze_language_model True
--freeze_encoder True
--learning_rate_alignment 6e-4
--learning_rate_enc 2e-5
--w_decay_alignment 0.0
--w_decay_enc 0.1
--warmup_steps 0
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--gradient_accumulation_steps 16
--logging_steps 10
--num_train_epochs 1
--dataloader_num_workers 16
--bf16 True
"

echo $ARGS
if [ $NUM_GPU == 1 ]; then
    echo "running on a single GPU"
    python train_langbridge.py $ARGS
else
    echo "running on multiple GPUs"
    torchrun --nproc_per_node $NUM_GPU train_langbridge.py $ARGS
fi

But we got the following error:

```
raise ValueError(f"You have to specify either {err_msg_prefix}input_ids or {err_msg_prefix}inputs_embeds")
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds
```

The exact same error is observed with google/t5-large-lm-adapt and other models. I suspect that when setting up DKYoon/mt5-large-lm-adapt, the decoder somehow gets disabled(?).

Could you share some tips/instructions to use different encoder-decoder models?

Thank you so much

MattYoon commented 2 months ago

Hey,

please check this line in modeling_langbridge.py https://github.com/kaistAI/LangBridge/blob/170e00d8ca90eb4f2e033a91461be582b6f34651/langbridge/modeling_langbridge.py#L51

The code explicitly instantiates the MT5EncoderModel class from HF, since there are no auto classes for encoder-only models (unlike how there's AutoModelForCausalLM etc.).

I would try modifying that part to T5EncoderModel.

It seems the error is happening because the "else" clause in the code I linked is triggered, which just instantiates the AutoModel class and loads the whole model, including the decoder.
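To make the failure mode concrete, here is a minimal sketch of that dispatch logic. The function and branch conditions are illustrative, not the exact repo code: the point is that a checkpoint name that misses the MT5 branch falls through to AutoModel, which loads the decoder as well and then demands decoder_input_ids.

```python
def pick_encoder_class(enc_name_or_path: str) -> str:
    """Return the name of the transformers class to instantiate (illustrative sketch).

    The real code in modeling_langbridge.py hard-codes MT5EncoderModel because
    transformers has no auto class for encoder-only models.
    """
    name = enc_name_or_path.lower()
    if "mt5" in name:
        return "MT5EncoderModel"   # e.g. DKYoon/mt5-large-lm-adapt
    if "t5" in name or "afriteva" in name:
        return "T5EncoderModel"    # afriteva_v2 uses the T5 architecture
    # Fallback: AutoModel loads the full seq2seq model, decoder included,
    # which then raises "You have to specify either decoder_input_ids or
    # decoder_inputs_embeds" when called with only encoder inputs.
    return "AutoModel"
```

With this kind of branch, the fix for the issue above amounts to routing castorini/afriteva_v2_large (and google/t5-large-lm-adapt) to `T5EncoderModel.from_pretrained(...)` instead of letting it fall through to AutoModel.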

Kosei1227 commented 1 month ago

Thank you so much! After additional debugging, I got excellent results with our own encoder model. I appreciate your help.