huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Training issue of a Transformer based Encoder-Decoder model based on pre-trained BanglaBERT #17122

Closed AbuUbaida closed 2 years ago

AbuUbaida commented 2 years ago

I just tried to train an EncoderDecoder model for a summarization task based on pre-trained BanglaBERT, which is an ELECTRA discriminator model pre-trained with the Replaced Token Detection (RTD) objective. Surprisingly, after 4500 steps on 10k training examples, the model had not learned anything: the ROUGE-2 scores were just 0.0000. To confirm, I used that 4500-step checkpoint to generate summaries on test inputs; it produced a fixed-length (50) output regardless of the input length, consisting of the [CLS] token 49 times followed by a single [SEP] token. I basically followed the Warm-starting encoder-decoder models with 🤗Transformers notebook. Can anybody give a clue as to what the issue could be here? Thanks in advance.

In my case,

Tokenizer for the BanglaBERT model:

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")
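# BanglaBERT is an ELECTRA-style checkpoint whose tokenizer defines [CLS]/[SEP] but no dedicated bos/eos tokens, so alias them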
tokenizer.bos_token = tokenizer.cls_token
tokenizer.eos_token = tokenizer.sep_token

Input pre-processing function:

def process_data_to_model_inputs(batch):
    inputs = tokenizer(batch['text'], padding="max_length", truncation=True, max_length=encoder_max_length)
    outputs = tokenizer(batch['summary'], padding="max_length", truncation=True, max_length=decoder_max_length)

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask
    batch["decoder_input_ids"] = outputs.input_ids
    batch["decoder_attention_mask"] = outputs.attention_mask
    batch["labels"] = outputs.input_ids.copy()

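    # Mask pad tokens in the labels with -100 so they are ignored by the cross-entropy loss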
    batch["labels"] = [[-100 if token == tokenizer.pad_token_id else token for token in labels] for labels in batch["labels"]]
    return batch

Mapping the pre-processing function to the batches of examples:

train_data = train_data.map(
    process_data_to_model_inputs, 
    batched=True,
    batch_size=batch_size,
    remove_columns=["text", "summary"]
)
train_data.set_format(
    type="torch", columns=["input_ids", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],
)

valid_data = valid_data.map(
    process_data_to_model_inputs, 
    batched=True, 
    batch_size=batch_size,
    remove_columns=["text", "summary"]
)
valid_data.set_format(
    type="torch", columns=["input_ids", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],
)
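
As a quick sanity check (not part of the original snippets), one can inspect a single processed training example and confirm that padded label positions are masked with -100 while the decoder inputs keep the real pad token; the variable names come from the code above:

# Inspect the first processed training example
sample = train_data[0]
print(sample["input_ids"].shape, sample["decoder_input_ids"].shape)
# Padded label positions should be -100 so the loss ignores them
print((sample["labels"] == -100).sum().item(), "masked label positions")
# The decoder inputs should still contain the actual pad token id
print((sample["decoder_input_ids"] == tokenizer.pad_token_id).sum().item(), "pad tokens in decoder_input_ids")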

BanglaBERT model and its config settings:

bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("csebuetnlp/banglabert", "csebuetnlp/banglabert")
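# Encoder and decoder are both warm-started from the same BanglaBERT checkpoint; the cross-attention weights are newly initialized and must be learned during fine-tuning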
bert2bert.config.decoder_start_token_id = tokenizer.bos_token_id
bert2bert.config.eos_token_id = tokenizer.eos_token_id
bert2bert.config.pad_token_id = tokenizer.pad_token_id

bert2bert.config.vocab_size = bert2bert.config.decoder.vocab_size
bert2bert.config.max_length = 128
bert2bert.config.min_length = 42
bert2bert.config.early_stopping = True
bert2bert.config.length_penalty = 2.0
bert2bert.config.num_beams = 8
bert2bert.config.remove_invalid_values = True
bert2bert.config.repetition_penalty = 2.0
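
As an aside, a minimal generation smoke test such as the sketch below (sample_text is a placeholder, and encoder_max_length is reused from the preprocessing step) can confirm that the warm-started model picks up the special-token and beam-search settings; the untrained model is of course not expected to produce a meaningful summary yet:

# Smoke test: generate from the warm-started (still untrained) model
sample_text = "..."  # placeholder for any Bangla passage
inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=encoder_max_length)
generated = bert2bert.generate(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)
print(tokenizer.decode(generated[0], skip_special_tokens=False))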

I used the Seq2SeqTrainer for training. The Seq2SeqTrainingArguments were as follows:

    evaluation_strategy = "steps",
    per_device_train_batch_size = batch_size,
    per_device_eval_batch_size = batch_size,
    predict_with_generate = True,
    logging_steps = 1000, 
    save_steps = 500, 
    eval_steps = 5000, 
    warmup_steps = 500,
    overwrite_output_dir = True,
    save_total_limit = 2,
    num_train_epochs = 20,
    fp16 = True
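
For completeness, this is how the pieces above might be tied together with Seq2SeqTrainer, loosely following the warm-starting notebook; the output_dir value, the ROUGE metric loading, and the compute_metrics helper are illustrative assumptions rather than code taken from this issue:

import datasets
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# ROUGE metric for evaluation (requires the rouge_score package)
rouge = datasets.load_metric("rouge")

def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions
    # Put the pad token back in place of -100 before decoding the references
    labels_ids[labels_ids == -100] = tokenizer.pad_token_id
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)
    rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=["rouge2"])["rouge2"].mid
    return {
        "rouge2_precision": round(rouge_output.precision, 4),
        "rouge2_recall": round(rouge_output.recall, 4),
        "rouge2_fmeasure": round(rouge_output.fmeasure, 4),
    }

training_args = Seq2SeqTrainingArguments(
    output_dir="./bert2bert-bangla-summarization",  # assumed; not stated in the issue
    evaluation_strategy="steps",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    logging_steps=1000,
    save_steps=500,
    eval_steps=5000,
    warmup_steps=500,
    overwrite_output_dir=True,
    save_total_limit=2,
    num_train_epochs=20,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=bert2bert,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=valid_data,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
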
AniketRajpoot commented 2 years ago

I am getting literally the same issue; I have invested hours trying to figure out what is wrong but still haven't found it. I tried both BERTurk and mBERT, and both gave the same issue, and the training loss even drops to nearly 0 instantly. I am not sure what to do.

AbuUbaida commented 2 years ago

Sorry for being late. Yes, that's because the approach you are following works with a specific version, v4.2.1 to be precise, but not with the current release (maybe 4.18).

AniketRajpoot commented 2 years ago

> Sorry for being late. Yes, that's because the approach you are following works with a specific version, v4.2.1 to be precise, but not with the current release (maybe 4.18).

Hi, sorry, but I am not sure exactly what to change; can you be more specific? Should I not use the EncoderDecoder class, or should I change the trainer? I am new to this!

Thank you in advance.

RaedShabbir commented 2 years ago

I believe he means that you should ensure your versions of transformers and the other relevant libraries match those used in the example.

AbuUbaida commented 2 years ago

> I believe he means that you should ensure your versions of transformers and the other relevant libraries match those used in the example.

@AniketRajpoot Exactly. The adjustments needed for version 4.18 are in progress; I have opened another issue, which you can keep an eye on if you want.

AniketRajpoot commented 2 years ago

Thank you so much @AbuUbaida @RaedShabbir, I understand the issue!