Open jeetendraabvv opened 1 year ago
I followed the article below to fine-tune an mBART model for Hindi summarization: https://ohmeow.github.io/blurr/text.modeling.seq2seq.summarization.html
To do this, I changed the language parameter from "en_XX" to "hi_IN" in the following code:
    if hf_arch == "mbart":
        text_gen_kwargs["decoder_start_token_id"] = hf_tokenizer.get_vocab()["hi_IN"]

    tok_kwargs = {}
    if hf_arch == "mbart":
        tok_kwargs["src_lang"], tok_kwargs["tgt_lang"] = "hi_IN", "hi_IN"
However, I get the following error when I run the command dls = dblock.dataloaders(df_train, bs=2):
TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'src_lang'
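My guess at what is happening, as a minimal sketch: the entries in tok_kwargs seem to be forwarded as keyword arguments to a tokenizer method that does not accept src_lang. The class below is a hypothetical stand-in written only for illustration. It is not the real PreTrainedTokenizerFast, and its method signature is an assumption, but it fails in the same way:

```python
class StandInTokenizer:
    """Hypothetical stand-in for the tokenizer; NOT the real transformers class."""

    def _batch_encode_plus(self, texts, max_length=None):
        # Assumed signature: accepts max_length but has no src_lang/tgt_lang parameter.
        return [text.split()[:max_length] for text in texts]

    def __call__(self, texts, **kwargs):
        # Extra keyword arguments are forwarded unchanged to the inner method.
        return self._batch_encode_plus(texts, **kwargs)


# Same keys as in my code above.
tok_kwargs = {"src_lang": "hi_IN", "tgt_lang": "hi_IN"}
tokenizer = StandInTokenizer()

try:
    tokenizer(["a short example sentence"], **tok_kwargs)
    error_message = None
except TypeError as exc:
    # The inner method rejects the keyword it does not know about.
    error_message = str(exc)

print(error_message)
```

If this is the cause, the language codes may need to be passed to the tokenizer some other way than through tok_kwargs, but I do not know the correct way to do that.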
I am a beginner. Please suggest a solution to this problem.