ohmeow / blurr

A library that integrates huggingface transformers with the world of fastai, giving fastai devs everything they need to train, evaluate, and deploy transformer specific models.
https://ohmeow.github.io/blurr
Apache License 2.0
289 stars, 34 forks

Error due to tok_kwargs setting for Hindi Language #92

Open: jeetendraabvv opened this issue 1 year ago

jeetendraabvv commented 1 year ago

I followed the article below to fine-tune an mBART model for Hindi-language summarization: https://ohmeow.github.io/blurr/text.modeling.seq2seq.summarization.html

To do this, I changed the language code "en_XX" to "hi_IN" in the following code:

    if hf_arch == "mbart":
        text_gen_kwargs["decoder_start_token_id"] = hf_tokenizer.get_vocab()["hi_IN"]

    tok_kwargs = {}
    if hf_arch == "mbart":
        tok_kwargs["src_lang"], tok_kwargs["tgt_lang"] = "hi_IN", "hi_IN"

However, I get the following error when I run:

    dls = dblock.dataloaders(df_train, bs=2)

TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'src_lang'
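The traceback suggests that src_lang and tgt_lang are being forwarded as keyword arguments on every tokenizer call, which the installed transformers version's _batch_encode_plus does not accept. In current Hugging Face transformers, the mBART tokenizers take these language codes when the tokenizer is created (for example, from_pretrained(..., src_lang="hi_IN", tgt_lang="hi_IN")). The sketch below only illustrates that idea in plain Python. The helper split_tok_kwargs, the INIT_ONLY_KEYS set, the extra max_length entry, and the checkpoint name in the comment are hypothetical and are not part of blurr's or transformers' API.

```python
# Hypothetical helper (not part of blurr or transformers): separate the
# language settings from the kwargs that get forwarded on every tokenizer call.

# Assumption: these keys configure the mBART tokenizer at load time,
# not at encode time.
INIT_ONLY_KEYS = frozenset({"src_lang", "tgt_lang"})


def split_tok_kwargs(tok_kwargs: dict) -> tuple[dict, dict]:
    """Return (init_kwargs, call_kwargs) from one combined kwargs dict."""
    init_kwargs = {k: v for k, v in tok_kwargs.items() if k in INIT_ONLY_KEYS}
    call_kwargs = {k: v for k, v in tok_kwargs.items() if k not in INIT_ONLY_KEYS}
    return init_kwargs, call_kwargs


# max_length is an illustrative extra entry, not from the original code
tok_kwargs = {"src_lang": "hi_IN", "tgt_lang": "hi_IN", "max_length": 256}
init_kwargs, call_kwargs = split_tok_kwargs(tok_kwargs)

# init_kwargs would go to the tokenizer constructor, e.g. (not run here;
# the checkpoint name is an assumption):
#   AutoTokenizer.from_pretrained("facebook/mbart-large-cc25", **init_kwargs)
# and only call_kwargs would be passed on each encode call.
print(init_kwargs)  # {'src_lang': 'hi_IN', 'tgt_lang': 'hi_IN'}
print(call_kwargs)  # {'max_length': 256}
```

With this split, the language codes live on the tokenizer object, and the per-call kwargs no longer contain src_lang. That is the argument the traceback reports as unexpected.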

I am a beginner. Please suggest a solution to the above problem.