Closed tomcobley closed 1 year ago
It seems this was a more general question, which has an explanation here (the choice of pad_token doesn't actually matter, since the attention mask causes padded positions to be ignored).
Adding the line
tokenizer.pad_token = tokenizer.eos_token
removed the error and solved the problem :).
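For context, a minimal sketch of that fix, assuming a GPT-2-style tokenizer that ships without a pad token (the model name below is illustrative, not the model from this issue):

```python
from transformers import AutoTokenizer

# Illustrative model; the issue's model would be loaded the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2-style tokenizers define no pad token, so padding a batch of
# different-length sequences raises an error without this line.
tokenizer.pad_token = tokenizer.eos_token

batch = ["a short sequence", "a somewhat longer sequence of text"]
encoded = tokenizer(batch, padding=True, return_tensors="pt")

# The attention mask zeroes out the padded positions, which is why the
# particular choice of pad token does not affect the embeddings.
print(encoded["attention_mask"])
```

Since padded positions are masked, reusing eos_token as the pad token is safe here even though it was never trained as padding.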
Hi, thanks for sharing your model!
I am trying to use it to generate embeddings for batches of text sequences of different lengths (Gene Ontology annotations). However, when I try to do this using huggingface, I get the following error at the tokenization stage.

Code:

Error:
How should I resolve this?
Thanks!