Added two parameters to self.tokenizer.batch_encode_plus() to cap the number of tokens for long input strings, preventing crashes with the following error message:
Token indices sequence length is longer than the specified maximum sequence length for this model (603 > 512).
Running this sequence through the model will result in indexing errors
This fix increases the stability of the script for longer natural texts.
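The parameter names and the limit of 512 are assumptions based on the error message above; in the HuggingFace transformers API the usual fix is passing truncation=True and max_length=... to batch_encode_plus(). A minimal stand-in sketch of the truncation behaviour, using a whitespace "tokenizer" instead of the real one:

```python
MAX_LENGTH = 512  # model's maximum sequence length (assumed from the error)

def batch_encode(texts, max_length=MAX_LENGTH, truncation=True):
    """Encode a batch of strings, truncating each to at most max_length tokens."""
    batch = []
    for text in texts:
        tokens = text.split()  # stand-in for real subword tokenization
        if truncation and len(tokens) > max_length:
            tokens = tokens[:max_length]  # drop tokens beyond the limit
        batch.append(tokens)
    return batch

# A 603-token input, matching the length reported in the error message
long_text = " ".join(["token"] * 603)
encoded = batch_encode([long_text])
print(len(encoded[0]))  # 512
```

With truncation enabled, over-long inputs are silently shortened to the model's limit instead of triggering indexing errors downstream.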