Added two parameters to self.tokenizer.batch_encode_plus() to cap the number of tokens for long input strings, preventing crashes with the following error message:
Token indices sequence length is longer than the specified maximum sequence length for this model (603 > 512).
Running this sequence through the model will result in indexing errors
This fix increases the stability of the script for longer natural texts.
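The parameter names and the limit of 512 are assumptions based on the error message above; in the HuggingFace transformers API the usual fix is passing truncation=True and max_length=... to batch_encode_plus(). A minimal stand-in sketch of the truncation behaviour, using a whitespace "tokenizer" instead of the real one:

```python
MAX_LENGTH = 512  # model's maximum sequence length (assumed from the error)

def batch_encode(texts, max_length=MAX_LENGTH, truncation=True):
    """Encode a batch of strings, truncating each to at most max_length tokens."""
    batch = []
    for text in texts:
        tokens = text.split()  # stand-in for real subword tokenization
        if truncation and len(tokens) > max_length:
            tokens = tokens[:max_length]  # drop tokens beyond the limit
        batch.append(tokens)
    return batch

# A 603-token input, matching the length reported in the error message
long_text = " ".join(["token"] * 603)
encoded = batch_encode([long_text])
print(len(encoded[0]))  # 512
```

With truncation enabled, over-long inputs are silently shortened to the model's limit instead of triggering indexing errors downstream.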