Closed tamas-visy closed 1 month ago
PLMs seem to tokenize text input by silently truncating overly long sentences:
https://github.com/tamas-visy/cs4nlp-plmrb/blob/5f22c5de952018fc075a57c98f76eff03ae5683c/src/models/language_model.py#L76
We should check whether any of our sentences are close to or above the limit we use.
(Currently, that is 512 [tokens?].)
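A minimal sketch of such a check, assuming the limit is 512 tokens and flagging anything within a chosen margin of it. The `find_near_limit` helper, the `margin` value, and the whitespace tokenizer below are illustrative assumptions; the real check would pass the PLM's own tokenizer (e.g. `tokenizer.encode` from the model used in `language_model.py`) so the counts match the truncation behavior exactly:

```python
MAX_LENGTH = 512  # assumed limit; matches the value mentioned in this issue


def find_near_limit(sentences, tokenize, limit=MAX_LENGTH, margin=0.9):
    """Return (index, n_tokens) for sentences at or above margin * limit tokens.

    `tokenize` is any callable mapping a string to a token list, so the PLM's
    real tokenizer can be plugged in without changing this function.
    """
    flagged = []
    for i, sentence in enumerate(sentences):
        n = len(tokenize(sentence))
        if n >= margin * limit:
            flagged.append((i, n))
    return flagged


# Stand-in tokenizer (whitespace split) just to make the sketch runnable;
# token counts from the actual PLM tokenizer will differ.
toy_tokenize = str.split

sentences = ["a short sentence", "word " * 600]
print(find_near_limit(sentences, toy_tokenize))  # → [(1, 600)]
```

Running this over the dataset before encoding would tell us whether truncation is actually happening, or whether all sentences are safely under the limit.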