Closed ErwanColombel92 closed 7 months ago
It is limited to 384 words (around 512 subtokens)
You have to chunk your text, as the pretrained model I used (DeBERTa) has a limited context length.
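For reference, here is a minimal word-based chunking sketch (assuming the gliner package's `GLiNER.from_pretrained` / `predict_entities` API and the 384-word limit mentioned above; not an official helper):

```python
from gliner import GLiNER

# Assumed model name and example labels, purely for illustration
model = GLiNER.from_pretrained("urchade/gliner_multi-v2.1")
labels = ["person", "organization", "location"]

def chunk_words(text, max_words=384):
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    for i in range(0, len(words), max_words):
        yield " ".join(words[i:i + max_words])

def predict_long_text(text, labels, max_words=384):
    """Run the model on each chunk and collect all predicted entities."""
    entities = []
    for chunk in chunk_words(text, max_words):
        entities.extend(model.predict_entities(chunk, labels))
    return entities
```

Note that the start/end offsets returned for each chunk are relative to that chunk, so you would need to shift them if you want positions in the original text.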
OK, thanks! If you want to improve the function, it would be great if it issued a warning instead of silently truncating the text!
Have a great day
Thanks for the suggestion. I have added a warning in the newer version!
Regarding the DeBERTa limit of 384 words (around 512 subtokens):
1) I'd like to help, if possible, with pretraining a model that has a higher limit... is that possible?
2) Also, I'd like to publish one specific to the legal sector and Spanish. Tell me how to do it (I have all the necessary datasets) and I can publish it! I've seen the fine-tuning notebook... but I'd like to train from a bigger model than "urchade/gliner_multi-v2.1", since I think that one is "medium", right?
Hi, as said in the title, I was wondering what the limit is in terms of characters (or tokens?). I never got any warning when passing in large portions of text, but I can see that not everything is taken into account...
Thanks!