urchade / GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
https://arxiv.org/abs/2311.08526
Apache License 2.0

Default max_len value #183

Open yishusong opened 2 months ago

yishusong commented 2 months ago

I'm trying to increase the size of the input texts, but it seems that all large versions of GLiNER (2, 2.1) have a default max_len of 384. What is the reasoning behind the value 384, and can I modify it during inference?

Much appreciated!

Ingvarstep commented 2 months ago

In GLiNER, max_len counts words, not tokens; 384 words correspond to roughly 512 tokens on average. DeBERTa, the backbone transformer of GLiNER, works best up to this range. You can increase the maximum length when loading the model, but performance may start to degrade beyond it:

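from gliner import GLiNER
import torch

# Raise the limit from the default 384 words to 768 at load time,
# then move the model to the first GPU in half precision.
model = GLiNER.from_pretrained("urchade/gliner_large-v2.1", max_length=768).to("cuda:0", dtype=torch.float16)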
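
If raising the limit hurts quality, an alternative is to keep the default and split long inputs into windows of at most 384 words. The sketch below is only an illustration of that idea, not something GLiNER provides: `chunk_by_words` and its `overlap` parameter are hypothetical, and plain whitespace splitting is assumed to approximate GLiNER's own word counting (which may differ, for example around punctuation), so leaving some headroom below 384 is sensible.

```python
from typing import List


def chunk_by_words(text: str, max_words: int = 384, overlap: int = 32) -> List[str]:
    """Split text into windows of at most `max_words` whitespace-separated words.

    Hypothetical helper: consecutive windows share `overlap` words so that an
    entity sitting on a boundary appears whole in at least one window.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()          # assumption: whitespace words ~ GLiNER words
    step = max_words - overlap    # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk can then be passed to the model separately, e.g. with `model.predict_entities(chunk, labels)`. Note that the returned character offsets would be relative to the chunk rather than the full text, and entities inside the overlapping region may be reported twice.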

GioPetro commented 2 months ago

Where can I find full documentation of the arguments for GLiNER? For example, max_length isn't in any argument docs that I found, yet it works. So I need those docs to look up what can be modified. Thanks