urchade / GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
https://arxiv.org/abs/2311.08526
Apache License 2.0

Gliner on CPU with multiple cores #155

Open vijayendra-g opened 1 month ago

vijayendra-g commented 1 month ago

I want to use GLiNER on CPU. The medium model takes anywhere between 18 and 20 minutes to extract entities from a given text. My questions are:

  1. Does GLiNER support multiple cores on a single CPU, and multiple CPUs? Will there be an improvement in performance?
  2. Assuming the answer to question 1 is yes: if I were to increase the number of cores and CPUs, what sort of time improvement can we expect? Has anyone tried doing this?
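
(For context: GLiNER runs on PyTorch, and PyTorch already parallelizes CPU inference across cores through its intra-op thread pool. A minimal sketch of inspecting and capping that pool; the thread counts below are arbitrary example values, not recommendations:)

```python
import torch

# PyTorch parallelizes CPU ops across an intra-op thread pool by default.
# Inter-op threads must be configured before the first parallel op runs.
torch.set_num_interop_threads(2)  # example value
torch.set_num_threads(8)          # example value: cores used inside one op

print(torch.get_num_threads())    # currently configured intra-op threads
```

Beyond a point, adding cores gives diminishing returns for a single short text, since per-op overhead starts to dominate.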
urchade commented 1 month ago

Hi, 20 min for a single text?

vijayendra-g commented 1 month ago

Yes, it's a small paragraph.

urchade commented 1 month ago

this should take seconds

vijayendra-g commented 1 month ago

Can you please review my code?

import time
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")

start = time.time()
text = """
The history of the football is as rich and varied as the game itself. Ancient civilizations, including the Chinese, Greeks, and Romans, played early forms of football. These rudimentary games involved kicking a ball, albeit with different rules and objectives. The modern football, as we know it, took shape in the 19th century, primarily in England. The establishment of standardized rules by the Football Association in 1863 marked a significant milestone, paving the way for the football to become a global icon."""

labels = [ "year", "country", "features"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

end = time.time()
print(end - start) # time in seconds

In my case

Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           43 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  128
On-line CPU(s) list:     0-127
Vendor ID:               AuthenticAMD
Model name:              AMD EPYC 7742 64-Core Processor
CPU family:              23
Model:                   49
Thread(s) per core:      1
Core(s) per socket:      64
Socket(s):               2
Stepping:                0
Frequency boost:         enabled
CPU max MHz:             2250.0000
CPU min MHz:             1500.0000
BogoMIPS:                4491.55
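
One thing worth separating is the cost of the first call versus subsequent calls, since the first forward pass can include one-off warm-up work. A stdlib-only sketch of the timing pattern, with a placeholder `predict` standing in for `model.predict_entities`:

```python
import time

def predict(text):
    """Placeholder for model.predict_entities(text, labels)."""
    time.sleep(0.05)  # stand-in for real inference work
    return []

t0 = time.perf_counter()
predict("first call (may include warm-up)")
t1 = time.perf_counter()
predict("second call (steady state)")
t2 = time.perf_counter()
print(f"first: {t1 - t0:.2f}s, second: {t2 - t1:.2f}s")
```

If only the first real call is slow, the minutes are warm-up or model loading, not steady-state inference.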

BeanWei commented 1 month ago

I'm also trying to run in a CPU environment. I tried the sample code you provided, and it showed results in about 0.8 seconds. However, my model is loaded manually by specifying a local directory, so perhaps that is why it is faster? By the way, from what I've observed, running on CPU seems to be very resource-intensive: when I run predict_entities multiple times in my local environment, CPU usage stays pinned at 100%, with no progress so far. So I have a similar question about running GLiNER in a resource-constrained environment.
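
For resource-constrained environments, one common knob is capping the BLAS/OpenMP thread pools before the numeric libraries are imported; a sketch using the standard environment variables (the values are examples, not recommendations):

```python
import os

# These variables are read once when the numeric libraries load,
# so they must be set before importing torch / gliner.
os.environ["OMP_NUM_THREADS"] = "2"   # example cap
os.environ["MKL_NUM_THREADS"] = "2"   # example cap

# from gliner import GLiNER  # import only after the caps are in place
```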

vijayendra-g commented 1 month ago

@BeanWei If I understand you correctly, is this what you mean:

model = GLiNER.from_pretrained("/home/.../gliner_medium-v2")

I made that change, but it is still running in minutes:

import time
from gliner import GLiNER

model = GLiNER.from_pretrained("/home/.../gliner_medium-v2")
start = time.time()
text = """
The history of the football is as rich and varied as the game itself. Ancient civilizations, including the Chinese, Greeks, and Romans, played early forms of football. These rudimentary games involved kicking a ball, albeit with different rules and objectives. The modern football, as we know it, took shape in the 19th century, primarily in England. The establishment of standardized rules by the Football Association in 1863 marked a significant milestone, paving the way for the football to become a global icon."""

labels = [ "year", "country", "features"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

end = time.time()
print(end - start) # time in seconds

polodealvarado commented 2 weeks ago

Hello @vijayendra-g .

I tried to replicate your results and also got runtimes on the order of minutes. On the other hand, I tried running it with ONNX, but it takes more time than plain torch (about two times slower).

Has anyone tested it with onnx ?

polodealvarado commented 2 weeks ago

I fixed the code and now it works for me: I got roughly a 30% speed improvement.

vijayendra-g commented 2 weeks ago

@polodealvarado What is the code fix? How much time does GLiNER-medium take now? Please specify the GLiNER version as well.

psydok commented 5 days ago

@polodealvarado @vijayendra-g I have encountered the same problem: ONNX runs twice as long as the normal model. It was written that the fixes thread has the needed fix, but that doesn't seem to be it. Can you tell me what kind of fixes are needed, and how you found the bug? I don't often work with ONNX, and I don't understand what could be wrong with the current code.

The quantized model also degrades a lot in quality, more than one would normally expect from quantization.

polodealvarado commented 5 days ago

@psydok @vijayendra-g My problem arose from the sequence length: I realized that with sequences longer than 512 tokens the ONNX model takes a lot of time, so I simply shortened the input.

However, as you said @psydok, there is a significant degradation in the model's performance with the quantized versions (up to 100% in some cases).
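
A rough sketch of the kind of shortening described above, using whitespace words as a crude proxy for tokens (real subword token counts will differ, so the limit here is only illustrative):

```python
def truncate_words(text: str, max_words: int = 300) -> str:
    """Keep the first max_words whitespace-separated words.
    A crude stand-in for a real token limit."""
    words = text.split()
    return " ".join(words[:max_words])

short = truncate_words("one two three four five", max_words=3)
print(short)  # "one two three"
```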

psydok commented 5 days ago

@polodealvarado Thank you for your answer! I ran an experiment: I limited the sequence to 512. The model's response time did improve, but the results are about the same with or without ONNX. I checked it like this (labels=["person", "location"]):

# model = GLiNER.from_pretrained("my_models/gliner_multi", load_onnx_model=True, load_tokenizer=True)
%%timeit
entities = model.predict_entities(text[:512], labels, threshold=0.25)

onnx: 79.4 ms
without onnx: 72.9 ms

%%timeit
entities = model.predict_entities(text[:384], labels, threshold=0.25)

onnx (opset_version=14): 69.1 ms
without onnx: 65.4 ms

So there is still no increase in speed... I converted the model following this repository's guide: https://github.com/urchade/GLiNER/blob/main/examples/convert_to_onnx.ipynb

urchade commented 5 days ago

text[:512] takes the first 512 characters, not tokens.

You can change the maximum size by setting model.config.max_len = 512.
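
A quick illustration of the character-versus-token point: slicing cuts the string mid-word after exactly 512 characters:

```python
text = "word " * 200        # 200 repetitions, 1000 characters
clipped = text[:512]        # slices characters, not tokens

print(len(clipped))         # 512 characters
print(clipped.split()[-1])  # the last word is cut short
```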

psydok commented 5 days ago

The default value is model.config.max_len = 384. I tried more models from onnx-community; either the quality is terrible, or it behaves just like the original model... https://huggingface.co/onnx-community/gliner_multi