theirstory / gliner-spacy

A spaCy wrapper for GliNER
MIT License

Allow device specification in pipeline init #9

Closed dtschleckser closed 6 months ago

dtschleckser commented 7 months ago

This PR adds a map_location argument to the pipeline parameters and passes it through to the model. This enables GPU inference when you specify an alternate torch device such as cuda.

Example use:

```python
import spacy
from gliner_spacy.pipeline import GlinerSpacy

custom_spacy_config = {
    "gliner_model": "urchade/gliner_small-v1",
    "chunk_size": 384,
    "labels": ["people", "company", "punctuation"],
    "style": "ent",
    "map_location": "cuda",
}
nlp = spacy.blank("en")
nlp.add_pipe("gliner_spacy", config=custom_spacy_config)

text = "This is a text about Bill Gates and Microsoft." * 10000
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.label_)
```

map_location defaults to cpu, so the GPU is only used when explicitly requested. I've tested this change locally.
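The intent of the default can be sketched roughly as follows. This is not the actual gliner-spacy source; the `FakeGliner` class and `GlinerComponent` name are stand-ins for illustration, showing only how a config value with a `"cpu"` default would be threaded through to a model loader:

```python
class FakeGliner:
    """Stand-in for the real GLiNER model, for illustration only."""

    @classmethod
    def from_pretrained(cls, name, map_location="cpu"):
        # The real loader would move model weights to this device;
        # here we just record the requested location.
        model = cls()
        model.device = map_location
        return model


class GlinerComponent:
    """Hypothetical pipeline component mirroring the PR's behavior."""

    def __init__(self, gliner_model, map_location="cpu"):
        # Default stays "cpu" so existing pipelines are unaffected;
        # pass "cuda" (or another torch device string) to opt in to GPU.
        self.model = FakeGliner.from_pretrained(
            gliner_model, map_location=map_location
        )
```

With this shape, omitting `map_location` from the config dict leaves everything on CPU, and `"map_location": "cuda"` is the single opt-in switch.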

Also fixed a small typo in the requirements.txt.

Thanks!

stephenleo commented 6 months ago

I can confirm this works. I used nlp.pipe to run batches of texts on the GPU:

```python
texts = <list of texts>
docs = list(nlp.pipe(texts, batch_size=128))
```

wjbmattingly commented 6 months ago

Thanks! I'll do some quick tests and merge this today.

wjbmattingly commented 6 months ago

Thanks for submitting this PR! I have merged.