Open hardianlawi opened 3 years ago
I also tried running the spaCy pipeline on the GPU by adding the code below, but it does not seem to give much of a boost.
import spacy
import torch

# prefer_gpu() only takes effect if it is called before spacy.load()
spacy.prefer_gpu()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
The GPU did not improve performance at all; I guess that is because the data still has to go through spaCy's CPU-bound pipeline components first.
If you create two nlp pipelines in spaCy, one normal pipeline and one combined with benepar, then when you process sentences the normal pipeline runs all of its components while the benepar pipeline only runs the parser. This gives you roughly 2x processing speed.
import spacy
import benepar
from benepar import BeneparComponent  # only needed for the spaCy v2 branch

nlp1 = spacy.load('en_core_web_md')  # normal pipeline: tagger, NER, lemmatizer, ...
nlp2 = spacy.load('en_core_web_sm')  # stripped-down pipeline that only hosts benepar
if spacy.__version__.startswith('2'):
    nlp2.add_pipe(BeneparComponent("benepar_en3"))
else:
    nlp2.add_pipe("benepar", config={"model": "benepar_en3"})

docs_nlp1 = list(nlp1.pipe(examples['sentence'], disable=["tok2vec"], n_process=4))
docs_nlp2 = list(nlp2.pipe(examples['sentence'], disable=["tok2vec", "tagger", "ner", "lemmatizer", "textcat"]))
If you process the two pipelines independently in parallel, I guess the speed could improve further, to roughly 4x the original. However, I do not know how to use spaCy in a multi-threaded setting.
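One way to sketch "run the two pipelines independently" is with `concurrent.futures`. This is only a minimal sketch: `run_core_pipeline` and `run_benepar_pipeline` are hypothetical stand-ins for the `nlp1.pipe(...)` / `nlp2.pipe(...)` calls above, and since spaCy pipelines are largely GIL-bound Python, real gains usually come from processes (or spaCy's own `n_process` argument to `nlp.pipe`) rather than threads; threads are used here only to keep the pattern portable.

```python
from concurrent.futures import ThreadPoolExecutor

def run_core_pipeline(sentences):
    # placeholder for: list(nlp1.pipe(sentences, disable=["tok2vec"], n_process=4))
    return [f"core:{s}" for s in sentences]

def run_benepar_pipeline(sentences):
    # placeholder for: list(nlp2.pipe(sentences, disable=[...]))
    return [f"benepar:{s}" for s in sentences]

def parse_in_parallel(sentences):
    # Submit both pipelines at once and wait for both results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        core_future = pool.submit(run_core_pipeline, sentences)
        benepar_future = pool.submit(run_benepar_pipeline, sentences)
        return core_future.result(), benepar_future.result()
```

For a real workload you would swap in a `ProcessPoolExecutor` and have each worker load its own pipeline, because spaCy `Language` and `Doc` objects do not share cheaply across processes.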
I'm trying to make the multi-processing spaCy pipeline work with the Berkeley parser, as I assume it will boost performance. How can I get it to work? I tried the suggestion from here, but it didn't work for me.
Error message