statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

Multithreaded spaCy #43

asitemade4u closed this issue 4 years ago

asitemade4u commented 5 years ago

First, let me thank you for this comprehensive, outstanding package! When I compared cleanNLP with Ken Benoit's spacyr package, I noticed that his package allows fine-tuning of spaCy's multithreading. As I have to process quite a large corpus, I was wondering whether that is also possible from cleanNLP.

statsmaths commented 4 years ago

Unfortunately, this is no longer possible given updates to spaCy. From the spaCy documentation:

The keyword argument n_threads on the .pipe methods is now deprecated,
as the v2.x models cannot release the global interpreter lock. (Future versions
may introduce a n_process argument for parallel inference via multiprocessing.)

As soon as a future version of spaCy includes an n_process argument, I will be happy to expose it through cleanNLP.
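
For readers wondering what "parallel inference via multiprocessing" could look like in practice, here is a minimal sketch using only the Python standard library. It is not spaCy's or cleanNLP's implementation. The names `make_batches`, `annotate_batch`, and `annotate_corpus` are hypothetical, and `annotate_batch` is a placeholder (plain whitespace tokenization) standing in for a real annotation pipeline. The idea is that separate worker processes each have their own interpreter lock, so batches of texts can be annotated side by side.

```python
from multiprocessing import Pool


def make_batches(texts, batch_size):
    """Split a list of texts into consecutive batches of at most batch_size."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


def annotate_batch(batch):
    """Hypothetical stand-in for an NLP pipeline: whitespace tokenization.

    In real use, each worker would load its own model and annotate the batch.
    """
    return [text.split() for text in batch]


def annotate_corpus(texts, n_process=1, batch_size=2):
    """Annotate texts, optionally spreading batches over worker processes."""
    batches = make_batches(list(texts), batch_size)
    if n_process <= 1:
        # Serial path: no worker processes are started.
        results = [annotate_batch(b) for b in batches]
    else:
        # Each worker process has its own interpreter and its own GIL.
        with Pool(processes=n_process) as pool:
            # pool.map returns results in the same order as the input batches.
            results = pool.map(annotate_batch, batches)
    # Flatten the per-batch results back into one list, in input order.
    return [doc for batch in results for doc in batch]
```

When calling `annotate_corpus` with `n_process` greater than 1 from a script, it is safest to do so under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. Whether the parallel path is actually faster depends on how expensive the per-batch work is relative to the cost of sending texts and results between processes.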