jxmorris12 / language_tool_python

a free python grammar checker 📝✅
GNU General Public License v3.0

Is there any solution to speed up the grammar checking/correction? #50

Closed: phosseini closed this issue 2 years ago

phosseini commented 2 years ago

Is there any way to make this library a bit faster? Grammar correction, though very useful, becomes a bottleneck when there are many records (e.g., more than 10k).

jxmorris12 commented 2 years ago

@phosseini Yeah, sure, this is a good idea. There are basically two dimensions I can think of for doing this:

  1. More threads in the LanguageToolPython process

The LanguageTool server runs behind the scenes. Its maxCheckThreads option (set in the configuration file the server reads via --config) caps the number of concurrent checking threads in the process. I think the default is 10, and raising it would probably help in your case (up to a point). Either way, it should ideally be configurable (along with the other configuration settings, too!).
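More server threads only help if several check requests arrive at the same time, so the client has to send them concurrently. Here is a minimal sketch of that client side, assuming the texts can be checked independently and that the `check` callable you pass in (for example, the `check` method of a `LanguageTool` instance) is safe to call from several threads. `check_concurrently` is a hypothetical helper, not part of language_tool_python.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def check_concurrently(
    texts: Sequence[str],
    check: Callable[[str], R],
    max_workers: int = 10,
) -> List[R]:
    """Keep up to `max_workers` check requests in flight at once.

    `check` checks a single text; it is assumed to be thread-safe.
    Results are returned in the same order as `texts`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though the calls overlap.
        return list(pool.map(check, texts))


if __name__ == "__main__":
    # Stand-in checker (word count) so the sketch runs without a server.
    print(check_concurrently(["a b", "c", "d e f"], lambda t: len(t.split()), max_workers=2))
    # prints [2, 1, 3]
```

Setting `max_workers` well above the server's thread limit would just queue requests on the server, so it makes sense to keep the two numbers roughly in line.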

  2. More LanguageToolPython processes

We could also scale horizontally by letting users run multiple servers concurrently and distributing a list of inputs to check among the running servers. (This is directly related to feature request #40, which asked to connect multiple clients to a single server. You would want a single client and multiple servers, but it's the same abstraction.)
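The single-client, multiple-server idea can be sketched as follows, assuming each running server is wrapped in its own callable (for instance, the `check` method of a separate `LanguageTool` instance per server). `check_across_servers` is a hypothetical helper, and the round-robin sharding is just one simple way to split the work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def check_across_servers(
    texts: Sequence[str],
    checkers: Sequence[Callable[[str], object]],
) -> List[object]:
    """Spread `texts` round-robin over several checkers (one per server)."""
    if not checkers:
        raise ValueError("need at least one checker")
    n = len(checkers)
    # Shard i holds the indices i, i + n, i + 2n, ...
    shards = [list(range(i, len(texts), n)) for i in range(n)]
    results: list = [None] * len(texts)

    def run_shard(server_idx: int) -> None:
        check = checkers[server_idx]
        for idx in shards[server_idx]:
            # Each index is written by exactly one thread.
            results[idx] = check(texts[idx])

    # One worker thread per server; list() re-raises any worker exception.
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(run_shard, range(n)))
    return results


if __name__ == "__main__":
    # Fake "servers" that tag each text with the server that handled it.
    servers = [lambda t, s=s: f"{s}:{t}" for s in ("A", "B")]
    print(check_across_servers(["x", "y", "z"], servers))
    # prints ['A:x', 'B:y', 'A:z']
```

Round-robin keeps the shards balanced by count; if text lengths vary a lot, a shared work queue would balance load better.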

sboshin commented 2 years ago

I went through this process recently, and enabling caching helps a lot. The only way I currently know of to enable caching is to pass in a config file and turn it on there, so language_tool_python would need to support passing in a config file.
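The LanguageTool server reads its config file (passed with `--config`) as Java-style `key=value` lines. A minimal sketch of rendering a Python dict into that format, assuming plain alphanumeric keys and simple values (no escaping is done). `cacheSize` and `pipelineCaching` are LanguageTool server options; `to_properties` is a hypothetical helper, not the library's own code.

```python
from typing import Mapping, Union

Value = Union[str, int, float, bool]


def to_properties(config: Mapping[str, Value]) -> str:
    """Render options as the key=value lines of a Java .properties file."""
    lines = []
    for key, value in config.items():
        # Java's Boolean.parseBoolean expects lowercase "true"/"false".
        if isinstance(value, bool):
            value = str(value).lower()
        lines.append(f"{key}={value}\n")
    return "".join(lines)


if __name__ == "__main__":
    print(to_properties({"cacheSize": 1000, "pipelineCaching": True}), end="")
    # cacheSize=1000
    # pipelineCaching=true
```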

I'll try to get a PR out soon to support this.

jxmorris12 commented 2 years ago

This should be fixed in the latest version, 2.7.0!! 🥳 See the README for details, but basically you can now pass a dictionary of configuration options to the LanguageTool object.
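A minimal sketch of using that dictionary for a large batch, assuming language_tool_python 2.7.0 or later, a working Java install, and that `LanguageTool` offers `correct()` and `close()`. `cacheSize` and `pipelineCaching` are LanguageTool server options; whether your installed version also accepts `maxCheckThreads` is an assumption. `SPEED_CONFIG` and `correct_all` are hypothetical names, and the values are illustrative rather than tuned.

```python
from typing import Dict, List, Optional, Sequence

# Options forwarded to the LanguageTool server (illustrative values).
SPEED_CONFIG: Dict[str, object] = {
    "cacheSize": 1000,        # number of sentences the server may cache
    "pipelineCaching": True,  # reuse internal pipelines between requests
    "maxCheckThreads": 20,    # assumed accepted; the thread suggests a default of 10
}


def correct_all(
    texts: Sequence[str],
    language: str = "en-US",
    config: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Correct every text using one server started with `config`."""
    if not texts:
        return []  # avoid starting a server for an empty batch
    # Imported lazily so this module loads even without Java/the library.
    import language_tool_python

    tool = language_tool_python.LanguageTool(language, config=config or SPEED_CONFIG)
    try:
        # One long-lived tool avoids paying server start-up per text.
        return [tool.correct(text) for text in texts]
    finally:
        tool.close()  # shut down the local server process
```

Reusing a single `LanguageTool` object across all records matters as much as the config: starting a fresh server per record would dominate the run time.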