reckart / tt4j

TreeTagger for Java
http://reckart.github.io/tt4j/
Apache License 2.0

threads overhead #15

Open reckart opened 9 years ago

reckart commented 9 years ago

Original issue 15 created by reckart on 2012-10-09T14:10:48.000Z:

What steps will reproduce the problem? Perform many tagging operations.

Each call to process() recreates the input, output, and error threads. Since the tagging process itself is kept alive, it would be much more efficient to also keep the input/output threads for that process. I've tested tt4j on multiple cores (creating one instance per core), and there doesn't seem to be a significant improvement in speed. To me, this looks like a result of creating and destroying all those threads.
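A minimal sketch of the "keep the threads" idea, assuming a single long-lived worker thread per tagger instance is acceptable. This is not tt4j's code. `PersistentWorker`, `tagOne`, and `distinctThreadsUsed` are hypothetical names, and `tagOne` merely stands in for the real exchange with the TreeTagger process over its pipes. The point is only that the executor (and its thread) is created once and reused by every `process` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical wrapper: one worker thread lives as long as the tagger,
// instead of fresh threads being created for every process() call.
final class PersistentWorker implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Function<String, String> tagOne; // stand-in for the tagger process I/O
    private final Set<String> threadNames = ConcurrentHashMap.newKeySet();

    PersistentWorker(Function<String, String> tagOne) {
        this.tagOne = tagOne;
    }

    List<String> process(List<String> tokens) {
        // Submit the batch to the already-running thread; no thread is created here.
        Future<List<String>> result = worker.submit(() -> {
            threadNames.add(Thread.currentThread().getName());
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                out.add(tagOne.apply(t));
            }
            return out;
        });
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Number of distinct threads that have handled batches so far.
    int distinctThreadsUsed() {
        return threadNames.size();
    }

    @Override
    public void close() {
        worker.shutdown(); // release the worker once the tagger is no longer needed
    }
}
```

Calling `process` repeatedly on one instance reuses the same worker thread, so the per-call thread setup cost disappears; the trade-off is that the wrapper must be closed explicitly.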

reckart commented 9 years ago

Comment #1 originally posted by reckart on 2012-10-09T17:32:50.000Z:

The threads are created for each batch of input data sent to the tagger. I usually send whole documents to tt4j, so while there is some overhead for the threads, it's not a killer.

It would indeed be better to keep the threads around, or possibly to work entirely without threads. Unfortunately, I currently do not have the resources to do the necessary refactoring. Getting such things to work reliably tends to be tricky.

I suggest sending larger batches of text to the tagger, ideally whole documents rather than one sentence at a time.
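To illustrate why batching helps, here is a small sketch, assuming the cost that matters is paid once per process() call. `CountingTagger` and `Batching` are hypothetical stand-ins, not tt4j classes. Sending a document sentence by sentence pays that cost once per sentence, while flattening the document pays it once. The per-document variant also records where each sentence starts, so the tagged tokens can be split back into sentences afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tagger whose process() call carries a fixed
// setup cost (in tt4j's case, creating the I/O threads).
final class CountingTagger {
    private int calls = 0;
    private int tokensSent = 0;

    void process(List<String> tokens) {
        calls++;                     // the per-call cost is paid here, once
        tokensSent += tokens.size(); // tokens would be piped to the tagger here
    }

    int calls() { return calls; }
    int tokensSent() { return tokensSent; }
}

final class Batching {
    // Sentence by sentence: one process() call per sentence.
    static void perSentence(CountingTagger tagger, List<List<String>> sentences) {
        for (List<String> sentence : sentences) {
            tagger.process(sentence);
        }
    }

    // Whole document: flatten all sentences into a single call.
    // Returns the start offset of each sentence in the flattened list.
    static int[] perDocument(CountingTagger tagger, List<List<String>> sentences) {
        List<String> all = new ArrayList<>();
        int[] starts = new int[sentences.size()];
        for (int i = 0; i < sentences.size(); i++) {
            starts[i] = all.size();
            all.addAll(sentences.get(i));
        }
        tagger.process(all);
        return starts;
    }
}
```

For a three-sentence document, perSentence makes three calls while perDocument makes one, with the same number of tokens sent either way.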