tema16 / tt4j

Automatically exported from code.google.com/p/tt4j

threads overhead #15

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
Perform many tagging operations.

Each call to process recreates the input, output, and error threads.
Since the tagging process itself is kept alive, it would be much more
efficient to also keep the input/output threads for that process. I've tested
tt4j on multiple cores (one instance per core) and there doesn't seem to be a
significant improvement in speed. To me this looks like a result of creating
and destroying all those threads.
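
To illustrate the idea, here is a minimal sketch of a writer whose single worker thread outlives individual calls. It is not tt4j code: `PersistentWriter`, `send`, and `threadCount` are hypothetical names, and the `Writer` only stands in for the tagger process's standard input.

```java
import java.io.Writer;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not part of tt4j: one long-lived worker thread
// handles every batch instead of a fresh thread being started per call.
class PersistentWriter implements AutoCloseable {
    // The executor keeps its single thread alive between send() calls.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Records which threads did the writing, only to make the reuse visible.
    private final Set<Thread> threadsUsed = ConcurrentHashMap.newKeySet();
    // Assumption: stands in for the stdin stream of the tagger process.
    private final Writer sink;

    PersistentWriter(Writer sink) {
        this.sink = sink;
    }

    // Writes one token per line on the worker thread and waits for completion.
    void send(List<String> tokens) {
        Future<?> done = worker.submit(() -> {
            threadsUsed.add(Thread.currentThread());
            for (String token : tokens) {
                sink.write(token);
                sink.write('\n');
            }
            sink.flush();
            return null; // returning a value makes this a Callable, so IOException is allowed
        });
        try {
            done.get(); // block until the batch has been written
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("writing batch failed", e.getCause());
        }
    }

    // Number of distinct threads that have written so far.
    int threadCount() {
        return threadsUsed.size();
    }

    @Override
    public void close() {
        worker.shutdown(); // lets the worker thread exit once it is idle
    }
}
```

However many times `send` is called, the work runs on the same thread, so the per-call cost is a task hand-off rather than thread creation and teardown. A real implementation would need matching long-lived readers for the process's output and error streams, which this sketch leaves out.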

Original issue reported on code.google.com by had...@gmail.com on 9 Oct 2012 at 2:10

GoogleCodeExporter commented 9 years ago
The threads are created for each batch of input data sent to the tagger. I 
usually send whole documents to tt4j, so while there is some overhead for the 
threads, it's not a killer. 

It would indeed be better to keep the threads hanging around or possibly to 
work completely without threads. Unfortunately, I currently do not have the 
resources to do the necessary refactoring. Getting such things to the point 
where they work tends to be tricky.

I suggest that you try sending larger batches of text to the tagger, 
rather than going sentence by sentence, if that is possible in your setup.
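
A rough sketch of that batching on the caller's side, assuming the input is already tokenized into sentences. `BatchingTagger` and `maxTokens` are hypothetical names, and the `Consumer` stands in for whatever call hands a token list to the tagger (for example a wrapper around process); none of this is tt4j API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of tt4j: collects the tokens of several
// sentences and hands them to the tagger in one larger batch.
class BatchingTagger {
    // Assumption: stands in for the call that sends tokens to the tagger.
    private final Consumer<List<String>> tagger;
    // Flush as soon as at least this many tokens are pending.
    private final int maxTokens;
    private final List<String> pending = new ArrayList<>();

    BatchingTagger(Consumer<List<String>> tagger, int maxTokens) {
        this.tagger = tagger;
        this.maxTokens = maxTokens;
    }

    // Queues one sentence; sentences are never split across batches.
    void addSentence(List<String> sentenceTokens) {
        pending.addAll(sentenceTokens);
        if (pending.size() >= maxTokens) {
            flush();
        }
    }

    // Sends whatever is pending as a single batch; call once at the end.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        tagger.accept(new ArrayList<>(pending)); // copy so the batch is stable
        pending.clear();
    }
}
```

With this, the thread setup cost is paid once per batch instead of once per sentence. The trade-off is that results for a sentence only arrive when its batch is flushed, so the threshold is a latency versus throughput choice.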

Original comment by richard.eckart on 9 Oct 2012 at 5:32