nytud / hunlp-GATE

Lang_Hungarian - a GATE plugin containing Hungarian NLP tools as GATE processing resources
GNU General Public License v3.0

GATE server gets slower after each request if using GATE 8.4 #24

Status: Closed (DavidNemeskey closed this issue 6 years ago)

DavidNemeskey commented 6 years ago

TL;DR

It turns out GATE 8.4 is buggy. Don't use it! This issue doesn't exist under 8.2.


Description

When run via gate-server.sh, hunlp-GATE (the latest version) gets slower after each request. This is the output of the test script I attached:

Request took 5.4847941398620605 seconds
Request took 4.873182058334351 seconds
Request took 5.728262662887573 seconds
Request took 6.841616630554199 seconds
Request took 7.380516529083252 seconds
Request took 8.534830570220947 seconds
Request took 10.321122407913208 seconds
Request took 10.715779781341553 seconds
Request took 11.847087144851685 seconds
Request took 13.018990993499756 seconds
Request took 14.757092237472534 seconds
Request took 15.631134510040283 seconds
Request took 16.861289501190186 seconds
Request took 18.667402505874634 seconds
Request took 19.60541081428528 seconds
Request took 21.03959584236145 seconds
Request took 22.58664298057556 seconds
Request took 24.56404757499695 seconds
Request took 25.602465391159058 seconds

If I stop the script and start it again, the number of seconds needed to complete a request does not reset:

Request took 29.1933650970459 seconds
Request took 30.444714784622192 seconds

Clearly, something is leaking in the server. However, it is not the components I use; I logged the time needed for them to do their thing, and it is about the same at each iteration. So it must be something in GATE that happens after the pipeline has run, because that's where the lag is.
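To put a rough number on the slowdown, the timings from the first run above can be fed to a small helper. This is only a sketch: `mean_increase` is an illustrative name and is not part of the attached test script.

```python
# Timings copied from the first run of the test script (seconds per request).
timings = [
    5.4847941398620605, 4.873182058334351, 5.728262662887573,
    6.841616630554199, 7.380516529083252, 8.534830570220947,
    10.321122407913208, 10.715779781341553, 11.847087144851685,
    13.018990993499756, 14.757092237472534, 15.631134510040283,
    16.861289501190186, 18.667402505874634, 19.60541081428528,
    21.03959584236145, 22.58664298057556, 24.56404757499695,
    25.602465391159058,
]


def mean_increase(values):
    """Return the average difference between consecutive values.

    The consecutive differences telescope, so only the first and
    last values are needed.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    return (values[-1] - values[0]) / (len(values) - 1)


print("Average slowdown: {:.2f} seconds per request".format(mean_increase(timings)))
```

On these numbers the average works out to roughly 1.1 seconds of extra latency per request, which is consistent with a steady leak rather than random noise.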

Environment:
GATE: 8.4.1 (latest)
hunlp-GATE: latest
JRE: OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)

The issue is not reproducible with GATE 8.2.

test_server.txt
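The attached script is not reproduced in this thread. A minimal sketch of such a timing loop might look like the following. Everything specific in it is an assumption: the URL is a hypothetical placeholder for wherever gate-server.sh is listening, the request is assumed to be a plain POST of UTF-8 text, and the names `send_request` and `time_requests` are illustrative rather than taken from hunlp-GATE.

```python
import time
import urllib.request

# Hypothetical endpoint (an assumption); adjust to the actual server address.
SERVER_URL = "http://localhost:8000/process"


def send_request(text, url=SERVER_URL):
    """POST `text` to the server and return the raw response body."""
    data = text.encode("utf-8")
    req = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(req) as response:
        return response.read()


def time_requests(send, text, count):
    """Call `send(text)` `count` times and return each call's duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        send(text)
        elapsed = time.perf_counter() - start
        print("Request took {} seconds".format(elapsed))
        durations.append(elapsed)
    return durations
```

Calling `time_requests(send_request, some_text, 20)` against a running server would print one line per request in the same format as the output above. Passing the sender in as a parameter keeps the timing loop independent of how the request is actually made.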

temprimus commented 6 years ago

In GATE 8.4, DocumentXmlUtils.toXml() was replaced with DocumentStaxUtils.toXml() for XML serialization. The new method not only seems to take twice as long on the first run, but also gets slower and slower over time. I replaced the call with the old version, which seems to have solved the issue.