mguetlein / opentox-validation

OpenTox validation webservice
http://www.opentox.org
GNU General Public License v3.0

Request Timeout when running crossvalidation #14

Closed: alphaville closed this issue 13 years ago

alphaville commented 13 years ago

Occasionally, a Request Timeout exception is thrown (see for example http://toxcreate2.in-silico.ch/task/4332 ). I have the feeling that this happens under heavy load on the server. I've checked the response times at opentox.ntua.gr:8080 and they remain low (see http://ambit.uni-plovdiv.bg/cgi-bin/smokeping.cgi?target=NTUA and http://opentox.ntua.gr:8080/monitoring).

mguetlein commented 13 years ago

Pantelis, this timeout is thrown at your service. You should check your web-server error and access logs.

alphaville commented 13 years ago

I think this is of some interest: http://opentox.ntua.gr/index.php/blog/76-rdf-opentox-discussion?showall=1&limitstart= - Especially the last paragraph about RDF vs ARFF. It explains the timeout. Could you provide ARFF along with RDF for datasets?
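A client could then pick the representation per request via content negotiation, along these lines (a sketch; the dataset URI and the ARFF media type text/x-arff are assumptions on my part):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ArffViaConneg {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://opentox.ntua.gr:8080/dataset/1"); // hypothetical URI
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestProperty("Accept", "text/x-arff"); // instead of application/rdf+xml
            BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
            try {
                String line;
                while ((line = in.readLine()) != null) System.out.println(line);
            } finally {
                in.close();
            }
        }
    }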

mguetlein commented 13 years ago

The timeout in this task (http://toxcreate2.in-silico.ch/task/4332) occurs during a simple GET request to your service:

 rest_params: 
    :headers: 
      :accept: application/rdf+xml
      :subjectid: AQIC5wM2LY4SfcyXpalQEtoyjxZzHhZIMARV18Unjdb27k8=@AAJTSQACMDE=#
    :payload: 
    :rest_uri: http://opentox.ntua.gr:8080/model/9b84be8c-87d6-4405-aad8-bd7cfc81251e

So this should have nothing to do with RDF parsing.
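For what it's worth, the request can be reproduced and timed in isolation, roughly like this (a sketch; the subjectid placeholder stands for the token from the log above):

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class TimeModelGet {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://opentox.ntua.gr:8080/model/9b84be8c-87d6-4405-aad8-bd7cfc81251e");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestProperty("Accept", "application/rdf+xml");
            con.setRequestProperty("subjectid", "<token from the log above>");
            con.setReadTimeout(60000); // fail loudly instead of hanging forever
            long start = System.currentTimeMillis();
            int code = con.getResponseCode(); // performs the GET
            System.out.println("HTTP " + code + " after " + (System.currentTimeMillis() - start) + " ms");
        }
    }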

Not sure if we will provide ARFF. It should not be too much of an effort, but I think Christoph and Nina planned to replace RDF OWL-DL with a fixed data model, represented for example in JSON.

alphaville commented 13 years ago

Did you read the blog? RDF parsing consumes 2.79 GB compared to 1 MB for ARFF (see http://opentox.ntua.gr/index.php/blog/76-rdf-opentox-discussion?showall=1&limitstart= ). It consumes all the RAM of the server, which then starts using swap and responds far too slowly.

mguetlein commented 13 years ago

Sorry, Pantelis, I did not read the blog completely.

But the timeout occurs during a simple GET request to your model. Do you parse a big RDF file when a GET request to an existing model is performed?

mguetlein commented 13 years ago

One more thing, Pantelis: I do think that the RDF scalability issue is a severe problem, we have to solve it, and it is good that you are doing these investigations. But IMHO this should still never cause timeouts during the model building process. This is what tasks are for: first the model building service returns the task to the client, then it starts processing the RDF data.
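In Java-ish terms, I mean roughly the following (a sketch; all names and the pool size are made up):

    import java.util.UUID;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // The service answers with a task URI first; the expensive RDF work happens afterwards.
    public class TaskPatternSketch {
        enum Status { QUEUED, RUNNING, COMPLETED }

        static class Task {
            final String uri = "/task/" + UUID.randomUUID();
            volatile Status status = Status.QUEUED;
        }

        static final ExecutorService pool = Executors.newFixedThreadPool(2); // small bounded pool

        static String buildModel(final String datasetUri) {
            final Task task = new Task();            // cheap: nothing downloaded or parsed yet
            pool.submit(new Runnable() {
                public void run() {
                    task.status = Status.RUNNING;
                    downloadAndParseRdf(datasetUri); // the expensive part happens here
                    task.status = Status.COMPLETED;
                }
            });
            return task.uri;                         // the client gets this immediately
        }

        static void downloadAndParseRdf(String uri) { /* heavy RDF work elided */ }

        public static void main(String[] args) {
            System.out.println(buildModel("http://example.org/dataset/1"));
            pool.shutdown();
        }
    }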

alphaville commented 13 years ago

Yes, that's right. First a task is created with status QUEUED; up to that point nothing happens. After that, provided no more than 2 other tasks are running on the system, the task is submitted to the execution pool. Then it starts downloading and parsing stuff and stuffs the memory with RDF triples... and then the system hangs and crawls ;) Any other running tasks hang too! Even the Apache server is dead at that point. If you stand in front of the screen of this computer, you are hardly able to move the mouse pointer. The reason is that the whole RAM is occupied, and in some cases even half of the swap space!!! The same holds for GET on /task/id. Therefore, it is a matter of RDF scalability.

mguetlein commented 13 years ago

;-))) very nice description. I see. We should enforce the scalability issue on the mailing list...

vedina commented 13 years ago

> the task is submitted to the execution pool. Then it starts downloading and parsing stuff and stuffs the memory with RDF triples... and then the system hangs and crawls ;)

It seems the downloading and parsing are done in one go. If so, it would be less blocking if the task were accepted and the task URI returned immediately; then the download starts and writes the data into a file, and only upon completion is the file parsed into RDF.
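Roughly this shape, with the same Jena classes as in my test below (a sketch; error handling omitted):

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class DownloadThenParse {
        public static void main(String[] args) throws Exception {
            String datasetUri = "http://apps.ideaconsult.net:8080/ambit2/dataset/585036";
            Path tmp = Files.createTempFile("dataset", ".rdf");

            // Phase 1: stream the HTTP response straight to disk; no triples in memory yet.
            InputStream in = new URL(datasetUri).openStream();
            try {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                in.close();
            }

            // Phase 2: parse the completed file into a model.
            Model model = ModelFactory.createDefaultModel();
            InputStream file = Files.newInputStream(tmp);
            try {
                model.read(file, null);
            } finally {
                file.close();
            }
            System.out.println("Triples: " + model.size());
        }
    }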

alphaville commented 13 years ago

The task is returned to the client immediately. That is the first action: before downloading or parsing anything, a task is created which (if the server is not running lots of other jobs) is submitted for execution. The HTTP connection is closed immediately and the client does not need to wait for anything. No timeouts are expected... except if the machine can't take it, because some task running in the background consumes all the resources.

vedina commented 13 years ago

Did some tests with this dataset:

    public void readRDF() {
        Model jenaModel = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);
        long mem0 = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        System.out.println("Memory used: " + mem0 / 1024 + " K bytes");
        long now = System.currentTimeMillis();

        jenaModel.read("http://apps.ideaconsult.net:8080/ambit2/dataset/585036", null);

        long mem1 = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        System.out.println("Memory used for Jena object " + (mem1 - mem0) / 1024 + " K bytes");
        System.out.println("Dataset read in " + (System.currentTimeMillis() - now) + " ms");
    }

Printout from the code above, when using the OWL model (Model jenaModel = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);):

    Memory used: 3622 K bytes
    Memory used for Jena object 245429 K bytes
    Dataset read in 144273 ms

Printout from the code above, when using the non-OWL model (Model jenaModel = ModelFactory.createDefaultModel();):

    Memory used: 1358 K bytes
    Memory used for Jena object 243377 K bytes
    Dataset read in 108253 ms

In the worst case it is 245 MB in memory, not in any way close to 2.5 GB.
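One caveat on these numbers: totalMemory() - freeMemory() without a collection is only a rough snapshot of the heap. Hinting a GC before sampling, as in this small helper, gives steadier (though still approximate) figures:

    public class MemUtil {
        // Approximate used-heap snapshot; System.gc() is only a hint to the JVM,
        // but it makes coarse before/after comparisons like the ones above steadier.
        static long usedMemory() {
            Runtime rt = Runtime.getRuntime();
            rt.gc();
            return rt.totalMemory() - rt.freeMemory();
        }

        public static void main(String[] args) {
            System.out.println("Used: " + usedMemory() / 1024 + " K bytes");
        }
    }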