the-qa-company / qEndpoint

A highly scalable RDF triple store with full-text and GeoSPARQL support
https://the-qa-company.com/products/qEndpoint

Java heap space with big queries #349

Open ate47 opened 1 year ago

ate47 commented 1 year ago

Part of the endpoint? (leave empty if you don't know)

Description of the issue

I was running the query

SELECT *
WHERE {
  ?x1 ((<http://www.wikidata.org/prop/direct/P31>|<http://www.wikidata.org/prop/direct/P279>))+ ?x1
} LIMIT 1000000

and the system returned the following:

2023-04-27T13:49:31.983+02:00  INFO 9404 --- [nio-1234-exec-1] c.t.qendpoint.compiler.SparqlRepository  : Running given sparql query: SELECT *
WHERE {
  ?x1 ((<http://www.wikidata.org/prop/direct/P31>|<http://www.wikidata.org/prop/direct/P279>))+ ?x1
} LIMIT 1000000
avr. 27, 2023 1:50:22 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Handler dispatch failed: java.lang.OutOfMemoryError: Java heap space] with root cause
java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3481)
        at java.base/java.util.ArrayDeque.grow(ArrayDeque.java:149)
        at java.base/java.util.ArrayDeque.addLast(ArrayDeque.java:307)
        at java.base/java.util.ArrayDeque.add(ArrayDeque.java:494)
        at org.eclipse.rdf4j.query.algebra.evaluation.iterator.PathIteration.addToQueue(PathIteration.java:220)
        at org.eclipse.rdf4j.query.algebra.evaluation.iterator.PathIteration.getNextElement(PathIteration.java:177)
        at org.eclipse.rdf4j.query.algebra.evaluation.iterator.PathIteration.getNextElement(PathIteration.java:32)
        at org.eclipse.rdf4j.common.iteration.LookAheadIteration.lookAhead(LookAheadIteration.java:80)
        at org.eclipse.rdf4j.common.iteration.LookAheadIteration.hasNext(LookAheadIteration.java:54)
        at org.eclipse.rdf4j.common.iteration.IterationWrapper.hasNext(IterationWrapper.java:67)
        at org.eclipse.rdf4j.common.iteration.FilterIteration.findNextElement(FilterIteration.java:78)
        at org.eclipse.rdf4j.common.iteration.FilterIteration.hasNext(FilterIteration.java:49)
        at org.eclipse.rdf4j.common.iteration.ConvertingIteration.hasNext(ConvertingIteration.java:66)
        at org.eclipse.rdf4j.common.iteration.IterationWrapper.hasNext(IterationWrapper.java:67)
        at org.eclipse.rdf4j.common.iteration.LimitIteration.hasNext(LimitIteration.java:71)
        at org.eclipse.rdf4j.common.iteration.IterationWrapper.hasNext(IterationWrapper.java:67)
        at org.eclipse.rdf4j.sail.helpers.SailBaseIteration.hasNext(SailBaseIteration.java:43)
        at org.eclipse.rdf4j.sail.helpers.CleanerIteration.hasNext(CleanerIteration.java:42)
        at org.eclipse.rdf4j.common.iteration.IterationWrapper.hasNext(IterationWrapper.java:67)
        at org.eclipse.rdf4j.query.QueryResults.report(QueryResults.java:301)
        at org.eclipse.rdf4j.repository.sail.SailTupleQuery.evaluate(SailTupleQuery.java:75)
        at com.the_qa_company.qendpoint.compiler.SparqlRepository.execute0(SparqlRepository.java:500)
        at com.the_qa_company.qendpoint.compiler.SparqlRepository.execute(SparqlRepository.java:183)
        at com.the_qa_company.qendpoint.compiler.SparqlRepository.execute(SparqlRepository.java:162)
        at com.the_qa_company.qendpoint.controller.Sparql.execute(Sparql.java:499)
        at com.the_qa_company.qendpoint.controller.EndpointController.sparqlEndpoint(EndpointController.java:58)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:207)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:152)

Expected behavior

Either an error response or the query result.

Obtained behavior

An OutOfMemoryError (OOM), as shown in the log above.

How to reproduce

No response

Endpoint version

1.12.0

Do I want to contribute to fix it?

Maybe

Something else?

No response

D063520 commented 1 year ago

I have thought about this from time to time: how can one limit the memory consumed by a single query? I never found a really good way. Maybe one could do something similar to what is done for the timeout, and check in this function how big the result set generated so far is. I am really not sure whether this is the best solution, or whether it is possible at all.
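As a rough illustration of the "check like the timeout" idea, here is a minimal sketch of an iterator wrapper that aborts once a query has produced more than a fixed number of elements. It uses only `java.util.Iterator`; the class name `BoundedIterator` and the parameter `maxElements` are hypothetical and not part of qEndpoint or RDF4J. Note that in the stack trace above the heap is exhausted inside the `ArrayDeque` used by `PathIteration.addToQueue`, so a check on the final result set alone may not catch this particular case; a similar budget would have to be applied to the intermediate queue.

```java
import java.util.Iterator;

/**
 * Hypothetical wrapper (not an RDF4J or qEndpoint class): counts the elements
 * handed out and aborts once more than maxElements have been produced.
 */
final class BoundedIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final long maxElements; // assumed per-query budget
    private long produced = 0;

    BoundedIterator(Iterator<T> delegate, long maxElements) {
        this.delegate = delegate;
        this.maxElements = maxElements;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T element = delegate.next();
        // Fail the query with an error instead of letting memory grow unbounded.
        if (++produced > maxElements) {
            throw new IllegalStateException(
                    "Query aborted: produced more than " + maxElements + " elements");
        }
        return element;
    }
}
```

An element count is only a proxy for memory use, since element sizes vary, but it is cheap to check on every step.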

hmottestad commented 9 months ago

This is probably related specifically to PathIteration. That code should probably detect when it is running low on memory and switch to a disk-based data structure for storing the intermediate results.
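To make the idea concrete, here is a minimal sketch of a FIFO queue that keeps a bounded number of elements in memory and spills the rest to a temporary file. It is not the RDF4J implementation: the class `SpillingLongQueue` and the `memoryLimit` parameter are hypothetical, elements are simplified to `long` values, and the switch-over is triggered by a fixed element count rather than by detecting low memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.NoSuchElementException;

/** Hypothetical FIFO queue: oldest elements in memory, overflow in a temp file. */
final class SpillingLongQueue implements AutoCloseable {
    private final ArrayDeque<Long> memory = new ArrayDeque<>();
    private final int memoryLimit; // assumed switch-over point, in elements
    private File file;
    private RandomAccessFile spill;
    private long readPos = 0;  // next byte to read from the spill file
    private long writePos = 0; // next byte to write in the spill file

    SpillingLongQueue(int memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    void add(long value) {
        // Use memory only while nothing is waiting on disk, to keep FIFO order.
        if (readPos == writePos && memory.size() < memoryLimit) {
            memory.addLast(value);
            return;
        }
        try {
            if (spill == null) {
                file = File.createTempFile("path-queue", ".bin");
                file.deleteOnExit();
                spill = new RandomAccessFile(file, "rw");
            }
            spill.seek(writePos);
            spill.writeLong(value);
            writePos += Long.BYTES;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long poll() {
        if (!memory.isEmpty()) {
            return memory.pollFirst();
        }
        if (readPos == writePos) {
            throw new NoSuchElementException("queue is empty");
        }
        try {
            spill.seek(readPos);
            long value = spill.readLong();
            readPos += Long.BYTES;
            if (readPos == writePos) { // disk part drained: reuse the file from the start
                readPos = 0;
                writePos = 0;
            }
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isEmpty() {
        return memory.isEmpty() && readPos == writePos;
    }

    boolean hasSpilled() {
        return spill != null;
    }

    @Override
    public void close() {
        try {
            if (spill != null) {
                spill.close();
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The value chosen for `memoryLimit` is the switch-over point: a higher value keeps small queries fast, while a lower value caps heap usage earlier at the cost of disk I/O.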

JervenBolleman commented 9 months ago

Yes, the selected collection factory probably does not have a disk-backed queue available. A potential solution is being discussed at https://github.com/eclipse-rdf4j/rdf4j/issues/4899.

JervenBolleman commented 9 months ago

I coded something up in https://github.com/eclipse-rdf4j/rdf4j/pull/4902. Could you have a look and see whether it works for qEndpoint? I would especially like to hear what size threshold we should choose as the switch-over point.