mff-uk / odcs

ODCleanStore

Unstable pipeline execution #1050

Closed: kukharm closed this issue 10 years ago

kukharm commented 10 years ago

I tested on commit ee5fdc9943cc42b5b7db5be6ecd1257ae9c69e13 (author: Jan Vojt, 2014-01-06 00:22:49) and checked part of the scenario from https://grips.semantic-web.at/display/LOD2/2013-11-17+Scheduler+testing

I encountered the following problems:

cz.cuni.mff.xrg.odcs.rdf.exceptions.InvalidQueryException: This query is probably not valid. : SPARQL execute failed:[construct  {?s ?p ?o} where {?s ?p ?o} Limit 100] 
 Exception:virtuoso.jdbc4.VirtuosoException: SR172: Transaction deadlocked
    at cz.cuni.mff.xrg.odcs.rdf.repositories.BaseRDFRepo.executeConstructQuery(BaseRDFRepo.java:955)
    at cz.cuni.mff.xrg.odcs.transformer.SPARQL.SPARQLTransformer.execute(SPARQLTransformer.java:103)
    at cz.cuni.mff.xrg.odcs.backend.execution.dpu.Executor.executeInstance(Executor.java:226)
    at cz.cuni.mff.xrg.odcs.backend.execution.dpu.Executor.execute(Executor.java:362)
    at cz.cuni.mff.xrg.odcs.backend.execution.dpu.Executor.run(Executor.java:441)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.openrdf.query.QueryEvaluationException: : SPARQL execute failed:[construct  {?s ?p ?o} where {?s ?p ?o} Limit 100] 
 Exception:virtuoso.jdbc4.VirtuosoException: SR172: Transaction deadlocked
    at virtuoso.sesame2.driver.VirtuosoRepositoryConnection.executeSPARQLForGraphResult(Unknown Source)
    at virtuoso.sesame2.driver.VirtuosoRepositoryConnection$2.evaluate(Unknown Source)
    at cz.cuni.mff.xrg.odcs.rdf.repositories.BaseRDFRepo.executeConstructQuery(BaseRDFRepo.java:948)
    ... 5 more

I have not always been able to reproduce these problems.

When I tested the scenario, one execution was successful and another failed, so the problem may lie in running pipelines in parallel.

janvojt commented 10 years ago

> Sometimes the status of a finished execution did not change. The logs said "Pipeline finished", but the status was still running.

Do you have the backend log for this execution?

> Sometimes the next execution of the same pipeline failed with the message

This is a problem with RDF. @tomesj, any ideas?

tomesj commented 10 years ago

Sorry, I have no ideas about this.

kukharm commented 10 years ago

Unfortunately, I don't have logs. I tried to reproduce it but did not succeed. Earlier I ran into the unchanging status twice. If I reproduce it again, I will save the logs.

tomas-knap commented 10 years ago

Please try to reproduce it again.

kukharm commented 10 years ago

I managed to reproduce the never-ending running status. There is a java.lang.NullPointerException: null. If needed, I can send the whole log by e-mail; it is too large.

2014-01-10 20:04:57,150 [pool-2-thread-2] WARN  exec:425 dpu: c.c.m.x.o.b.d.SQLDatabaseReconnectAspect - failureTolerant has caught exception
java.lang.NullPointerException: null
    at java.util.Date.getMillisOf(Date.java:956) ~[na:1.7.0_04]
    at java.util.Date.before(Date.java:915) ~[na:1.7.0_04]
    at cz.cuni.mff.xrg.odcs.commons.app.facade.ScheduleFacadeImpl.filterActiveRunAfter(ScheduleFacadeImpl.java:227) ~[classes/:na]
    at cz.cuni.mff.xrg.odcs.commons.app.facade.ScheduleFacadeImpl.executeFollowers(ScheduleFacadeImpl.java:202) ~[classes/:na]
    at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_04]
    at java.lang.reflect.Method.invoke(Method.java:601) ~[na:1.7.0_04]
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at cz.cuni.mff.xrg.odcs.backend.db.SQLDatabaseReconnectAspect.failureTolerant(SQLDatabaseReconnectAspect.java:103) ~[classes/:na]
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_04]
    at java.lang.reflect.Method.invoke(Method.java:601) ~[na:1.7.0_04]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at $Proxy30.executeFollowers(Unknown Source) [na:na]
    at cz.cuni.mff.xrg.odcs.backend.scheduling.Scheduler.onPipelineFinished(Scheduler.java:58) [classes/:na]
    at cz.cuni.mff.xrg.odcs.backend.scheduling.Scheduler.onApplicationEvent(Scheduler.java:99) [classes/:na]
    at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:97) [spring-context-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:324) [spring-context-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at cz.cuni.mff.xrg.odcs.backend.execution.pipeline.Executor.run(Executor.java:433) [classes/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_04]
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [na:1.7.0_04]
    at java.util.concurrent.FutureTask.run(FutureTask.java:166) [na:1.7.0_04]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [na:1.7.0_04]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_04]
    at java.lang.Thread.run(Thread.java:722) [na:1.7.0_04]
2014-01-10 20:04:57,151 [pool-2-thread-2] WARN  exec:425 dpu: c.c.m.x.o.b.d.SQLDatabaseReconnectAspect - Database is down after 195 attempts.

janvojt commented 10 years ago

I fixed the NPE. The transaction deadlock happened in the RDF database during SPARQL execution. I have no idea how transactions work here, what isolation level is configured, etc., so I am reassigning this to Jirka; I believe he knows how this works.
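For reference, the trace above hints at how the NPE arose: in JDK 7, `Date.before(when)` calls `getMillisOf(when)`, which dereferences its argument, so an NPE thrown inside `Date.getMillisOf` suggests `filterActiveRunAfter` passed a null date into `before`. The actual fix is not shown in this thread; the sketch below only illustrates a null-safe filter of that kind. `Schedule`, `lastRun` and `filterRunAfter` are hypothetical names, not the ODCS API, and treating a never-run schedule as "not after" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public class NullSafeRunFilter {

    /** Hypothetical minimal stand-in for a schedule entry. */
    static final class Schedule {
        final String name;
        final Date lastRun; // may be null if the schedule has never run

        Schedule(String name, Date lastRun) {
            this.name = name;
            this.lastRun = lastRun;
        }
    }

    /**
     * Keeps schedules whose last run is strictly after {@code since}.
     * A null lastRun (never executed) is skipped; a null {@code since}
     * means "no lower bound".
     */
    static List<Schedule> filterRunAfter(List<Schedule> schedules, Date since) {
        List<Schedule> result = new ArrayList<>();
        for (Schedule s : schedules) {
            if (s.lastRun == null) {
                // since.before(null) would throw the NPE seen in the log
                continue;
            }
            if (since == null || since.before(s.lastRun)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Schedule> all = Arrays.asList(
                new Schedule("ran-later", new Date(2_000L)),
                new Schedule("ran-earlier", new Date(500L)),
                new Schedule("never-ran", null));
        for (Schedule s : filterRunAfter(all, new Date(1_000L))) {
            System.out.println(s.name); // prints "ran-later"
        }
    }
}
```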

tomesj commented 10 years ago

I have no tools to detect or resolve the deadlock. A mechanism for retrying the method (using reconnection) has already been added.
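A minimal sketch of the kind of retry discussed here, assuming a deadlocked transaction can simply be rerun. It keys on the `SR172` code from the Virtuoso error earlier in this issue and, unlike the `SQLDatabaseReconnectAspect` in the log (which appears to have treated the NPE as a database outage and retried 195 times), rethrows anything that is not a deadlock immediately. `DeadlockRetry`, `withDeadlockRetry` and the linear backoff are hypothetical, not existing ODCS code.

```java
import java.util.concurrent.Callable;

public class DeadlockRetry {

    /** Code taken from the Virtuoso error in this issue ("SR172: Transaction deadlocked"). */
    static final String DEADLOCK_CODE = "SR172";

    /** True if any message in the cause chain mentions the Virtuoso deadlock code. */
    static boolean isDeadlock(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(DEADLOCK_CODE)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs {@code task}, retrying up to {@code maxAttempts} times in total, but only
     * when the failure looks like a deadlock. Any other exception (e.g. an NPE) is
     * rethrown at once instead of being mistaken for a transient database failure.
     */
    static <T> T withDeadlockRetry(Callable<T> task, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (!isDeadlock(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // The database aborted our transaction; wait a bit longer each time and rerun it.
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = withDeadlockRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new RuntimeException("SPARQL execute failed",
                        new RuntimeException("virtuoso.jdbc4.VirtuosoException: SR172: Transaction deadlocked"));
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```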

janvojt commented 10 years ago

Well, the question is why and how did the deadlock happen in the first place. From what I understand, each DPU uses a dedicated thread and a dedicated graph in Virtuoso, so how can there be a deadlock?
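One possible answer, offered as an assumption rather than a diagnosis: a dedicated graph does not imply dedicated storage. Virtuoso keeps the quads of all graphs in shared tables (`DB.DBA.RDF_QUAD` and its indexes), so two DPUs writing to different graphs could still lock the same rows or pages, and a deadlock only needs two transactions taking shared locks in opposite order. The toy model below shows that pattern with two `ReentrantLock`s standing in for shared pages; all names are hypothetical, and a timeout plays the role of Virtuoso's deadlock detector so the demo cannot hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class SharedLockDeadlock {

    /**
     * Two workers each "own" their graph but both need two shared locks.
     * Returns how many workers timed out waiting for their second lock,
     * i.e. would have deadlocked.
     */
    static int run(boolean sameOrder) throws InterruptedException {
        ReentrantLock pageA = new ReentrantLock();
        ReentrantLock pageB = new ReentrantLock();
        AtomicInteger timeouts = new AtomicInteger();
        // In opposite-order mode these latches force the bad interleaving: both workers
        // hold their first lock before either asks for the second, and keep holding it
        // until both have tried.
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);

        Thread w1 = new Thread(() -> work(pageA, pageB, !sameOrder, bothHoldFirst, bothTried, timeouts));
        Thread w2 = sameOrder
                ? new Thread(() -> work(pageA, pageB, false, bothHoldFirst, bothTried, timeouts))
                : new Thread(() -> work(pageB, pageA, true, bothHoldFirst, bothTried, timeouts));
        w1.start();
        w2.start();
        w1.join();
        w2.join();
        return timeouts.get();
    }

    static void work(ReentrantLock first, ReentrantLock second, boolean forceInterleaving,
                     CountDownLatch bothHoldFirst, CountDownLatch bothTried, AtomicInteger timeouts) {
        first.lock();
        try {
            if (forceInterleaving) {
                bothHoldFirst.countDown();
                bothHoldFirst.await();
            }
            // A real database would detect the cycle and abort one transaction;
            // here a timeout stands in for that.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                timeouts.incrementAndGet();
            }
            if (forceInterleaving) {
                bothTried.countDown();
                bothTried.await();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("opposite order, timeouts: " + run(false)); // prints 2
        System.out.println("same order, timeouts: " + run(true));      // prints 0
    }
}
```

Acquiring the shared resources in one global order removes the cycle; in the database case that order is not under the DPU's control, which is why retrying the aborted transaction is the usual remedy.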

tomas-knap commented 10 years ago

Did the deadlock happen somewhere in the middle of the pipeline, when extracting data in the first DPU, or when loading data from the last one?

skodapetr commented 10 years ago

Can we reproduce the deadlock on demand? Does it appear with a single pipeline? Is there anything about transactions in the documentation? I agree with Jan; we should try to find out what causes this.

tomesj commented 10 years ago

I cannot reproduce the deadlock now. Is it still relevant?