Closed kukharm closed 10 years ago
Sometimes the status of a finished execution did not change. The logs said "Pipeline finished", but the status was still running.
Do you have the backend log for this execution?
Sometimes the next execution of the same pipeline failed with the message
This is a problem with RDF. @tomesj any ideas?
Sorry, I have no ideas about this.
Unfortunately I don't have the logs. I tried to reproduce it but did not succeed. Earlier I hit the unchanging-status problem twice. If I reproduce it again, I will save the logs.
Please reproduce again
I managed to reproduce the endless running status. There is a java.lang.NullPointerException: null in the log. I can send the whole log by email if needed; it is too large to post here.
2014-01-10 20:04:57,150 [pool-2-thread-2] WARN exec:425 dpu: c.c.m.x.o.b.d.SQLDatabaseReconnectAspect - failureTolerant has caught exception
java.lang.NullPointerException: null
at java.util.Date.getMillisOf(Date.java:956) ~[na:1.7.0_04]
at java.util.Date.before(Date.java:915) ~[na:1.7.0_04]
at cz.cuni.mff.xrg.odcs.commons.app.facade.ScheduleFacadeImpl.filterActiveRunAfter(ScheduleFacadeImpl.java:227) ~[classes/:na]
at cz.cuni.mff.xrg.odcs.commons.app.facade.ScheduleFacadeImpl.executeFollowers(ScheduleFacadeImpl.java:202) ~[classes/:na]
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_04]
at java.lang.reflect.Method.invoke(Method.java:601) ~[na:1.7.0_04]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:80) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at cz.cuni.mff.xrg.odcs.backend.db.SQLDatabaseReconnectAspect.failureTolerant(SQLDatabaseReconnectAspect.java:103) ~[classes/:na]
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_04]
at java.lang.reflect.Method.invoke(Method.java:601) ~[na:1.7.0_04]
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:65) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at $Proxy30.executeFollowers(Unknown Source) [na:na]
at cz.cuni.mff.xrg.odcs.backend.scheduling.Scheduler.onPipelineFinished(Scheduler.java:58) [classes/:na]
at cz.cuni.mff.xrg.odcs.backend.scheduling.Scheduler.onApplicationEvent(Scheduler.java:99) [classes/:na]
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:97) [spring-context-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:324) [spring-context-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at cz.cuni.mff.xrg.odcs.backend.execution.pipeline.Executor.run(Executor.java:433) [classes/:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_04]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [na:1.7.0_04]
at java.util.concurrent.FutureTask.run(FutureTask.java:166) [na:1.7.0_04]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [na:1.7.0_04]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_04]
at java.lang.Thread.run(Thread.java:722) [na:1.7.0_04]
2014-01-10 20:04:57,151 [pool-2-thread-2] WARN exec:425 dpu: c.c.m.x.o.b.d.SQLDatabaseReconnectAspect - Database is down after 195 attempts.
I fixed the NPE. Transaction deadlock happened in RDF db during SPARQL execution. I have no clue how transactions work here, what isolation level is configured, etc. -> reassigning to Jirka, I believe he knows how this works.
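For anyone reading the trace later: the NPE is thrown inside Date.getMillisOf, called from Date.before, which is what happens when the argument passed to before() is null. Below is a minimal sketch of a null-safe version of that kind of filter. The Follower class, the notRunSince method and the "keep followers that never ran" policy are hypothetical and only illustrate the guard; this is not the actual ScheduleFacadeImpl code or the actual fix.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class NullSafeDateFilter {

    /** Hypothetical stand-in for a schedule; not the real ODCS entity. */
    public static final class Follower {
        public final String name;
        public final Date lastExecution; // null if the follower has never run

        public Follower(String name, Date lastExecution) {
            this.name = name;
            this.lastExecution = lastExecution;
        }
    }

    /**
     * Keeps followers whose last execution is strictly before finishedAt.
     * Followers that never ran (null date) are kept too; that policy is an
     * assumption made for this sketch.
     */
    public static List<Follower> notRunSince(List<Follower> followers, Date finishedAt) {
        if (finishedAt == null) {
            // Fail with a clear message instead of an NPE deep inside java.util.Date.
            throw new IllegalArgumentException("finishedAt must not be null");
        }
        List<Follower> kept = new ArrayList<Follower>();
        for (Follower f : followers) {
            // Check for null before calling before(); a null date on either
            // side of the comparison would otherwise throw.
            if (f.lastExecution == null || f.lastExecution.before(finishedAt)) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Follower> followers = new ArrayList<Follower>();
        followers.add(new Follower("never-ran", null));
        followers.add(new Follower("ran-earlier", new Date(1000L)));
        followers.add(new Follower("ran-later", new Date(3000L)));
        for (Follower f : notRunSince(followers, new Date(2000L))) {
            System.out.println(f.name);
        }
    }
}
```

Whether a follower with no previous execution should be kept or skipped depends on the scheduler semantics, so that branch would need to be checked against the real code.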
I have no tools to detect or resolve a deadlock. Methods for retrying a call (using reconnection) have already been added.
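One thing the log above suggests is that the reconnect aspect treated the NullPointerException like a database failure ("Database is down after 195 attempts"). A rough sketch of a retry helper that only retries transient SQL failures and lets everything else propagate immediately is below. The TransientRetry class and callWithRetry are hypothetical names, not the existing SQLDatabaseReconnectAspect, and it is an assumption that the Virtuoso driver reports deadlocks as java.sql.SQLTransientException (or its subclass SQLTransactionRollbackException).

```java
import java.sql.SQLTransactionRollbackException;
import java.sql.SQLTransientException;
import java.util.concurrent.Callable;

public class TransientRetry {

    /**
     * Calls the action up to maxAttempts times, retrying only when it throws
     * SQLTransientException (which includes SQLTransactionRollbackException).
     * Any other exception, such as a NullPointerException, is not retried.
     */
    public static <T> T callWithRetry(Callable<T> action, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLTransientException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SQLTransientException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff so competing transactions get a chance to finish.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last; // every attempt failed with a transient error
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = callWithRetry(new Callable<String>() {
            @Override
            public String call() throws Exception {
                calls[0]++;
                if (calls[0] < 3) {
                    // Simulated deadlock victim on the first two attempts.
                    throw new SQLTransactionRollbackException("simulated deadlock");
                }
                return "done";
            }
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

This does not explain why the deadlock happens; it only keeps programming errors from being hidden behind a long retry loop.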
Well, the question is why and how did the deadlock happen in the first place. From what I understand, each DPU uses a dedicated thread and a dedicated graph in Virtuoso, so how can there be a deadlock?
Did the deadlock happen somewhere in the middle of the pipeline, when extracting data in the first DPU, or when loading from the last one?
Can we reproduce the deadlock on demand? Does it appear for a single pipeline? Is there anything about transactions in the documentation? I agree with Jan, we should try to find out what the cause of this is.
I cannot reproduce the deadlock now. Is it still relevant?
I tested on commit ee5fdc9943cc42b5b7db5be6ecd1257ae9c69e13 (Author: Jan Vojt, 2014-01-06 00:22:49)
and checked part of the scenario from https://grips.semantic-web.at/display/LOD2/2013-11-17+Scheduler+testing
I encountered the following problems:
I have not always been able to reproduce these problems.
When I tested the scenario
one execution was successful and another failed. So maybe the problem is in running pipelines in parallel.