I'm going to walk Fox. I'm adding a note here for now (will turn it into a card when I get back).
Basically what I've found is that, at least in some cases, samples are not getting imported/linked with a cohort because a DB exception is thrown during cohort import.
Stack trace in smile-server:
2024-06-24 19:30:52.428 ERROR 1 --- [pool-6-thread-4] .m.s.s.i.TempoMessageHandlingServiceImpl : Error during handling of Cohort complete event
org.springframework.dao.IncorrectResultSizeDataAccessException: Incorrect result size: expected at most 1
at org.springframework.data.neo4j.repository.query.GraphQueryExecution$SingleEntityExecution.execute(GraphQueryExecution.java:73) ~[spring-data-neo4j-5.3.3.RELEASE.jar!/:5.3.3.RELEASE]
at org.springframework.data.neo4j.repository.query.GraphRepositoryQuery.doExecute(GraphRepositoryQuery.java:76) ~[spring-data-neo4j-5.3.3.RELEASE.jar!/:5.3.3.RELEASE]
at org.springframework.data.neo4j.repository.query.AbstractGraphRepositoryQuery.execute(AbstractGraphRepositoryQuery.java:57) ~[spring-data-neo4j-5.3.3.RELEASE.jar!/:5.3.3.RELEASE]
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor$QueryMethodInvoker.invoke(QueryExecutorMethodInterceptor.java:195) ~[spring-data-commons-2.3.3.RELEASE.jar!/:2.3.3.RELEASE]
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:152) ~[spring-data-commons-2.3.3.RELEASE.jar!/:2.3.3.RELEASE]
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:130) ~[spring-data-commons-2.3.3.RELEASE.jar!/:2.3.3.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:367) ~[spring-tx-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:118) ~[spring-tx-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:139) ~[spring-tx-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at com.sun.proxy.$Proxy113.findTempoBySamplePrimaryId(Unknown Source) ~[na:na]
at org.mskcc.smile.service.impl.TempoServiceImpl.getTempoDataBySamplePrimaryId(TempoServiceImpl.java:68) ~[service-0.1.0.jar!/:0.1.0]
at org.mskcc.smile.service.impl.TempoServiceImpl$$FastClassBySpringCGLIB$$bee97256.invoke(<generated>) ~[service-0.1.0.jar!/:0.1.0]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:687) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
at org.mskcc.smile.service.impl.TempoServiceImpl$$EnhancerBySpringCGLIB$$c293d4aa.getTempoDataBySamplePrimaryId(<generated>) ~[service-0.1.0.jar!/:0.1.0]
at org.mskcc.smile.service.impl.CohortCompleteServiceImpl.saveCohort(CohortCompleteServiceImpl.java:56) ~[service-0.1.0.jar!/:0.1.0]
at org.mskcc.smile.service.impl.TempoMessageHandlingServiceImpl$CohortCompleteHandler.run(TempoMessageHandlingServiceImpl.java:274) ~[service-0.1.0.jar!/:0.1.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Example Cypher query identifying one such sample that triggers the error above:
MATCH (t:Tempo)<-[:HAS_TEMPO]-(s:Sample)-[:HAS_METADATA]->(sm:SampleMetadata)
WHERE sm.primaryId = "12497_D_3"
WITH s AS s1, sm AS sMetadata, count(t) AS tCount
WHERE tCount > 1
RETURN s1.smileSampleId, sMetadata.primaryId, tCount
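Dropping the primaryId filter from the query above should generalize it to enumerate every affected sample in the database. A sketch only, reusing the same labels and relationship types; I haven't run this against production yet:

```cypher
// Sketch: list all samples with more than one Tempo node attached.
// Same node labels and relationships as the diagnostic query above.
MATCH (t:Tempo)<-[:HAS_TEMPO]-(s:Sample)-[:HAS_METADATA]->(sm:SampleMetadata)
WITH s, sm, count(t) AS tCount
WHERE tCount > 1
RETURN sm.primaryId, tCount
ORDER BY tCount DESC
```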
Seems that duplicate Tempo nodes on a single sample are the root cause.
So that diagnoses the issue we're seeing. Next steps:
(1) identify how many of these cases currently exist in the database, since each one will keep throwing that IncorrectResultSizeDataAccessException;
(2) run a script or query that identifies and corrects these cases by merging the multiple Tempo nodes into a single node per sample; and
(3) dig into how/why this occurred, how likely it is to happen again, etc.
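For the merge step, one possible approach is apoc.refactor.mergeNodes, assuming the APOC library is installed on the Neo4j server. This is a sketch, not a vetted fix; in particular, whether combining properties across duplicate Tempo nodes is semantically safe needs to be verified before running it against production:

```cypher
// Sketch: collapse duplicate Tempo nodes into one per sample (assumes APOC is installed).
// properties: "combine" keeps values from all duplicates; verify that is the right
// semantics for Tempo data before running this for real.
MATCH (t:Tempo)<-[:HAS_TEMPO]-(s:Sample)
WITH s, collect(t) AS tempos
WHERE size(tempos) > 1
CALL apoc.refactor.mergeNodes(tempos, {properties: "combine", mergeRels: true}) YIELD node
RETURN s.smileSampleId, node
```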
INITIAL FINDINGS
Samples affected in production: 1,207 (by comparison, zero records are affected on the dev server).