nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0
2.75k stars, 628 forks

Unable to resume cached task: com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 25 #4107

Closed tfenne closed 1 year ago

tfenne commented 1 year ago

Bug report

Expected behavior and actual behavior

Recently, a workflow I've been working with for months has started having (apparently) spurious problems resuming cached tasks. For example, a run of the workflow gets almost to the end, then hits a bug in a new task. I make a tiny change to my main.nf and re-run, and then a whole bunch of tasks fail to resume from cache.

To me it seems random. E.g. in a workflow with 4 samples, 3 of the 4 "TRIM_ADAPTERS" tasks will fail to resume, but the fourth will resume just fine. The same happens with other processes.

My log is full of stack traces like this:

Jul-17 18:16:55.936 [Actor Thread 29] WARN  nextflow.processor.TaskProcessor - [TRIM_ADAPTERS (1)] Unable to resume cached task -- See log file for details
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 25
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
        at com.esotericsoftware.kryo.Kryo$readClassAndObject$6.call(Unknown Source)
        at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy:181)
        at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy)
        at nextflow.util.KryoHelper$deserialize$0.call(Unknown Source)
        at nextflow.processor.TaskContext.deserialize(TaskContext.groovy:202)
        at nextflow.cache.CacheDB.getTaskEntry(CacheDB.groovy:88)
        at nextflow.processor.TaskProcessor.checkCachedOrLaunchTask(TaskProcessor.groovy:770)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:48)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:189)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:57)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:203)
        at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:618)
        at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:52)
        at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
        at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
        at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1623)

I have no idea what the 25 in "Encountered unregistered class ID: 25" refers to.
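
For context on that number: as far as I understand Kryo, each class registered with a Kryo instance gets a small integer ID, and the serialized stream stores that ID instead of the class name. The top frame of the trace, DefaultClassResolver.readClass, is where the reader looks the ID up and fails if it has no registration for it. Below is a toy model of that lookup in plain Java. It is only a sketch of the idea, not Kryo's actual code and not how Nextflow configures Kryo; Registry, register, classOf and demo are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

final class ClassIdDemo {
    /** Hypothetical toy registry: a class's ID is its position in registration order. */
    static final class Registry {
        private final List<Class<?>> byId = new ArrayList<>();

        /** Registers a class and returns the ID assigned to it. */
        int register(Class<?> type) {
            byId.add(type);
            return byId.size() - 1;
        }

        /** Resolves an ID read from a stream back to a class, or fails if unknown. */
        Class<?> classOf(int id) {
            if (id < 0 || id >= byId.size()) {
                throw new IllegalStateException("Encountered unregistered class ID: " + id);
            }
            return byId.get(id);
        }
    }

    /** Writer and reader disagree on what is registered, so the lookup fails. */
    static String demo() {
        Registry writer = new Registry();
        writer.register(String.class);                 // ID 0
        writer.register(Integer.class);                // ID 1
        int written = writer.register(HashMap.class);  // ID 2, known to the writer only

        Registry reader = new Registry();
        reader.register(String.class);                 // ID 0
        reader.register(Integer.class);                // ID 1

        try {
            return reader.classOf(written).getName();
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Encountered unregistered class ID: 2"
    }
}
```

In this model the error means the bytes being read carry an ID that the reading side never registered. That would fit either a cache entry written with a different set of registrations or a stream that is being read from the wrong position, but which of those applies here is not something the trace alone shows.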

Steps to reproduce the problem

Program output

Environment

Additional context

pditommaso commented 1 year ago

That's unusual. Can you post/share the code of the TRIM_ADAPTERS task?

pditommaso commented 1 year ago

Unable to replicate. Closing this.

nh13 commented 3 months ago

We've seen this again in a different client's (private) workflow.

A complete shot in the dark, but perhaps we need the following somewhere:

kryo.setReferences(true)
kryo.setRegistrationRequired(false)

This will be really hard to reproduce and to provide a test case for, but we'll keep looking for one.