nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Unable to resume cached task with enum in parameters to process #4816

Open aringeri opened 3 months ago

aringeri commented 3 months ago

Bug report


Expected behavior and actual behavior

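When running the following pipeline a second time with -resume, we would expect the resume mechanism to reuse the cached result instead of recomputing it. Instead, Nextflow logs "Unable to resume cached task" with a KryoException (Unable to find class: MyEnum) and runs the task again.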

Steps to reproduce the problem

In the file ./lib/MyEnum.groovy:

public enum MyEnum {
    A,
    B
}

In the file main.nf:

workflow {
    ch = Channel.of([MyEnum.A, MyEnum.B])

    SomeTask(ch)

}

process SomeTask {
    input:
        tuple val(meta), val(myEnum)

    output:
        tuple val(meta), path('out.txt')

    script:
    """
    echo "hello $myEnum" > out.txt
    """
}

Run the pipeline twice, the second time with -resume:

nextflow run main.nf
nextflow run main.nf -resume

Program output

Mar-14 15:01:45.856 [Actor Thread 3] WARN  nextflow.processor.TaskProcessor - [SomeTask (1)] Unable to resume cached task -- See log file for details
com.esotericsoftware.kryo.KryoException: Unable to find class: MyEnum
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
        at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
        at com.esotericsoftware.kryo.Kryo$readClassAndObject$5.call(Unknown Source)
        at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy:181)
        at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy)
        at nextflow.util.KryoHelper$deserialize$0.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
        at nextflow.processor.TaskContext.deserialize(TaskContext.groovy:202)
        at nextflow.cache.CacheDB.getTaskEntry(CacheDB.groovy:88)
        at nextflow.processor.TaskProcessor.checkCachedOrLaunchTask(TaskProcessor.groovy:791)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:48)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:189)
        at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:57)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:203)
        at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:639)
        at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:52)
        at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
        at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
        at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ClassNotFoundException: MyEnum
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:467)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
        ... 37 common frames omitted
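The trace shows where resolution fails: Kryo's DefaultClassResolver.readName calls Class.forName, and the JVM's application class loader (ClassLoaders$AppClassLoader) cannot find MyEnum. A plausible explanation, not confirmed in this thread, is that classes from ./lib are defined by a different class loader than the one Kryo uses during deserialization. The following plain-JDK Java sketch (no Kryo or Nextflow involved) illustrates the general mechanism: loading a class by name succeeds through the loader that defined it and throws ClassNotFoundException through a loader outside its delegation chain. ClassResolutionDemo, canResolve and the nested MyEnum are hypothetical stand-ins.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: resolving a class by name only works if the chosen class loader can see it.
public class ClassResolutionDemo {

    // Hypothetical stand-in for the enum in lib/MyEnum.groovy.
    enum MyEnum { A, B }

    // Returns true if `name` can be loaded through `loader`, false on ClassNotFoundException.
    static boolean canResolve(String name, ClassLoader loader) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = MyEnum.class.getName();
        // The loader that actually defined MyEnum can always resolve it.
        ClassLoader owner = MyEnum.class.getClassLoader();
        // A loader with no URLs whose parent is the bootstrap loader (null):
        // it cannot see application classes, so the lookup by name fails.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println("owner loader:    " + canResolve(name, owner));
            System.out.println("isolated loader: " + canResolve(name, isolated));
        }
    }
}
```

Running it prints true for the defining loader and false for the isolated one, which is the same kind of failure as the Caused by: java.lang.ClassNotFoundException: MyEnum line above.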


bentsherman commented 3 months ago

I thought we fixed this with #3901.

Does it work if you change the process input to just the enum value?

    input:
    val(myEnum)

aringeri commented 3 months ago

Hi @bentsherman, thanks for your suggestion. It succeeds if I change the input and output to:

input:
        val(myEnum)

output:
        path('out.txt')

However, my real use case is to use enums to describe the data I'm working with and then run different commands conditionally based on that enum value.
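One possible workaround, not proposed in this thread and not verified against Nextflow's cache, is to keep only the enum's name() (a plain String) in the metadata and recover the constant with valueOf() where the conditional logic needs it, so the serialized task context would not have to resolve MyEnum by class name. The Java sketch shows that round trip and a per-type dispatch; EnumNameWorkaround, metaFor, commandFor and the per-type commands are hypothetical placeholders.

```java
import java.util.Map;

public class EnumNameWorkaround {

    // Hypothetical stand-in for lib/MyEnum.groovy.
    enum MyEnum { A, B }

    // Store the enum by name (a String) rather than as the enum object itself.
    static Map<String, String> metaFor(MyEnum type) {
        return Map.of("type", type.name());
    }

    // Recover the enum from the metadata and pick a command conditionally.
    static String commandFor(Map<String, String> meta, String data) {
        MyEnum type = MyEnum.valueOf(meta.get("type"));
        switch (type) {
            case A: return "echo \"hello A " + data + "\" > out.txt";
            case B: return "echo \"hello B " + data + "\" > out.txt";
            default: throw new IllegalStateException("unhandled type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(commandFor(metaFor(MyEnum.A), "data-file-1"));
        System.out.println(commandFor(metaFor(MyEnum.B), "data-file-2"));
    }
}
```

If this approach holds up, the Nextflow equivalent would be to emit [type: MyEnum.A.name()] in the channel and call MyEnum.valueOf(meta.type) inside the process; this is untested with -resume.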

This minimal workflow is closer to what I'm trying to do, and it still produces the error:

workflow {

    ch = Channel.of(
        tuple([type: MyEnum.A], "data-file-1"), 
        tuple([type: MyEnum.B], "data-file-2")
    )

    SomeTask(ch)

}

process SomeTask {
    input:
        tuple val(meta), val(data)

    output:
        tuple val(meta), path('out.txt')

    script:
    """
    echo "hello $data" > out.txt
    """
}