nextflow-io / nf-schema

Functionality for working with pipeline and sample sheet schema files in Nextflow pipelines
https://nextflow-io.github.io/nf-schema/
Apache License 2.0
12 stars 20 forks source link

Class used in plugin can't be used in a `resume` run #23

Open nvnieuwk opened 1 year ago

nvnieuwk commented 1 year ago

Bug report

Expected behavior and actual behavior

In the nf-validation plugin we use an extended class of LinkedHashMap called ImmutableMap for the meta fields. This class works fine when running in a normal run, but gives these warnings during a resumed run and reruns everything:

WARN: [PLUS (1)] Unable to resume cached task -- See log file for details

The log shows the following:

com.esotericsoftware.kryo.KryoException: Unable to find class: nextflow.validation.ImmutableMap
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
    at com.esotericsoftware.kryo.Kryo$readClassAndObject$2.call(Unknown Source)
    at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy:181)
    at nextflow.util.KryoHelper.deserialize(SerializationHelper.groovy)
    at nextflow.util.KryoHelper$deserialize.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
    at nextflow.processor.TaskContext.deserialize(TaskContext.groovy:202)
    at nextflow.cache.CacheDB.getTaskEntry(CacheDB.groovy:88)
    at nextflow.processor.TaskProcessor.checkCachedOrLaunchTask(TaskProcessor.groovy:770)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:48)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:189)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:57)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:203)
    at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:618)
    at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:52)
    at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
    at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
    at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: nextflow.validation.ImmutableMap
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:467)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)

Steps to reproduce the problem

You can clone this repository and run it with nextflow run main.nf. When rerunning the mini pipeline with the -resume flag, you'll see the errors/warnings.

Environment

pditommaso commented 1 year ago

The ImmutableMap should implement CacheFunnel interface

nvnieuwk commented 1 year ago

Thanks for the suggestion @pditommaso! I've sadly been unable to fix it with CacheFunnel. I implemented it like this:

// A class that works like Map, but returns an immutable copy with each method
public class ImmutableMap extends LinkedHashMap implements CacheFunnel {

    Map internalMap

    ImmutableMap(Map initialMap) {
        internalMap = initialMap
    }

    // Override the methods of the Map interface

    @Override
    Hasher funnel(Hasher hasher, HashMode mode) {
        hasher.putUnencodedChars(internalMap)
        return hasher
    }

    // Rest of the class

This still gives this error in the log: Caused by: java.lang.ClassNotFoundException: nextflow.validation.ImmutableMap

Did I implement it wrong?

pditommaso commented 1 year ago

Likely the best thing to do is to use Collections.unmodifiableMap instead of implement your own Immutable class

https://chat.openai.com/share/b0e9f648-21a6-4069-b474-a4a60e8f334d

nvnieuwk commented 1 year ago

We've investigated that and the main problem was that when copying an unmodifiableMap during a .map for example, the return type is again modifiable. This custom class made sure that it always returns immutable maps.

nvnieuwk commented 1 year ago

You can see that discussion here: https://github.com/nextflow-io/nf-validation/pull/32

pditommaso commented 1 year ago

I think it's not a plugin role to change the semantic of nextflow operator. Therefore you should use usual Map objects instead of ImmutableMap

nvnieuwk commented 1 year ago

Okay thank you for all the help, too bad there is no way to enforce the immutability of the meta map, but I understand your point on this

ewels commented 1 year ago

@pditommaso is there any other way to achieve what we want here?

The immutable maps feature came after @robsyme gave a talk about the dangers of meta map mutability: https://nf-co.re/events/2023/bytesize_workflow_safety

We can drop back to a regular map again, but it would be nice to protect users (and devs) from this problem if possible, somehow.

pditommaso commented 1 year ago

We'll take this into account for DSL3

robsyme commented 1 year ago

Perhaps we can use the @ValueObject decoration to automatically implement the required interface. I'll try and test this week.

mirpedrol commented 1 year ago

Hello! I am starting to add fromSamplesheet() to some nf-core pipelines, and it would be good to solve this before I merge any PR. Were you able to test if the decorator works @robsyme?

pditommaso commented 1 year ago

👉 https://github.com/nextflow-io/nf-schema/issues/23

robsyme commented 1 year ago

Paolo - the goal here is not to change in any way how Nextflow's operators work. The only goal here is to construct an object with the following properties: 1) Presents a Map-like (aka 🐍 dict) interface for holding metadata 2) Users can use the plus() operator to append new maps a) When they do this, a new object is returned rather than modifying the original object 3) Can be serialised by Kryo

robsyme commented 1 year ago

I've tried a couple of approaches today and hit some interesting and instructive road blocks 😆

Let's say we want these properties:

Example 1

We can put together a very simple example class to illustrate the challenges. We might do something like:

import nextflow.util.KryoHelper

class Meta {
    @Delegate Map internal = new LinkedHashMap()

    static { KryoHelper.register(Meta) }

    Object put(Object key, Object value) { internal.put(key, value) }

    Meta plus(Map right) { new Meta(internal: internal + right) }

    // ... and any other methods (minus, etc)
}

Example 2

We can add immutability in a number of different ways, but let's say we go for Nextflow's built-in ValueObject annotation:

import nextflow.util.KryoHelper
import nextflow.io.ValueObject

@ValueObject
class Meta {
    @Delegate Map internal = new LinkedHashMap()

    static { KryoHelper.register(Meta) }

    Object put(Object key, Object value) { internal.put(key, value) }

    Meta plus(Map right) { new Meta(internal: internal + right) }
}

Making the object immutable breaks Kryo serialization:

The Nextflow logs report:

Jul.-04 10:48:46.677 [Actor Thread 7] WARN  nextflow.processor.TaskProcessor - [TestCache (1)] Unable to resume cached task -- See log file for details
java.lang.UnsupportedOperationException: null
    at java.base/java.util.Collections$UnmodifiableMap.put(Collections.java:1505)
    at java_util_Map$put$1.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
    at Meta.put(Meta.groovy:10)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:144)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)

Why does is this no longer serializable?

Kryo MapSerializer

Our Meta class is a Map of sorts, and so the object is deserialized from a byte string to a new instance using the Kryo built-in MapSerializer. In this Serializer class, it first creates a new, empty object and then adds each key+value pair to the empty Map as they are read out of the serialized form.

This work of incrementally adding new key+value pairs is not supported by our newly immutable class, which is why the Kryo serialization breaks.

It's certainly possible to create a custom Serializer class, but I'm starting to think that this immutability feature is starting to feel a little over-engineered. There's a possibility that we're introducing a little too much "magic" - maybe using a vanailla LinkedHashMap class and then providing guidance in documentation is a better approach.

pditommaso commented 1 year ago

The only goal here is to construct an object with the following properties:

Why this plugin should use a "magic" object and not just a plain Map?

ewels commented 1 year ago

Because @robsyme gave a nf-core/bytesize talk that put the fear of god into us all about mutable map objects 😆

https://nf-co.re/events/2023/bytesize_workflow_safety

https://www.youtube.com/watch?v=A357C-ux6Dw

robsyme commented 1 year ago

Haha. It was certainly not my intention to scare anybody! 😆

Paolo: We had people modifying the map in flight, which can lead to results that depend on task execution timing. For example:

workflow {
    nums = Channel.of(1..10) | map { [val:it] }

    nums
    | Foo
    | map { meta -> meta.val += 1 }

    nums 
    | VariableProcess
    | DoSomethingElse
}

In this example, modification of the meta object in the closure modifies the same object being passed to VariableProcess and DoSomethingElse. If VariableProcess happens to finish quickly, DoSomethingElse might be launched before the val property is incremented. If VariableProcess takes longer than Foo, then the increment will happen beforehand. This can lead to unpredictable results and unusual behaviour where process caches might change, etc.

robsyme commented 1 year ago

I think that implementing the CacheFunnel interface is the easiest path forward, but because the object is also a Map, it's cachefunnel implementation will never be used. I've just opened up a Nextflow PR that would remedy this: https://github.com/nextflow-io/nextflow/pull/4077