Open nvnieuwk opened 1 year ago
The ImmutableMap should implement the CacheFunnel interface
Thanks for the suggestion @pditommaso! I've sadly been unable to fix it with CacheFunnel. I implemented it like this:
// A class that works like Map, but returns an immutable copy with each method
public class ImmutableMap extends LinkedHashMap implements CacheFunnel {
    Map internalMap

    ImmutableMap(Map initialMap) {
        internalMap = initialMap
    }

    // Override the methods of the Map interface
    @Override
    Hasher funnel(Hasher hasher, HashMode mode) {
        hasher.putUnencodedChars(internalMap)
        return hasher
    }
    // Rest of the class
This still gives this error in the log: Caused by: java.lang.ClassNotFoundException: nextflow.validation.ImmutableMap
Did I implement it wrong?
Likely the best thing to do is to use Collections.unmodifiableMap instead of implementing your own immutable class
https://chat.openai.com/share/b0e9f648-21a6-4069-b474-a4a60e8f334d
We've investigated that, and the main problem was that when copying an unmodifiableMap (during a .map, for example), the returned map is modifiable again. This custom class made sure that it always returns immutable maps.
You can see that discussion here: https://github.com/nextflow-io/nf-validation/pull/32
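The copy problem described above can be reproduced with plain java.util collections. This is a minimal sketch, not the plugin's code: the class and method names are hypothetical, and copyWith only mimics what a map-merging operation typically does (copy the entries into a fresh LinkedHashMap).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class UnmodifiableCopyDemo {
    // Returns true if put() succeeds on the given map (note: adds a "probe" key when it does).
    static boolean isWritable(Map<String, Object> map) {
        try {
            map.put("probe", true);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // Hypothetical stand-in for a merge/copy step: the result is a plain LinkedHashMap,
    // so the read-only guarantee of the source is not carried over.
    static Map<String, Object> copyWith(Map<String, Object> source, String key, Object value) {
        Map<String, Object> copy = new LinkedHashMap<>(source);
        copy.put(key, value);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> base = new LinkedHashMap<>();
        base.put("id", "sample1");
        Map<String, Object> meta = Collections.unmodifiableMap(base);

        System.out.println(isWritable(meta));                               // false
        System.out.println(isWritable(copyWith(meta, "single_end", true))); // true
    }
}
```

The read-only view rejects writes, but anything derived from it by copying is an ordinary mutable map again, which is the gap the custom class was meant to close.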
I don't think it's a plugin's role to change the semantics of Nextflow operators. Therefore you should use ordinary Map objects instead of ImmutableMap
Okay thank you for all the help, too bad there is no way to enforce the immutability of the meta map, but I understand your point on this
@pditommaso is there any other way to achieve what we want here?
The immutable maps feature came after @robsyme gave a talk about the dangers of meta map mutability: https://nf-co.re/events/2023/bytesize_workflow_safety
We can drop back to a regular map again, but it would be nice to protect users (and devs) from this problem if possible, somehow.
We'll take this into account for DSL3
Perhaps we can use the @ValueObject decoration to automatically implement the required interface. I'll try and test this week.
Hello!
I am starting to add fromSamplesheet() to some nf-core pipelines, and it would be good to solve this before I merge any PR. Were you able to test if the decorator works @robsyme?
Paolo - the goal here is not to change in any way how Nextflow's operators work. The only goal here is to construct an object with the following properties:
1) Presents a Map-like (aka 🐍 dict) interface for holding metadata
2) Users can use the plus() operator to append new maps
   a) When they do this, a new object is returned rather than modifying the original object
3) Can be serialised by Kryo
I've tried a couple of approaches today and hit some interesting and instructive road blocks 😆
Let's say we want these properties:
plus method
We can put together a very simple example class to illustrate the challenges. We might do something like:
import nextflow.util.KryoHelper

class Meta {
    @Delegate Map internal = new LinkedHashMap()
    static { KryoHelper.register(Meta) }
    Object put(Object key, Object value) { internal.put(key, value) }
    Meta plus(Map right) { new Meta(internal: internal + right) }
    // ... and any other methods (minus, etc)
}
plus method
We can add immutability in a number of different ways, but let's say we go for Nextflow's built-in ValueObject annotation:
import nextflow.util.KryoHelper
import nextflow.io.ValueObject

@ValueObject
class Meta {
    @Delegate Map internal = new LinkedHashMap()
    static { KryoHelper.register(Meta) }
    Object put(Object key, Object value) { internal.put(key, value) }
    Meta plus(Map right) { new Meta(internal: internal + right) }
}
Making the object immutable breaks Kryo serialization:
plus method
The Nextflow logs report:
Jul.-04 10:48:46.677 [Actor Thread 7] WARN nextflow.processor.TaskProcessor - [TestCache (1)] Unable to resume cached task -- See log file for details
java.lang.UnsupportedOperationException: null
at java.base/java.util.Collections$UnmodifiableMap.put(Collections.java:1505)
at java_util_Map$put$1.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148)
at Meta.put(Meta.groovy:10)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:144)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
Why is this no longer serializable?
Our Meta class is a Map of sorts, and so the object is deserialized from a byte string to a new instance using the Kryo built-in MapSerializer. In this Serializer class, it first creates a new, empty object and then adds each key+value pair to the empty Map as they are read out of the serialized form.
This work of incrementally adding new key+value pairs is not supported by our newly immutable class, which is why the Kryo serialization breaks.
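The failure mode can be imitated without Kryo. The sketch below is a simplified stand-in for the create-empty-then-put pattern described above, not Kryo's MapSerializer itself; readInto, canRead and the class name are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

final class IncrementalReadDemo {
    // Simplified stand-in for a map deserializer: build an empty instance,
    // then put() every entry as it is read back.
    static Map<String, Object> readInto(Supplier<Map<String, Object>> factory,
                                        Map<String, Object> entries) {
        Map<String, Object> target = factory.get();
        for (Map.Entry<String, Object> e : entries.entrySet()) {
            target.put(e.getKey(), e.getValue());
        }
        return target;
    }

    // Returns true if the incremental read completes, false if put() is rejected.
    static boolean canRead(Supplier<Map<String, Object>> factory, Map<String, Object> entries) {
        try {
            readInto(factory, entries);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> entries = new LinkedHashMap<>();
        entries.put("id", "sample1");

        // A mutable target accepts the entries one by one.
        System.out.println(canRead(LinkedHashMap::new, entries)); // true

        // A read-only target fails on the first put(), as in the stack trace above.
        System.out.println(canRead(
            () -> Collections.unmodifiableMap(new LinkedHashMap<String, Object>()), entries)); // false
    }
}
```

Any deserializer that populates the instance after construction hits the same wall, so an immutable map type needs a serializer that collects the entries first and constructs the object once.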
It's certainly possible to create a custom Serializer class, but I'm starting to think that this immutability feature is a little over-engineered. There's a possibility that we're introducing a little too much "magic" - maybe using a vanilla LinkedHashMap class and then providing guidance in documentation is a better approach.
The only goal here is to construct an object with the following properties:
Why should this plugin use a "magic" object and not just a plain Map?
Because @robsyme gave an nf-core/bytesize talk that put the fear of god into us all about mutable map objects 😆
Haha. It was certainly not my intention to scare anybody! 😆
Paolo: We had people modifying the map in flight, which can lead to results that depend on task execution timing. For example:
workflow {
    nums = Channel.of(1..10) | map { [val:it] }
    nums
        | Foo
        | map { meta -> meta.val += 1 }
    nums
        | VariableProcess
        | DoSomethingElse
}
In this example, modification of the meta object in the closure modifies the same object being passed to VariableProcess and DoSomethingElse. If VariableProcess happens to finish quickly, DoSomethingElse might be launched before the val property is incremented. If VariableProcess takes longer than Foo, then the increment will happen beforehand. This can lead to unpredictable results and unusual behaviour where process caches might change, etc.
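Stripped of channels and timing, the underlying issue is two branches holding a reference to the same map. This is a minimal single-threaded sketch with hypothetical names, showing the in-place update next to a copy-on-write alternative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SharedMetaDemo {
    // In-place update, analogous to `meta.val += 1` in the closure above.
    static void incrementInPlace(Map<String, Integer> meta) {
        meta.put("val", meta.get("val") + 1);
    }

    // Copy-on-write alternative: the input is left untouched and a new map is returned.
    static Map<String, Integer> incremented(Map<String, Integer> meta) {
        Map<String, Integer> copy = new LinkedHashMap<>(meta);
        copy.put("val", copy.get("val") + 1);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Integer> meta = new LinkedHashMap<>();
        meta.put("val", 1);
        Map<String, Integer> seenByOtherBranch = meta; // same object, not a copy

        incrementInPlace(meta);
        System.out.println(seenByOtherBranch.get("val")); // 2: the other branch sees the change

        Map<String, Integer> updated = incremented(meta);
        System.out.println(seenByOtherBranch.get("val")); // 2: unchanged by the copying variant
        System.out.println(updated.get("val"));           // 3
    }
}
```

In a real pipeline the two branches run concurrently, so whether the other branch observes the old or the new value depends on scheduling. Returning a new map from the closure removes that dependency.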
I think that implementing the CacheFunnel interface is the easiest path forward, but because the object is also a Map, its CacheFunnel implementation will never be used. I've just opened up a Nextflow PR that would remedy this: https://github.com/nextflow-io/nextflow/pull/4077
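One way to picture why the custom hook is skipped: if a hasher dispatches on type and tests for Map before it tests for the hook, a class that is both will always take the Map branch. The sketch below assumes that dispatch order purely for illustration. CustomFunnel, FunnelMap and both hash methods are hypothetical and are not Nextflow's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DispatchOrderDemo {
    // Hypothetical stand-in for a custom hashing hook (not Nextflow's CacheFunnel).
    interface CustomFunnel {
        String funnel();
    }

    // A class that is both a Map and a CustomFunnel, like the ImmutableMap in this thread.
    static final class FunnelMap extends LinkedHashMap<String, Object> implements CustomFunnel {
        @Override
        public String funnel() {
            return "custom";
        }
    }

    // Assumed order: Map is tested first, so the hook is never reached for FunnelMap.
    static String hashMapFirst(Object value) {
        if (value instanceof Map) return "map";
        if (value instanceof CustomFunnel) return ((CustomFunnel) value).funnel();
        return "other";
    }

    // Reordered: the hook is tested first and wins for FunnelMap.
    static String hashFunnelFirst(Object value) {
        if (value instanceof CustomFunnel) return ((CustomFunnel) value).funnel();
        if (value instanceof Map) return "map";
        return "other";
    }

    public static void main(String[] args) {
        Object meta = new FunnelMap();
        System.out.println(hashMapFirst(meta));    // map
        System.out.println(hashFunnelFirst(meta)); // custom
    }
}
```

Under that assumption, no change inside the plugin class can make the hook fire. The check order in the hasher has to change.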
Bug report
Expected behavior and actual behavior
In the nf-validation plugin we use an extended class of LinkedHashMap called ImmutableMap for the meta fields. This class works fine when running in a normal run, but gives these warnings during a resumed run and reruns everything:
The log shows the following:
Steps to reproduce the problem
You can clone this repository and run it with nextflow run main.nf. When rerunning the mini pipeline with the -resume flag, you'll see the errors/warnings.
Environment