Open nh13 opened 3 years ago
I wonder if it would be possible by using a different type as a member of the ValueObject
? I am not very familiar with the source code, so I assume this wouldn't actually work, but something like:
import nextflow.io.ValueObject
import nextflow.processor.TaskPath
@ValueObject
class SampleData {
/** The sample name */
String name
/** The path to the read one FASTQ */
TaskPath r1
/** The path to the read two FASTQ */
TaskPath r2
}
Or maybe nextflow.file.FileHolder
?
Would just String
be enough for paths? internal classes are not meant to be used in pipeline scripts.
I'd love to have things properly typed (eg. Path
).
In principle Path
should work as well. What's the problem with it?
Does nextflow inspect the object for members of type Path
and localize already?
What do you mean for localize?
If I pass in a SampleData
class in the script, the r1
process foo {
input:
val sample // Type: SampleData
path fasta
output:
???
script:
"""
bwa mem ${fasta} ${sample.r1} ${sample.r2}
"""
}
Files being localized from the working directory to the analysis node (mainly for cloud workflows)
we are entering in a minefield .. 😄 no, the file staging (ie localisation) is determined by the path
declaration. The pattern you are suggesting is interesting but not supported.
There's a variety of situations where this capability would make it less error prone to deal with collections of genomic files, including VCF (+ index), BAM (+ index), FASTA (+ index). In the above example, you could imagine additional behaviors such as "is_paired" to allow scripts to handle single- or paired-end data in a more expressive way. Is this something that would completely break the process handling design or is there an opportunity for a user contribution in this area?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still relevant, here’s what I do:
import groovy.transform.Immutable
import nextflow.io.ValueObject
import nextflow.util.KryoHelper
@ValueObject
@Immutable(copyWith=true, knownImmutables = ['path'])
class Metadata {
static {
// Register this class with the Kryo framework that serializes and deserializes objects
// that pass through channles. This allows for caching when this object is used.
KryoHelper.register(Metadata)
}
Path path
}
@nh13 Do you have an example of a pipeline you've used that in?
@Emiller88 not in one I can share publicly, happy to help with a small hello world example.
@Emiller88 not in one I can share publicly, happy to help with a small hello world example.
I figured :disappointed:. Not sure if that was you offering to write it or answer questions.
I'm guessing you toss the KryoHelper
class in a groovy file in lib/
. I guess I'm stuck on the output: ???
piece as well?
I was offering, but it’s not high on my list.
@nh13 , I was hoping that a fully worked example wouldn't be necessarily be needed if it is just a question of how to structure the "output" line in the process definition such that the custom class will be appropriately instantiated by the framework, but perhaps that is wishful thinking.
Also, one additional request for clarification, are you saying in the approach that you use above that process results are appropriately cached and any suitably annotated paths in the custom class are also localized?
KryoHelper
is part of nextflow, so no need to include it other than the import at the top of the custom class groovy file (yes, put the custom groovy file in /lib
) input:
tuple val(meta), path(in_bam)
output:
tuple(val(meta), path("${meta.name()}.out.txt") , emit: txt)
@nh13 Okay I threw together an example
N E X T F L O W ~ version 22.04.0
Launching `main.nf` [elated_bell] DSL2 - revision: 4a32a3cac6
BUG! exception in phase 'semantic analysis' in source unit 'Script_097d04d3' The lookup for CustomObject caused a failed compilation. There should not have been any compilation from this call.
@Emiller88 , I suspect compilation of your custom class is failing on resolution of the "Path" type (needs to be imported?). Maybe can strip down example to use native types as a first pass.
@Emiller88 , I suspect compilation of your custom class is failing on resolution of the "Path" type (needs to be imported?). Maybe can strip down example to use native types as a first pass.
This was it. I've pushed up a working test, but the class just replaces the map, so it's not really doing much.
@nh13 , I'm assuming that in your previous example, the described workaround only allows for definition of the custom class, but not any associated localization of the files? If that is the case, I tried an alternate approach that would require some modification to the underlying Nextflow framework. Briefly, the custom class must be treated as a path input and provides a collection of associated paths that must be staged. While this addresses the file staging need, none of the custom attributes are accessible. For that, there is an additional interface generated, CustromFileAttributesProvider that the custom class can implement. If available, this interface method is used to define additional properties on the object exposed during script generation.
The changes to Nextflow are summarized in a PR here, https://github.com/ryankelley-lm/nextflow/pull/1, with an associated example leveraging the interface at https://github.com/ryankelley-lm/hello-custom-objects-1/tree/custom-object-download
but perhaps you were able to achieve this in a simpler way?
@pditommaso , the PR above is a rough first pass, but do you think the general concept is something that could be reworked into an acceptable PR?
CC @Emiller88
@ryankelley-lm if you use a path it will work just fine for local paths. I’m not sure how or if remote files will work.
I think to handle this properly it's required to introduce a new input/output qualifier record
as an alternative to tuple
.
A tentative syntax could be as shown below:
@ValueObject
class SampleData {
String name
Path r1
Path r2
}
process foo {
output:
record SampleData(name: 'some_name', r1: '*_1.bam', r2: '*_2.bam')
'''
some_command
'''
}
process bar {
input:
record data: SampleData()
"""
other_command --sample $data.name --reads $data.r1 $data.r2
"""
}
workflow {
foo | bar
}
The idea is to continue to have tuple-like semantics both for the input and output declaration to have the ability to use custom record types
@pditommaso , Thanks for the feedback. In the above, would "record" be an alternative to "tuple" or "val"? When you say that "record" would be an alternative to "tuple", do you imagine that you would be packaging multiple different custom object instances into a single "record" line, in the same way that multiple instances are packaged into a "tuple"? Or is a each "record" a single object instance, like with a "val"?
At some point, would you still require some contract to be implemented by the custom class, such that the framework knows what paths the custom object is dealing with?
@pditommaso , or is the idea that a "record" is essentially a named tuple, such that the framework knows each of the data members and for path, can handle them appropriately?
FWIW, I believe most of the desired behavior here is achievable with the current release of Nextflow if you
The idea is that Nextflow will see your custom object as a tuple and treat it accordingly. In your own scripts, you can use the specific behaviors of your custom object. Since Nextflow knows how to stage tuple appropriately, all of the staging will be handled correctly. This idea is captured in the updated example here, https://github.com/ryankelley-lm/hello-custom-objects-1/tree/custom-object-download
nextflow.enable.dsl = 2
process STEPONE_PROCESS {
input:
tuple val(myname), path(r1), path(r2)
output:
tuple val("${myname}"), path("output1.txt"), path("output2.txt"), emit: paths
script:
"""
cat ${r1} ${r2} >> output1.txt
cat ${r2} ${r1} >> output2.txt
"""
}
workflow STEPONE_WORKFLOW {
take: workflowInput
main:
processResult = STEPONE_PROCESS(workflowInput).map {
name, path1, path2 ->
return new CustomObject(name, path1, path2)
}
emit:
processResult
}
import groovy.transform.Immutable
import nextflow.io.ValueObject
import nextflow.util.KryoHelper
import java.nio.file.Path
import nextflow.util.ArrayBag
import java.util.List
import nextflow.file.FileHolder
import java.util.Map
class CustomObject extends ArrayBag {
static {
// Register this class with the Kryo framework that serializes and deserializes objects
// that pass through channles. This allows for caching when this object is used.
KryoHelper.register(CustomObject)
}
String name
Path r1
Path r2
CustomObject(String name, Path r1, Path r2)
{
super()
this.name = name
this.r1 = r1
this.r2 = r2
this.target.add(name)
this.target.add(r1)
this.target.add(r2)
}
public String toString()
{
return this.name + " " + this.r1.toString() + " " + this.r2.toString()
}
}
Now that we can pass around custom (value/metadata) classes and have them cached, it would be wonderful if Nextflow could recognize
Path
members/properties so they are localized and available for the process that uses the class. For example:There could be a way annotate
r1
andr2
(or return the list of paths etc) for nextflow to localize them.