Closed Jay-uu closed 11 months ago
I think this is because the task work dir depends on the task hash, which is not computed until after the task script is evaluated.
But you don't need to do this, because mOTU_dir is already a path inside the task directory, so just resolve against that variable:
bin_list = files(mOTU_dir.resolve("*.fa"))
Thanks so much! I saw some mentions about using resolve in other places but couldn't figure out the syntax.
Edit:
I tried it, but bin_list shows as empty. From the log file I can see that it doesn't seem to look for the input in the right place:
nextflow.Nextflow - No such file or directory: c__Cyanobacteriia_mOTU_0/ -- Skipping visit
log.nextflow.log
If I move the input directory so that it's present in the launchDir it runs, but since I want to have this process in the middle of a pipeline, that's of course not a valid solution.
Hi again! To make it easier for you to test, I'm sending the updated code. The input directory is located in a different directory than the pipeline script and contains a number of .fa files. Trying to resolve against the input variable no longer causes an error, but it still doesn't correctly read the input files if they aren't present in the launch directory. The screenshots show the output (1) when the input directory is in a different location than the launch directory and (2) when the input directory is in the same location as the launch directory.
process mOTUs_to_pangenome {
    debug true

    input:
    path(mOTU_dir)

    shell:
    println("Checking number of bins")
    bin_list = files(mOTU_dir.resolve("*.fa")) //Runs but no result. Log file: No such file or directory: <input name>/ -- Skipping visit
    //bin_list = files(task.workDir.resolve("*.fa")) //Causes error: Cannot invoke method resolve() on null object
    println("Present bins:")
    println(bin_list)
    c = bin_list.size()
    single_bin = bin_list[0]
    println("single_bin variable is:")
    println(single_bin)
    if( c > 1 )
        '''
        #bash code
        echo "hey how come your mom lets you have two files?"
        '''
    else
        '''
        #!/usr/bin/env python
        #python code!
        print("ah, just one file I see")
        '''
}

workflow {
    pg_dir = Channel.fromPath("/home/jay/c__Cyanobacteriia_mOTU_0", type: "dir", checkIfExists: true)
    mOTUs_to_pangenome(pg_dir)
}
Program output when input is in a different directory than the pipeline script:
N E X T F L O W ~ version 23.04.3
Launching `issue.nf` [distraught_mandelbrot] DSL2 - revision: 177c3cbb16
executor > local (1)
[f5/40124d] process > mOTUs_to_pangenome (1) [100%] 1 of 1 ✔
Checking number of bins
Present bins:
[]
single_bin variable is:
null
ah, just one file I see
Program output when the input is in the same directory as the pipeline script:
N E X T F L O W ~ version 23.04.3
Launching `issue.nf` [desperate_monod] DSL2 - revision: 177c3cbb16
executor > local (1)
[26/4b7ad4] process > mOTUs_to_pangenome (1) [100%] 1 of 1 ✔
Checking number of bins
Present bins:
[c__Cyanobacteriia_mOTU_0/mock1.maxbin.006.fasta.contigs.fa, c__Cyanobacteriia_mOTU_0/test.fa]
single_bin variable is:
c__Cyanobacteriia_mOTU_0/mock1.maxbin.006.fasta.contigs.fa
hey how come your mom lets you have two files?
I think a better approach here would be something like this:
process mOTUs_to_pangenome {
    debug true

    input:
    path(bin_list, arity: '1..*')

    shell:
    println("Present bins:")
    println(bin_list)
    c = bin_list.size()
    single_bin = bin_list[0]
    println("single_bin variable is:")
    println(single_bin)
    if( c > 1 )
        '''
        #bash code
        echo "hey how come your mom lets you have two files?"
        '''
    else
        '''
        #!/usr/bin/env python
        #python code!
        print("ah, just one file I see")
        '''
}

workflow {
    pg_files = Channel.fromPath("/home/jay/c__Cyanobacteriia_mOTU_0/*.fa", checkIfExists: true)
    mOTUs_to_pangenome(pg_files)
}
Basically, perform the glob outside of the process. I also used the new arity option so that the files are always a list.
You can take this further by moving the rest of the shell code into channel logic in the workflow, then you could e.g. have two different processes for the different cases, but I will leave that as an exercise for the reader 😄
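For completeness, one way that exercise could look, as a minimal sketch using Nextflow's collect and branch operators; the process names multi_bin_process and single_bin_process are hypothetical placeholders, not part of the original code:

```nextflow
workflow {
    // Glob in the workflow, as above, then gather all matches into one list
    pg_files = Channel.fromPath("/home/jay/c__Cyanobacteriia_mOTU_0/*.fa", checkIfExists: true)
        .collect()
        .branch {
            // Route the list to a different output channel depending on its size
            multi:  it.size() > 1
            single: it.size() == 1
        }

    multi_bin_process(pg_files.multi)    // hypothetical process for more than one bin
    single_bin_process(pg_files.single)  // hypothetical process for exactly one bin
}
```

With this shape, each process keeps a single unconditional script block, so no Groovy code needs to run before the task work dir exists.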
Thanks for your suggestions! I might decide to modularize the pipeline in the future and have multiple workflows, which would make the workflow-check-plus-multiple-processes solution a little brittle, but it definitely works for now. I'll also keep an eye out for the arity option making it into the stable release. Thank you for taking the time to help!
BTW the arity feature is available in 23.10.0
Bug report
Hi! Thanks for all your work making Nextflow! I'm developing a pipeline for genome analysis and have encountered a need to do local checks within a process before an if/else script block. My issue is related to #2628 and #3962, but I hope it's different enough to warrant opening a new issue.
It seems to me that the task.workDir isn't initialised until after one of the script sections starts. This causes a problem when I want to interact with the input before choosing which code to run.
Expected behavior and actual behavior
Expected behaviour: when I have a script block defined, any code within it is executed within, or at least has access to, the task.workDir.
Actual behaviour: if a conditional script block is used, any Groovy code outside the quotes is executed in the launchDir, and the workDir doesn't exist yet.
Steps to reproduce the problem
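A minimal reproduction, condensed from the process shown earlier in this thread (the process name check_then_run is a placeholder; the input path is taken from the original report):

```nextflow
process check_then_run {
    input:
    path(mOTU_dir)

    shell:
    // This Groovy code runs before the task work dir exists, so
    // task.workDir is still null and task-relative lookups fail:
    bin_list = files(mOTU_dir.resolve("*.fa"))
    if( bin_list.size() > 1 )
        '''
        echo "more than one bin"
        '''
    else
        '''
        echo "zero or one bin"
        '''
}

workflow {
    pg_dir = Channel.fromPath("/home/jay/c__Cyanobacteriia_mOTU_0", type: "dir", checkIfExists: true)
    check_then_run(pg_dir)
}
```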
Program output
Environment
Additional context
taskDir_null.nextflow.log