nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

symlinks for staged files in directories are not removed in .command.run #4971

Open nick-youngblut opened 6 months ago

nick-youngblut commented 6 months ago

Bug report

For a failed job in my Nextflow pipeline, I'm manually running bash .command.run and I'm getting ln: failed to create symbolic link 'DIRECTORY_NAME/FILE_NAME.txt': File exists.

The nxf_stage() function includes:

nxf_stage() {
    true
    # stage input files
    mkdir -p 164164 && ln -s /home/nickyoungblut/tmp/work/ad/d0abcbad4c7b9137844e3ba48c8af4/KAPA_mRNA-enrichment_HumanRefRNA_500ng_1e-2dilution_20240417_C01_R1_001/summary.txt 164164/summary.txt
    mkdir -p 6262 && ln -s /home/nickyoungblut/tmp/work/53/f75d0ff2aa376187fafae661a5b400/DJv3_NT1_ctrl_rep1_031524_R1_001/fastqc_data.txt 6262/fastqc_data.txt
    mkdir -p 284284 && ln -s /home/nickyoungblut/tmp/work/be/b0b67f565d4ae5a0230e453acaa236/DJv2_FTH1_kd_rep2_031524_R2_001/fastqc_data.txt 284284/fastqc_data.txt 
    [...]
}

The symlinks are not removed via rm -f before they are recreated in the nxf_stage() function, and ln -s is used instead of ln -sf. This causes the error when manually re-running .command.run. It makes troubleshooting failed jobs harder, since I have to manually delete the existing symlinks or comment out all of the ln -s commands in nxf_stage().

This issue does not occur for files not in staged directories, just for mkdir -p new_directory && ln -s new_directory/new_file.txt.
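
A minimal sketch of the failure and of the ln -sf alternative mentioned above. The directory and file names are throwaway placeholders created under mktemp, not real Nextflow work directories, and the script only imitates the mkdir -p ... && ln -s ... pattern from nxf_stage():

```shell
#!/usr/bin/env bash
# Reproduce the "File exists" failure, then compare with `ln -sf`.
# All paths below are hypothetical placeholders for illustration.
set -u

workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "data" > source.txt

# First staging pass: the link does not exist yet, so this succeeds.
mkdir -p staged && ln -s "$workdir/source.txt" staged/summary.txt
first_status=$?

# Second pass with plain `ln -s`: fails because the link already exists.
mkdir -p staged && ln -s "$workdir/source.txt" staged/summary.txt 2>/dev/null
second_status=$?

# Second pass with `ln -sf`: the existing link is replaced, so this succeeds.
mkdir -p staged && ln -sf "$workdir/source.txt" staged/summary.txt
forced_status=$?

echo "first=$first_status second=$second_status forced=$forced_status"
```

The second pass is the one that corresponds to manually re-running .command.run in a task directory that was already staged.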

Expected behavior and actual behavior

See above

Steps to reproduce the problem

This should occur for any pipeline that creates staged files in directories: mkdir -p new_directory && ln -s new_directory/new_file.txt

Program output

See above

Environment

Additional context

See this slack thread

pditommaso commented 6 months ago

This is likely because you have many staged files, see here

https://github.com/nextflow-io/nextflow/blob/aa9e127373de3bc0b4b78640279336cdd6d003aa/modules/nextflow/src/main/groovy/nextflow/executor/SimpleFileCopyStrategy.groovy#L122-L133

nick-youngblut commented 6 months ago

Thanks @pditommaso for pointing that out! What is the problem with including possibly a few thousand more lines in the runner script?

pditommaso commented 6 months ago

It's explained in the comment: to contain the script file size. You can delete all symlinks using a Bash one-liner like find . -type l -delete or something similar

nick-youngblut commented 6 months ago

Why does the file size need to be contained to <100 lines of removing symlinks? Extending to thousands of lines will not add much size to the file.

> You can delete all symlinks using a Bash one-liner like find . -type l -delete or something similar

Why not just use find . -type l -delete instead of removing each symlink individually in the runner script?
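
As a rough sketch of the find-based cleanup discussed here, the script below builds a stand-in task directory, removes every symlink in it, and leaves regular files alone. The directory layout and file names are hypothetical, and the approach assumes that every symlink in the task directory is one that staging will recreate (which is the open question in this thread). Note that -delete is supported by GNU and BSD find but is not part of POSIX.

```shell
#!/usr/bin/env bash
# Sketch: clear stale symlinks in a task directory before re-running staging.
# The directory and file names are hypothetical placeholders.
set -u

taskdir=$(mktemp -d)   # stand-in for a Nextflow task work directory
cd "$taskdir" || exit 1

echo "data" > real_input.txt                           # regular file: must survive
mkdir -p 6262
ln -s "$taskdir/real_input.txt" 6262/fastqc_data.txt   # stale link in a staged directory
ln -s "$taskdir/real_input.txt" top_level.txt          # stale link at the top level

# Preview what would be removed, then delete. `-type l` matches symlinks only,
# so regular files and directories are left untouched.
find . -type l -print
find . -type l -delete

remaining_links=$(find . -type l | wc -l | tr -d ' ')
echo "links left: $remaining_links"
```

In a real task directory the equivalent manual step would be find . -type l -delete followed by bash .command.run.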

pditommaso commented 6 months ago

lol. need to think if there could be other links. @bentsherman opinion?

nick-youngblut commented 6 months ago

> need to think if there could be other links

I thought all symlinks were (re)created by the runner script, but maybe I'm mistaken?

bentsherman commented 6 months ago

Deleting all links should be fine, I can't think of any other links that are created. But Nick also suggested using ln -sf instead of deleting the links, maybe that would be better
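
One caveat worth checking if ln -sf is chosen: when the existing link points to a directory, ln -sf follows it and creates the new link inside the target directory instead of replacing the link itself; adding -n (no-dereference, available in GNU and BSD ln) avoids that. The sketch below assumes staged inputs can be directories, which this thread does not show, and uses made-up directory names:

```shell
#!/usr/bin/env bash
# Sketch: `ln -sf` vs `ln -sfn` when the existing link points to a directory.
# All names are hypothetical placeholders.
set -u

base=$(mktemp -d)
mkdir -p "$base/upstream/sample_dir" "$base/task"
cd "$base/task" || exit 1

# First staging pass: link a directory into the task directory.
ln -s "$base/upstream/sample_dir" sample_dir

# Re-run with -sf: the existing link is followed, so a stray link named
# sample_dir is created inside the upstream directory.
ln -sf "$base/upstream/sample_dir" sample_dir
stray_with_sf=no
[ -L "$base/upstream/sample_dir/sample_dir" ] && stray_with_sf=yes

# Clean up the stray link, then re-run with -sfn: the link itself is replaced.
rm -f "$base/upstream/sample_dir/sample_dir"
ln -sfn "$base/upstream/sample_dir" sample_dir
stray_with_sfn=no
[ -L "$base/upstream/sample_dir/sample_dir" ] && stray_with_sfn=yes

echo "stray link with -sf: $stray_with_sf, with -sfn: $stray_with_sfn"
```

For plain files, as in the nxf_stage() excerpt above, -sf and -sfn behave the same.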

jamesamcl commented 2 months ago

I just hit this problem too when using nextflow -resume:

Command exit status:
  1

Command output:
  (empty)

Command wrapper:
  ln: failed to create symbolic link 'prop_summary.json': File exists

not sure if I understand the comments above, this just looks like a bug?