[BUG] Failed to publish file across filesystems

cflerin commented 3 years ago

Describe the bug: All publish steps fail to complete when the Nextflow work directory and the current working directory are on different filesystems.

To Reproduce Steps to reproduce the behavior:

Configure with these options:

Use the NXF_WORK environment variable to direct all working files to a scratch drive on a different filesystem than the current working directory. Test with any of the test profiles:

nextflow pull vib-singlecell-nf/vsn-pipelines -r v0.23.0

export NXF_WORK=/ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp

Run using this entry point:

nextflow run vib-singlecell-nf/vsn-pipelines -profile scenic,test__scenic,singularity -entry scenic -r v0.23.0

See error: These are warnings and the pipeline reports that it completes successfully, but there is no output data in the out directory:

WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/77/c96d18799ed09bb8629fa2b43066fc/expr_mat_tiny.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/cistarget/expr_mat_tiny.loom [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/77/c96d18799ed09bb8629fa2b43066fc/scenic_CI__reg_mtf.csv.gz; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/cistarget/scenic_CI__reg_mtf.csv.gz [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/e4/e0b523315c1aa3a54f20007a054098/expr_mat_tiny.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/aucell/expr_mat_tiny.loom [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/e4/e0b523315c1aa3a54f20007a054098/scenic_CI__auc_mtf.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/aucell/scenic_CI__auc_mtf.loom [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/da/146559f2ea88adaccba7d20b0aa383/scenic_visualize.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/scenic/scenic_CI/SCENIC_SCope_output.loom [link] -- See log file for details
WARN: Failed to publish file: /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/62/52a6bc5eddae9cd9ba5916966dc4ae/scenic_CI.SCENIC.loom; to: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/data/scenic_CI.SCENIC.loom [link] -- See log file for details

And an excerpt from the log reports Invalid cross-device link:

java.nio.file.FileSystemException: /ddn1/vol1/staging/leuven/stg_00002/lcb/cflerin/analysis/pbmc_atac/analysis/nextflow/cross_filesystem_publish/out/data/scenic_CI.SCENIC.loom -> /ddn1/vol1/site_scratch/leuven/325/vsc32528/vsn_tmp/62/52a6bc5eddae9cd9ba5916966dc4ae/scenic_CI.SCENIC.loom: Invalid cross-device link
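A quick way to confirm that the two directories really are on different filesystems is to compare their device numbers. The following is only a rough sketch: the function name same_filesystem is made up for illustration, and it assumes GNU coreutils stat (as shipped on CentOS), where -c %d prints the device number.

```shell
# Succeeds (exit 0) when both paths report the same device number,
# fails (exit 1) when they differ, and returns 2 if either path
# cannot be inspected.
# Assumption: GNU coreutils `stat`; BSD/macOS `stat` uses other flags.
same_filesystem() {
    dev_a=$(stat -c %d "$1") || return 2
    dev_b=$(stat -c %d "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}
```

For example, `same_filesystem "$NXF_WORK" "$PWD" || echo "different filesystems"` would flag the setup described in this report.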

Expected behavior: Cross-filesystem publishing should work.

Screenshots: NA

Please complete the following information:

OS: CentOS Linux release 7.8.2003 (Core)
Nextflow Version: 20.04.1
vsn-pipelines Version: v0.23.0

Additional context: NA

KrisDavie commented 3 years ago

I think this just needs to be documented so that users understand that hard links don't work across file systems. Our other option is to publish by copying rather than sym/hard linking. That will always work, but of course it will take more space when people don't clear their work folders.
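To make the trade-off between linking and copying concrete, here is a minimal shell sketch of the two strategies side by side. It is not what Nextflow or the pipeline does; publish_file is a hypothetical helper that tries a hard link first and falls back to a full copy when linking fails (for example with "Invalid cross-device link").

```shell
# Hard-link "$1" to "$2"; if that fails, fall back to copying.
# A hard link costs no extra space but only works within one filesystem;
# a copy works everywhere but duplicates the data.
# Note: if "$2" already exists, `ln` fails and `cp` overwrites it.
publish_file() {
    mkdir -p "$(dirname "$2")"
    ln "$1" "$2" 2>/dev/null || cp "$1" "$2"
}
```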

cflerin commented 3 years ago

There must be a better solution than just saying that this won't work at all. As a user, if I see that the NXF_WORK environment variable is available in Nextflow, then it's reasonable to expect that it would work here.

Possible solutions:

1. Switching everything to copy would indeed be a quick fix for all of this, and working directories should be treated as temporary in any case.
2. We can provide a parameter in the config that controls whether publish is done via hardlink or copy.

With option 2, the publish directives become:

process SC__PUBLISH {

    publishDir "${params.global.outdir}/data/intermediate", \
        mode: "${params.utils.publish.mode}", \
        saveAs: {
            filename -> "${outputFileName}"
        }

...

I've tested this briefly and it works with symlink, link, and copy as values of params.utils.publish.mode. I would also remove overwrite: true in this case to avoid re-copying files, which can take a significant amount of time when there are many large files.

dweemx commented 3 years ago

I like option 2 better (as you described), probably with symlink as the default. It's also a good idea to remove overwrite: true.

cflerin commented 3 years ago

I implemented option 2 above, but using link as the default (this is how it was in the existing code anyway).

Zifeng-L commented 3 years ago

Same issue here! I pulled v0.26.1 and still had the same problem. How can we get the notebooks now?

cflerin commented 3 years ago

Hi @Zifeng1995, I think you can solve this by changing the publish mode to 'copy' in your config file and restarting the pipeline with resume enabled.

Zifeng-L commented 3 years ago

Hi @cflerin, I am new to Nextflow. I tried to copy the publish step into my config file, but it did not work.

manifest {
   name = 'vib-singlecell-nf/vsn-pipelines'
   description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
   homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
   version = '0.26.1'
   mainScript = 'main.nf'
   defaultBranch = 'master'
   nextflowVersion = '!>=20.10.0'
}

params {
   global {
      project_name = '10x_PBMC'
      outdir = 'out'
   }
   misc {
      test {
         enabled = false
      }
   }
   utils {
      container = 'vibsinglecellnf/utils:0.4.0'
      publish {
         compressionLevel = 6
         annotateWithBatchVariableName = false
         mode = 'link'
      }
   }
   sc {
      file_converter {
         off = 'h5ad'
         tagCellWithSampleId = true
         remove10xGEMWell = false
         useFilteredMatrix = true
         makeVarIndexUnique = false
      }
      scanpy {
         container = 'vibsinglecellnf/scanpy:0.5.2'
         report {
            annotations_to_plot = []
         }
         feature_selection {
            report_ipynb = '/src/scanpy/bin/reports/sc_select_variable_genes_report.ipynb'
            method = 'mean_disp_plot'
            minMean = 0.0125
            maxMean = 3
            minDisp = 0.5
            off = 'h5ad'
         }
         feature_scaling {
            method = 'zscore_scale'
            maxSD = 10
            off = 'h5ad'
         }
         neighborhood_graph {
            nPcs = 50
            off = 'h5ad'
         }
         dim_reduction {
            report_ipynb = '/src/scanpy/bin/reports/sc_dim_reduction_report.ipynb'
            pca {
               method = 'pca'
               nComps = 50
               off = 'h5ad'
            }
            umap {
               method = 'umap'
               off = 'h5ad'
            }
            tsne {
               method = 'tsne'
               off = 'h5ad'
            }
         }
         clustering {
            preflight_checks = true
            report_ipynb = '/src/scanpy/bin/reports/sc_clustering_report.ipynb'
            method = 'louvain'
            resolution = 0.8
            off = 'h5ad'
         }
         marker_genes {
            method = 'wilcoxon'
            ngenes = 0
            groupby = 'louvain'
            off = 'h5ad'
         }
         filter {
            report_ipynb = '/src/scanpy/bin/reports/sc_filter_qc_report.ipynb'
            cellFilterStrategy = 'fixedthresholds'
            cellFilterMinNGenes = 200
            cellFilterMaxNGenes = 4000
            cellFilterMaxPercentMito = 0.15
            geneFilterMinNCells = 3
            off = 'h5ad'
            outdir = 'out'
         }
         data_transformation {
            method = 'log1p'
            off = 'h5ad'
         }
         normalization {
            method = 'cpx'
            countsPerCellAfter = 10000
            off = 'h5ad'
         }
      }
      scope {
         genome = ''
         tree {
            level_1 = ''
            level_2 = ''
            level_3 = ''
         }
      }
   }
   data {
      tenx {
         cellranger_mex = 'data/10x/1k_pbmc/1k_pbmc_*/outs/'
      }
   }
}

process SC__PUBLISH {
    publishDir "${params.global.outdir}/data/intermediate", 
    mode: "${params.utils.publish.mode}", \
        saveAs: {
            filename -> "${outputFileName}"
        }

process {
   executor = 'local'
   cpus = 2
   memory = '60 GB'
   clusterOptions = '-A cluster_account'
   withLabel:compute_resources__default {
      time = '1h'
   }
   withLabel:compute_resources__minimal {
      cpus = 1
      memory = '1 GB'
   }
   withLabel:compute_resources__mem {
      cpus = 4
      memory = '160 GB'
   }
   withLabel:compute_resources__cpu {
      cpus = 20
      memory = '80 GB'
   }
   withLabel:compute_resources__report {
      maxForks = 2
      cpus = 1
      memory = '160 GB'
   }
   withLabel:compute_resources__24hqueue {
      time = '24h'
   }
}

timeline {
   enabled = true
   file = 'out/nextflow_reports/execution_timeline.html'
}

report {
   enabled = true
   file = 'out/nextflow_reports/execution_report.html'
}

trace {
   enabled = true
   file = 'out/nextflow_reports/execution_trace.txt'
}

dag {
   enabled = true
   file = 'out/nextflow_reports/pipeline_dag.svg'
}

min {
   enabled = false
}

vsc {
   enabled = false
}

docker {
   enabled = true
   runOptions = '-i -v /cluster/home/zfli:/cluster/home/zfli'
}

cflerin commented 3 years ago

OK, take that publish step out (process SC__PUBLISH) and go back to your original config. This is the section you need to change:

   utils {
      container = 'vibsinglecellnf/utils:0.4.0'
      publish {
         compressionLevel = 6
         annotateWithBatchVariableName = false
         mode = 'link'
      }

Make sure to set mode = 'copy' instead of 'link'; this should fix your hardlink issue with the notebooks. Then re-run the pipeline with resume enabled: nextflow run [...] -resume.
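If you prefer to make that change from the shell rather than in an editor, something along these lines should do it. This is only a sketch: set_publish_mode is a hypothetical helper, and it assumes the mode is written as a single-quoted string on its own line, as in the config above. It rewrites every such "mode = '...'" line in the file, so check the result if your config has more than one.

```shell
# Replace the value on each "mode = '...'" line of config file "$1"
# with the mode given in "$2" (for example: copy, link, symlink).
# Assumption: the value is a single-quoted lowercase word on its own line.
set_publish_mode() {
    sed "s/^\([[:space:]]*mode = \)'[a-z]*'/\1'$2'/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Example (the file name is a placeholder for your own config):
#   set_publish_mode my.config copy
```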

Zifeng-L commented 3 years ago

It still did not work after setting mode = 'copy'. This is part of my config file:

params {
   global {
      project_name = '10x_PBMC'
      outdir = 'out'
   }
   misc {
      test {
         enabled = false
      }
   }
   utils {
      container = 'vibsinglecellnf/utils:0.4.0'
      publish {
         compressionLevel = 6
         annotateWithBatchVariableName = false
         mode = 'copy'
      }
   }

These are the warnings:

WARN: Failed to publish file: /cluster/home/zfli/test/single_sample_test/work/5e/e00c91db073fe4576372b28d935b48/1k_pbmc_v3_chemistry.SC__H5AD_TO_LOOM.loom; to: /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v3_chemistry.SCope_output.loom [link] -- See log file for details
WARN: Failed to publish file: /cluster/home/zfli/test/single_sample_test/work/a6/a699ab4cc3a5b58bd7beb10bd99a9a/1k_pbmc_v2_chemistry.SC__H5AD_TO_LOOM.loom; to: /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v2_chemistry.SCope_output.loom [link] -- See log file for details

cflerin commented 3 years ago

It seems there are a few places where the publish mode is hardcoded in the loomHandler.nf processes.

But to get these files immediately you can just copy them using the full source and destination paths from the warning, for example:

cp \
  /cluster/home/zfli/test/single_sample_test/work/5e/e00c91db073fe4576372b28d935b48/1k_pbmc_v3_chemistry.SC__H5AD_TO_LOOM.loom \
  /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v3_chemistry.SCope_output.loom
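If there are many such warnings, the same copy can be scripted. The sketch below reads warning lines on standard input and copies each source to its destination. The function name copy_unpublished is made up, and the parsing assumes the exact message format shown in the warnings above ("Failed to publish file: SRC; to: DST [mode] -- ...").

```shell
# Read Nextflow output on stdin and copy every file named in a
# "Failed to publish file" warning to its intended destination.
# Assumption: paths do not themselves contain the text "; to: ".
copy_unpublished() {
    grep 'Failed to publish file: ' | while IFS= read -r line; do
        # Source path: text between "Failed to publish file: " and "; to: "
        src=$(printf '%s\n' "$line" |
            sed 's/.*Failed to publish file: \(.*\); to: .*/\1/')
        # Destination path: text between "; to: " and the " [mode] -- " suffix
        dst=$(printf '%s\n' "$line" |
            sed 's/.*; to: \(.*\) \[[a-z]*\] -- .*/\1/')
        mkdir -p "$(dirname "$dst")"
        cp "$src" "$dst"
    done
}
```

It could be fed from a saved copy of the warnings, for example `copy_unpublished < warnings.txt`, where warnings.txt is a placeholder file name.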

Zifeng-L commented 3 years ago

Thanks for your help! I got it!

vib-singlecell-nf / vsn-pipelines

[BUG] Failed to publish file across filesystems #265