cflerin closed this issue 3 years ago
I think this just needs to be documented so users understand that links don't work across file systems. Our other option is to publish by copying rather than sym/hard linking. That will always work, but of course it will take more space when people don't clear their work folders.
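As a rough illustration of the trade-off described above, here is a minimal shell sketch of "try a hard link, fall back to a copy". The function name `publish_file` is hypothetical and this is not how Nextflow implements `publishDir`; it only shows why copying always works where a hard link may not.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextflow or vsn-pipelines):
# publish SRC to DST with a hard link when possible, otherwise copy.
publish_file() {
    local src=$1 dst=$2
    mkdir -p "$(dirname "$dst")"
    # ln fails with "Invalid cross-device link" when SRC and DST
    # live on different filesystems; in that case fall back to cp,
    # which costs extra disk space but works everywhere.
    ln -f -- "$src" "$dst" 2>/dev/null || cp -f -- "$src" "$dst"
}
```

Usage would look like `publish_file work/ab/1234/sample.loom out/loom/sample.loom` (placeholder paths).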
There must be a better solution than to just say that this won't work at all. As a user, if I see that the NXF_WORK environment variable is available in Nextflow, then it's reasonable to expect that it would work here.
Possible solutions:
With option 2, the publish directives become:
process SC__PUBLISH {
publishDir "${params.global.outdir}/data/intermediate", \
mode: "${params.utils.publish.mode}", \
saveAs: {
filename -> "${outputFileName}"
}
...
I've tested this briefly and it works for the symlink, link, and copy methods in params.utils.publish.mode. I would also remove overwrite: true in this case to avoid re-copying large files, which can take a significant amount of time when there are many of them.
I like option 2 better (as you described), probably with symlink as the default.
Also a good idea to remove overwrite: true.
I implemented option 2 above, but using link as the default (this is how it was in the existing code anyway).
Same issue here! I pulled v0.26.1 and still had the same problem. How can we get the notebooks now?
Hi @Zifeng1995 , I think you can solve this by changing the publish mode to 'copy' in your config file and restarting the pipeline with resume enabled.
Hi @cflerin, I am new to Nextflow. I tried copying the publish mode into my config file, but it did not work.
manifest {
name = 'vib-singlecell-nf/vsn-pipelines'
description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
version = '0.26.1'
mainScript = 'main.nf'
defaultBranch = 'master'
nextflowVersion = '!>=20.10.0'
}
params {
global {
project_name = '10x_PBMC'
outdir = 'out'
}
misc {
test {
enabled = false
}
}
utils {
container = 'vibsinglecellnf/utils:0.4.0'
publish {
compressionLevel = 6
annotateWithBatchVariableName = false
mode = 'link'
}
}
sc {
file_converter {
off = 'h5ad'
tagCellWithSampleId = true
remove10xGEMWell = false
useFilteredMatrix = true
makeVarIndexUnique = false
}
scanpy {
container = 'vibsinglecellnf/scanpy:0.5.2'
report {
annotations_to_plot = []
}
feature_selection {
report_ipynb = '/src/scanpy/bin/reports/sc_select_variable_genes_report.ipynb'
method = 'mean_disp_plot'
minMean = 0.0125
maxMean = 3
minDisp = 0.5
off = 'h5ad'
}
feature_scaling {
method = 'zscore_scale'
maxSD = 10
off = 'h5ad'
}
neighborhood_graph {
nPcs = 50
off = 'h5ad'
}
dim_reduction {
report_ipynb = '/src/scanpy/bin/reports/sc_dim_reduction_report.ipynb'
pca {
method = 'pca'
nComps = 50
off = 'h5ad'
}
umap {
method = 'umap'
off = 'h5ad'
}
tsne {
method = 'tsne'
off = 'h5ad'
}
}
clustering {
preflight_checks = true
report_ipynb = '/src/scanpy/bin/reports/sc_clustering_report.ipynb'
method = 'louvain'
resolution = 0.8
off = 'h5ad'
}
marker_genes {
method = 'wilcoxon'
ngenes = 0
groupby = 'louvain'
off = 'h5ad'
}
filter {
report_ipynb = '/src/scanpy/bin/reports/sc_filter_qc_report.ipynb'
cellFilterStrategy = 'fixedthresholds'
cellFilterMinNGenes = 200
cellFilterMaxNGenes = 4000
cellFilterMaxPercentMito = 0.15
geneFilterMinNCells = 3
off = 'h5ad'
outdir = 'out'
}
data_transformation {
method = 'log1p'
off = 'h5ad'
}
normalization {
method = 'cpx'
countsPerCellAfter = 10000
off = 'h5ad'
}
}
scope {
genome = ''
tree {
level_1 = ''
level_2 = ''
level_3 = ''
}
}
}
data {
tenx {
cellranger_mex = 'data/10x/1k_pbmc/1k_pbmc_*/outs/'
}
}
}
process SC__PUBLISH {
publishDir "${params.global.outdir}/data/intermediate",
mode: "${params.utils.publish.mode}", \
saveAs: {
filename -> "${outputFileName}"
}
process {
executor = 'local'
cpus = 2
memory = '60 GB'
clusterOptions = '-A cluster_account'
withLabel:compute_resources__default {
time = '1h'
}
withLabel:compute_resources__minimal {
cpus = 1
memory = '1 GB'
}
withLabel:compute_resources__mem {
cpus = 4
memory = '160 GB'
}
withLabel:compute_resources__cpu {
cpus = 20
memory = '80 GB'
}
withLabel:compute_resources__report {
maxForks = 2
cpus = 1
memory = '160 GB'
}
withLabel:compute_resources__24hqueue {
time = '24h'
}
}
timeline {
enabled = true
file = 'out/nextflow_reports/execution_timeline.html'
}
report {
enabled = true
file = 'out/nextflow_reports/execution_report.html'
}
trace {
enabled = true
file = 'out/nextflow_reports/execution_trace.txt'
}
dag {
enabled = true
file = 'out/nextflow_reports/pipeline_dag.svg'
}
min {
enabled = false
}
vsc {
enabled = false
}
docker {
enabled = true
runOptions = '-i -v /cluster/home/zfli:/cluster/home/zfli'
}
OK, take that publish step out (process SC__PUBLISH) and go back to your original config. This is the section you need to change:
utils {
container = 'vibsinglecellnf/utils:0.4.0'
publish {
compressionLevel = 6
annotateWithBatchVariableName = false
mode = 'link'
}
Make sure to set mode = 'copy' instead of link and this should fix your hardlink issue with the notebooks. Then re-run the pipeline with resume: nextflow run [...] -resume.
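For anyone who prefers to script the change described above, the following is a small sketch that rewrites the publish mode in a generated config file. The function name `set_publish_mode` and the file name `my.config` are assumptions for illustration. Note that it rewrites every line of the form `mode = '...'`; in the config posted above the publish block is the only place such a line appears.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set every "mode = '<value>'" line in a
# Nextflow config file to the requested publish mode.
set_publish_mode() {
    local config=$1 mode=$2 tmp
    tmp=$(mktemp) || return 1
    # The pattern is anchored at the start of the line, so keys such as
    # "method = ..." are left alone.
    sed "s/^\([[:space:]]*mode[[:space:]]*=[[:space:]]*\)'[^']*'/\1'${mode}'/" \
        "$config" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the original file permissions are kept.
    cat "$tmp" > "$config" && rm -f "$tmp"
}

# Example (file name is a placeholder):
#   set_publish_mode my.config copy
# then re-run with resume as described above: nextflow run [...] -resume
```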
It still did not work after setting mode = 'copy'.
This is part of my config file
params {
global {
project_name = '10x_PBMC'
outdir = 'out'
}
misc {
test {
enabled = false
}
}
utils {
container = 'vibsinglecellnf/utils:0.4.0'
publish {
compressionLevel = 6
annotateWithBatchVariableName = false
mode = 'copy'
}
}
These are warnings
WARN: Failed to publish file: /cluster/home/zfli/test/single_sample_test/work/5e/e00c91db073fe4576372b28d935b48/1k_pbmc_v3_chemistry.SC__H5AD_TO_LOOM.loom; to: /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v3_chemistry.SCope_output.loom [link] -- See log file for details
WARN: Failed to publish file: /cluster/home/zfli/test/single_sample_test/work/a6/a699ab4cc3a5b58bd7beb10bd99a9a/1k_pbmc_v2_chemistry.SC__H5AD_TO_LOOM.loom; to: /cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v2_chemistry.SCope_output.loom [link] -- See log file for details
It seems there are a few places where the publish mode is hardcoded in the loomHandler.nf processes.
But to get these files immediately you can just copy them using the full source and destination paths from the warning, for example:
cp \
/cluster/home/zfli/test/single_sample_test/work/5e/e00c91db073fe4576372b28d935b48/1k_pbmc_v3_chemistry.SC__H5AD_TO_LOOM.loom \
/cluster/home/zfli/test/single_sample_test/out/loom/1k_pbmc_v3_chemistry.SCope_output.loom
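If there are many such warnings, the same manual copy can be scripted. The sketch below assumes the warnings have been saved to a text file in exactly the format shown above (`WARN: Failed to publish file: SRC; to: DST [mode] -- ...`); the function name `copy_failed_publishes` and the file name `warnings.txt` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "Failed to publish file" warnings from a
# file and copy each source to its intended destination.
copy_failed_publishes() {
    local line rest src dst
    while IFS= read -r line; do
        # Skip lines that are not publish warnings.
        case "$line" in
            *"Failed to publish file: "*"; to: "*) ;;
            *) continue ;;
        esac
        rest=${line#*"Failed to publish file: "}
        src=${rest%%"; to: "*}      # text before "; to: "
        dst=${rest#*"; to: "}       # text after "; to: "
        dst=${dst%" ["*}            # drop the trailing " [link] -- ..." part
        mkdir -p "$(dirname "$dst")"
        cp -- "$src" "$dst"
    done < "$1"
}

# Example (file name is a placeholder):
#   copy_failed_publishes warnings.txt
```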
Thanks for your help! I got it!
Describe the bug All publish steps fail to complete when the Nextflow work directory and the current working directory are on different filesystems.
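A quick way to tell whether a given setup is affected is to compare the device numbers of the work directory and the output directory. This is only a sketch: the function names are hypothetical, and it assumes either GNU `stat` (`-c`) or BSD/macOS `stat` (`-f`) is available.

```shell
#!/usr/bin/env bash
# Print the device number of the filesystem that holds a path.
device_of() {
    [ -e "$1" ] || return 1
    # GNU stat uses -c, BSD/macOS stat uses -f; try both.
    stat -c '%d' "$1" 2>/dev/null || stat -f '%d' "$1"
}

# Return 0 if both paths are on the same filesystem, 1 if they are not,
# and 2 if either path cannot be inspected.
same_filesystem() {
    local a b
    a=$(device_of "$1") || return 2
    b=$(device_of "$2") || return 2
    [ "$a" = "$b" ]
}

# Example (paths are placeholders):
#   same_filesystem "${NXF_WORK:-work}" out \
#       || echo "hard links will fail here; use mode = 'copy'"
```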
To Reproduce Steps to reproduce the behavior:
1. Use the NXF_WORK environment variable to direct all working files to a scratch drive that's on a different filesystem than the current working directory. Test with any of the test profiles:
2. Run using this entry point:
3. See error: These are warnings and the pipeline reports that it completes successfully, but there is no output data in the out directory:
And an excerpt from the log reports Invalid cross-device link:
Expected behavior Cross-filesystem publishing should work.
Screenshots NA
Please complete the following information:
Additional context NA