nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Parameter merging is not working in all cases. #739

Closed: mahesh-panchal closed this issue 6 years ago

mahesh-panchal commented 6 years ago

Hi Paolo,

I have a case where parameter merging is not working. I do not understand why, and I cannot recreate the issue in toy examples.

In this example, params.busco_lineages from test_data.conf is not fully resolved (the params.busco_lineages_home prefix is rendered as '[:]'), even though params.busco_lineages_home is set in cluster_rackham.conf.

$ nextflow config -profile cluster,rackham,test_data
params {
   reads = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/PhiX_reads_{1,2}.fastq.gz'
   draft_assemblies = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/PhiX_spades_assembly.fasta'
   genome_size = 5500
   insert_size = 500
   reference = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/Illumina_PhiX_reference.fasta.gz'
   busco_lineages_home = '/sw/apps/bioinfo/BUSCO/v3_lineage_sets'
   busco_lineages = '[:]/{bacteria,eukaryota}_odb9'
   project = 'snic2018-8-34'
   clusterOptions = '-t 1:00:00'
   output_dir = './NBIS-results'
   max_memory = '128 GB'
   max_cpus = 16
   max_time = '10d'
}

timeline {
   enabled = true
   file = './NBIS-results/pipeline_info/NBIS-workflow_timeline.html'
}

report {
   enabled = true
   file = './NBIS-results/pipeline_info/NBIS-workflow_report.html'
}

trace {
   enabled = true
   file = './NBIS-results/pipeline_info/NBIS-workflow_trace.txt'
}

dag {
   enabled = true
   file = './NBIS-results/pipeline_info/NBIS-workflow_DAG.svg'
}

env {
   NXF_LAUNCHER = '/scratch'
   NXF_TEMP = '/scratch'
}

process {
   scratch = '$SNIC_TMP'
   clusterOptions = '-A snic2018-8-34 -t 1:00:00'
   shell = ['/bin/bash', '-euo', 'pipefail']
   cpus = 8
}

The config files causing the issue are:

nextflow.config

profiles {

        // default profile
        standard {
                process.executor = 'local'
        }

        // Project data settings
        test_data {
                includeConfig 'test_data.conf'
        }

        // Cluster settings: Usage - nextflow run workflow.nf -profile cluster,rackham,test_data

        cluster {
                params {
                        // Workflow settings.
                        project = 'snic2018-8-34'
                        clusterOptions = '-t 1:00:00'
                        output_dir = './NBIS-results'
                }
                timeline {
                        enabled = true
                        file = "${params.output_dir}/pipeline_info/NBIS-workflow_timeline.html"
                }
                report {
                        enabled = true
                        file = "${params.output_dir}/pipeline_info/NBIS-workflow_report.html"
                }
                trace {
                        enabled = true
                        file = "${params.output_dir}/pipeline_info/NBIS-workflow_trace.txt"
                }
                dag {
                        enabled = true
                        file = "${params.output_dir}/pipeline_info/NBIS-workflow_DAG.svg"
                }
        }

        rackham {
                includeConfig 'cluster_rackham.conf'
        }
}

cluster_rackham.conf

params {
        max_memory = 128.GB
        max_cpus = 16
        max_time = 240.h

        busco_lineages_home = '/sw/apps/bioinfo/BUSCO/v3_lineage_sets'
}
env {
        NXF_LAUNCHER="$SNIC_TMP"
        NXF_TEMP="$SNIC_TMP"
}
process {

        // Global process config
        scratch = '$SNIC_TMP'
        clusterOptions = "-A $params.project ${params.clusterOptions ?: ''}"
        shell = ['/bin/bash', '-euo', 'pipefail']
        cpus = 8

}

test_data.conf

params {
        reads = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/PhiX_reads_{1,2}.fastq.gz' 
        draft_assemblies = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/PhiX_spades_assembly.fasta'
        genome_size = 5500
        insert_size = 500
        reference = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/Illumina_PhiX_reference.fasta.gz'
        // Check the busco lineage sets and locations in the appropriate cluster settings file. 
        busco_lineages = "$params.busco_lineages_home/{bacteria,eukaryota}_odb9"
}

I tried making a simpler toy case as follows, but I cannot recreate the issue as it occurs in the config above.

profiles {

        prof1 {
                params {
                        project = 'projectA'
                }
        }

        prof2 {
                params {
                        dataset_home = '/path/to/someDir'
                        busco_lineages_home = '/sw/apps/bioinfo/BUSCO/v3_lineage_sets'
                }
                env {
                        NXF_LAUNCHER="$SNIC_TMP"
                        NXF_TEMP="$SNIC_TMP"
                }
                process {
                        clusterOptions = "-A $params.project"
                }
        }

        prof3 {
                params {
                        datasets = "$params.dataset_home/{dataset1,dataset5}"
                        // Check the busco lineage sets and locations in the appropriate cluster settings file. 
                        busco_lineages = "$params.busco_lineages_home/{bacteria,eukaryota}_odb9"
                }
        }
}

In this example, the strings expand correctly:

$ nextflow config -profile prof1,prof2,prof3
params {
   project = 'projectA'
   dataset_home = '/path/to/someDir'
   busco_lineages_home = '/sw/apps/bioinfo/BUSCO/v3_lineage_sets'
   datasets = '/path/to/someDir/{dataset1,dataset5}'
   reads = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/PhiX_reads_{1,2}.fastq.gz'
   draft_assemblies = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/PhiX_spades_assembly.fasta'
   genome_size = 5500
   insert_size = 500
   reference = '$HOME/genome_assembly_pipeline/genome_assembly_pipeline/test_data/Illumina_PhiX_reference.fasta.gz'
   busco_lineages = '/sw/apps/bioinfo/BUSCO/v3_lineage_sets/{bacteria,eukaryota}_odb9'
   modules = ['bwa', 'blobtools']
}

env {
   NXF_LAUNCHER = '/scratch'
   NXF_TEMP = '/scratch'
}

process {
   clusterOptions = '-A projectA'
}

mahesh-panchal commented 6 years ago

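Argh, I just understood why this error occurs while writing this up.

The test_data profile is defined in the config file before the cluster and rackham profiles, so params.busco_lineages_home is not yet set when test_data.conf is evaluated. Moving the test_data profile after the other two fixes the issue, and the strings are resolved correctly.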
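For anyone hitting the same thing, here is a minimal sketch of the ordering dependence. It assumes, as the behaviour above suggests, that profile blocks are evaluated top to bottom in the order they appear in the file, not in the order passed to -profile. The profile names `data` and `site` and the parameter names `lineages` and `lineages_home` are hypothetical, made up for illustration.

```groovy
profiles {

        // Appears first in the file, so it is evaluated first.
        // params.lineages_home has no value yet at this point, so the
        // double-quoted string is interpolated with an empty placeholder.
        data {
                params {
                        lineages = "$params.lineages_home/{bacteria,eukaryota}_odb9"
                }
        }

        // Appears second: the value is set here, but the string above
        // has already been built and is not re-evaluated.
        site {
                params {
                        lineages_home = '/sw/apps/bioinfo/BUSCO/v3_lineage_sets'
                }
        }
}
```

With this layout, running nextflow config -profile site,data would be expected to show the same unresolved '[:]' prefix as in the original report. Swapping the two profile blocks, so that `site` comes before `data`, should let the string resolve.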

Corrected config file:

profiles {

        // default profile
        standard {
                process.executor = 'local'
        }

        // Cluster settings: Usage - nextflow run workflow.nf -profile cluster,rackham,test_data

        cluster {
                params {
                        // Workflow settings.
                        project = 'snic2018-8-34'
                        clusterOptions = '-t 1:00:00'
                        output_dir = './NBIS-results'
                }
                timeline {
                        enabled = true
                        file = "${params.output_dir}/pipeline_info/NBIS-workflow_timeline.html"
                }
                report {
                        enabled = true
                        file = "${params.output_dir}/pipeline_info/NBIS-workflow_report.html"
                }
                trace {
                        enabled = true
                        file = "${params.output_dir}/pipeline_info/NBIS-workflow_trace.txt"
                }
                dag {
                        enabled = true
                        file = "${params.output_dir}/pipeline_info/NBIS-workflow_DAG.svg"
                }
        }

        rackham {
                includeConfig 'cluster_rackham.conf'
        }

        // Project data settings
        test_data {
                includeConfig 'test_data.conf'
        }

}