nf-core / oncoanalyser

A comprehensive cancer DNA/RNA analysis and reporting pipeline
MIT License

symlink specific gridss index files #54

Closed: casslitch closed this 3 months ago

casslitch commented 3 months ago

The genome_gridss_index folder may contain additional files that Nextflow has already staged as separate inputs (e.g. someone might have the .dict file in this folder if they created it via gridss.PrepareReference). Symlinking the entire contents of the folder then fails because those file names clash with the files that are already staged. Suggestion: symlink only the specific files of interest.
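To make the failure concrete, here is a minimal shell sketch of the clash described above. It assumes the folder contents are linked with `ln -s` over a glob, which this thread does not confirm; the scratch paths and the `genome.fa` prefix are made up for illustration.

```shell
# Sketch only: reproduce the symlink clash in a scratch directory.
# All paths and file names below are hypothetical.
scratch=$(mktemp -d)
mkdir -p "$scratch/gridss_index" "$scratch/work"

# An externally prepared index folder that also holds a .dict file.
touch "$scratch/gridss_index/genome.fa.img" \
      "$scratch/gridss_index/genome.fa.gridsscache" \
      "$scratch/gridss_index/genome.fa.dict"

# Stand-in for Nextflow staging the .dict file as a separate input.
touch "$scratch/genome.fa.dict"
ln -s "$scratch/genome.fa.dict" "$scratch/work/genome.fa.dict"

# Linking the whole folder fails: genome.fa.dict already exists in work/.
cd "$scratch/work"
if ln -s "$scratch"/gridss_index/* ./ 2>/dev/null; then
    clash_status=0
else
    clash_status=1
fi
echo "ln exit status: $clash_status"
```

The non-zero status comes from `genome.fa.dict`, which is already present in the working directory before the index folder is linked.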

github-actions[bot] commented 3 months ago

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit d0a920f

* ✅ 177 tests passed
* ❔ 5 tests were ignored
* ❗ 15 tests had warnings

### :heavy_exclamation_mark: Test warnings:

* files_exist - File not found: `assets/multiqc_config.yml`
* nextflow_config - Config ``manifest.version`` should end in ``dev``: ``1.0.0``
* readme - README contains the placeholder `zenodo.XXXXXXX`. This should be replaced with the zenodo doi (after the first release).
* pipeline_todos - TODO string in ``: _Optionally add in-text citation tools to this list._
* pipeline_todos - TODO string in ``: _Optionally add bibliographic entries to this list._
* pipeline_todos - TODO string in ``: _Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!_
* pipeline_todos - TODO string in `methods_description_template.yml`: _#Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline_
* pipeline_todos - TODO string in `awsfulltest.yml`: _You can customise AWS full pipeline tests as required_
* schema_params - Schema param `panel` not found from nextflow config
* schema_params - Schema param `genome_version` not found from nextflow config
* schema_params - Schema param `genome_type` not found from nextflow config
* schema_params - Schema param `ref_data_hmf_data_path` not found from nextflow config
* schema_params - Schema param `ref_data_panel_data_path` not found from nextflow config
* schema_params - Schema param `ref_data_virusbreakenddb_path` not found from nextflow config
* schema_params - Schema param `ref_data_hla_slice_bed` not found from nextflow config

### :grey_question: Tests ignored:

* files_exist - File is ignored: `lib/Utils.groovy`
* files_exist - File is ignored: `lib/WorkflowMain.groovy`
* files_exist - File is ignored: `lib/WorkflowOncoanalyser.groovy`
* actions_ci - actions_ci
* multiqc_config - multiqc_config

### :white_check_mark: Tests passed:

* files_exist - File found: `.gitattributes`
* files_exist - File found: `.gitignore`
* files_exist - File found: `.nf-core.yml`
* files_exist - File found: `.editorconfig`
* files_exist - File found: `.prettierignore`
* files_exist - File found: `.prettierrc.yml`
* files_exist - File found: ``
* files_exist - File found: ``
* files_exist - File found: ``
* files_exist - File found: `LICENSE` or `` or `LICENCE` or ``
* files_exist - File found: `nextflow_schema.json`
* files_exist - File found: `nextflow.config`
* files_exist - File found: ``
* files_exist - File found: `.github/.dockstore.yml`
* files_exist - File found: `.github/`
* files_exist - File found: `.github/ISSUE_TEMPLATE/bug_report.yml`
* files_exist - File found: `.github/ISSUE_TEMPLATE/config.yml`
* files_exist - File found: `.github/ISSUE_TEMPLATE/feature_request.yml`
* files_exist - File found: `.github/`
* files_exist - File found: `.github/workflows/branch.yml`
* files_exist - File found: `.github/workflows/ci.yml`
* files_exist - File found: `.github/workflows/linting_comment.yml`
* files_exist - File found: `.github/workflows/linting.yml`
* files_exist - File found: `assets/email_template.html`
* files_exist - File found: `assets/email_template.txt`
* files_exist - File found: `assets/sendmail_template.txt`
* files_exist - File found: `assets/nf-core-oncoanalyser_logo_light.png`
* files_exist - File found: `conf/modules.config`
* files_exist - File found: `conf/test.config`
* files_exist - File found: `conf/test_full.config`
* files_exist - File found: `docs/images/nf-core-oncoanalyser_logo_light.png`
* files_exist - File found: `docs/images/nf-core-oncoanalyser_logo_dark.png`
* files_exist - File found: `docs/`
* files_exist - File found: `docs/`
* files_exist - File found: `docs/`
* files_exist - File found: `docs/`
* files_exist - File found: ``
* files_exist - File found: `conf/base.config`
* files_exist - File found: `conf/igenomes.config`
* files_exist - File found: `.github/workflows/awstest.yml`
* files_exist - File found: `.github/workflows/awsfulltest.yml`
* files_exist - File found: `modules.json`
* files_exist - File not found check: `.github/ISSUE_TEMPLATE/`
* files_exist - File not found check: `.github/ISSUE_TEMPLATE/`
* files_exist - File not found check: `.github/workflows/push_dockerhub.yml`
* files_exist - File not found check: `.markdownlint.yml`
* files_exist - File not found check: `.nf-core.yaml`
* files_exist - File not found check: `.yamllint.yml`
* files_exist - File not found check: `bin/markdown_to_html.r`
* files_exist - File not found check: `conf/aws.config`
* files_exist - File not found check: `docs/images/nf-core-oncoanalyser_logo.png`
* files_exist - File not found check: `lib/Checks.groovy`
* files_exist - File not found check: `lib/Completion.groovy`
* files_exist - File not found check: `lib/NfcoreTemplate.groovy`
* files_exist - File not found check: `lib/Workflow.groovy`
* files_exist - File not found check: `parameters.settings.json`
* files_exist - File not found check: `pipeline_template.yml`
* files_exist - File not found check: `Singularity`
* files_exist - File not found check: `lib/nfcore_external_java_deps.jar`
* files_exist - File not found check: `.travis.yml`
* nextflow_config - Config variable found: ``
* nextflow_config - Config variable found: `manifest.nextflowVersion`
* nextflow_config - Config variable found: `manifest.description`
* nextflow_config - Config variable found: `manifest.version`
* nextflow_config - Config variable found: `manifest.homePage`
* nextflow_config - Config variable found: `timeline.enabled`
* nextflow_config - Config variable found: `trace.enabled`
* nextflow_config - Config variable found: `report.enabled`
* nextflow_config - Config variable found: `dag.enabled`
* nextflow_config - Config variable found: `process.cpus`
* nextflow_config - Config variable found: `process.memory`
* nextflow_config - Config variable found: `process.time`
* nextflow_config - Config variable found: `params.outdir`
* nextflow_config - Config variable found: `params.input`
* nextflow_config - Config variable found: `params.validationShowHiddenParams`
* nextflow_config - Config variable found: `params.validationSchemaIgnoreParams`
* nextflow_config - Config variable found: `manifest.mainScript`
* nextflow_config - Config variable found: `timeline.file`
* nextflow_config - Config variable found: `trace.file`
* nextflow_config - Config variable found: `report.file`
* nextflow_config - Config variable found: `dag.file`
* nextflow_config - Config variable (correctly) not found: `params.nf_required_version`
* nextflow_config - Config variable (correctly) not found: `params.container`
* nextflow_config - Config variable (correctly) not found: `params.singleEnd`
* nextflow_config - Config variable (correctly) not found: `params.igenomesIgnore`
* nextflow_config - Config variable (correctly) not found: ``
* nextflow_config - Config variable (correctly) not found: `params.enable_conda`
* nextflow_config - Config ``timeline.enabled`` had correct value: ``true``
* nextflow_config - Config ``report.enabled`` had correct value: ``true``
* nextflow_config - Config ``trace.enabled`` had correct value: ``true``
* nextflow_config - Config ``dag.enabled`` had correct value: ``true``
* nextflow_config - Config ```` began with ``nf-core/``
* nextflow_config - Config variable ``manifest.homePage`` began with
* nextflow_config - Config ``dag.file`` ended with ``.html``
* nextflow_config - Config variable ``manifest.nextflowVersion`` started with >= or !>=
* nextflow_config - Config `params.custom_config_version` is set to `master`
* nextflow_config - Config `params.custom_config_base` is set to ``
* nextflow_config - Lines for loading custom profiles found
* nextflow_config - nextflow.config contains configuration profile `test`
* nextflow_config - Config default value correct: params.force_genome= false
* nextflow_config - Config default value correct: params.prepare_reference_only= false
* nextflow_config - Config default value correct: params.create_stub_placeholders= false
* nextflow_config - Config default value correct: params.isofox_functions= TRANSCRIPT_COUNTS;ALT_SPLICE_JUNCTIONS;FUSIONS;RETAINED_INTRONS
* nextflow_config - Config default value correct: params.custom_config_version= master
* nextflow_config - Config default value correct: params.custom_config_base=
* nextflow_config - Config default value correct: params.max_cpus= 16
* nextflow_config - Config default value correct: params.max_memory= 128.GB
* nextflow_config - Config default value correct: params.max_time= 240.h
* nextflow_config - Config default value correct: params.publish_dir_mode= copy
* nextflow_config - Config default value correct: params.validate_params= true
* files_unchanged - `.gitattributes` matches the template
* files_unchanged - `.prettierrc.yml` matches the template
* files_unchanged - `` matches the template
* files_unchanged - `LICENSE` matches the template
* files_unchanged - `.github/.dockstore.yml` matches the template
* files_unchanged - `.github/` matches the template
* files_unchanged - `.github/ISSUE_TEMPLATE/bug_report.yml` matches the template
* files_unchanged - `.github/ISSUE_TEMPLATE/config.yml` matches the template
* files_unchanged - `.github/ISSUE_TEMPLATE/feature_request.yml` matches the template
* files_unchanged - `.github/` matches the template
* files_unchanged - `.github/workflows/branch.yml` matches the template
* files_unchanged - `.github/workflows/linting_comment.yml` matches the template
* files_unchanged - `.github/workflows/linting.yml` matches the template
* files_unchanged - `assets/email_template.html` matches the template
* files_unchanged - `assets/email_template.txt` matches the template
* files_unchanged - `assets/sendmail_template.txt` matches the template
* files_unchanged - `assets/nf-core-oncoanalyser_logo_light.png` matches the template
* files_unchanged - `docs/images/nf-core-oncoanalyser_logo_light.png` matches the template
* files_unchanged - `docs/images/nf-core-oncoanalyser_logo_dark.png` matches the template
* files_unchanged - `docs/` matches the template
* files_unchanged - `.gitignore` matches the template
* files_unchanged - `.prettierignore` matches the template
* actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
* actions_awsfulltest - `.github/workflows/awsfulltest.yml` is triggered correctly
* actions_awsfulltest - `.github/workflows/awsfulltest.yml` does not use `-profile test`
* readme - README Nextflow minimum version badge matched config. Badge: `22.10.5`, Config: `22.10.5`
* pipeline_name_conventions - Name adheres to nf-core convention
* template_strings - Did not find any Jinja template strings (255 files)
* schema_lint - Schema lint passed
* schema_lint - Schema title + description lint passed
* schema_lint - Input mimetype lint passed: 'text/csv'
* system_exit - No `System.exit` calls found
* actions_schema_validation - Workflow validation passed: branch.yml
* actions_schema_validation - Workflow validation passed: ci.yml
* actions_schema_validation - Workflow validation passed: awsfulltest.yml
* actions_schema_validation - Workflow validation passed: fix-linting.yml
* actions_schema_validation - Workflow validation passed: linting.yml
* actions_schema_validation - Workflow validation passed: download_pipeline.yml
* actions_schema_validation - Workflow validation passed: release-announcements.yml
* actions_schema_validation - Workflow validation passed: clean-up.yml
* actions_schema_validation - Workflow validation passed: awstest.yml
* actions_schema_validation - Workflow validation passed: linting_comment.yml
* merge_markers - No merge markers found in pipeline files
* modules_json - Only installed modules found in `modules.json`
* modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
* base_config - `conf/base.config` found and not ignored.
* modules_config - `conf/modules.config` found and not ignored.
* modules_config - `WRITE_REFERENCE_DATA` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `STAR_GENOMEGENERATE` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `GATK4_MARKDUPLICATES` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `MARKDUPS` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `AMBER` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `COBALT` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `PURPLE` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `BAMTOOLS` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `CHORD` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `EXTRACTCONTIG` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `LILAC` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `SIGS` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `VIRUSBREAKEND` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `VIRUSINTERPRETER` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `ISOFOX` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `CUPPA` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `SAMTOOLS_FLAGSTAT` found in `conf/modules.config` and Nextflow scripts.
* modules_config - `ORANGE` found in `conf/modules.config` and Nextflow scripts.
* nfcore_yml - Repository type in `.nf-core.yml` is valid: `pipeline`
* nfcore_yml - nf-core version in `.nf-core.yml` is set to the latest version: `2.14.1`

### Run details

* nf-core/tools version 2.14.1
* Run at `2024-06-05 14:24:46`

scwatts commented 3 months ago

Thanks for opening the PR! I don't think this affects users whose GRIDSS index was prepared with the built-in functionality, but I understand the current setup may cause issues for externally created GRIDSS indexes that contain file names clashing with other staged files, as you've pointed out.

I'm not sure what the best approach is here - only support precisely the expected GRIDSS index fileset or support all/common deviations from these expectations?

Leaning towards the former but will accommodate in the case that you feel other users may experience the same issue. If that's the case, I'd suggest one of two approaches:

While the former requires strong alignment with expected files I'd probably prefer it since I find it safer than forceful replacement. And if making this change it would be good to apply to all instances where the GRIDSS index is used for consistency.

casslitch commented 3 months ago

Hi Stephen,

Thanks for looking into this! I agree, this won't affect users who've generated the index from the built-in functionality. My thinking was that other users might be affected if they've pre-generated the gridss index themselves.

I like the first option of explicitly including all index files.

However, based on the current module, the genome_bwa_index folder isn't staged. Therefore only the files in the genome_gridss_index folder will be picked up (I think that's just the img and gridsscache files). Are the bwa index files required for this module? If so, I think 'path genome_bwa_index' needs to be added to the inputs.

scwatts commented 3 months ago

> Are the bwa index files required for this module? If so, I think 'path genome_bwa_index' needs to be added to the inputs.

Ah, the reference genome indexes have been rearranged recently and the new GRIDSS index directory contains the following files:

Current GRIDSS index directory for GRCh38_hmf:

```text
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.amb
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.ann
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bwt
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gridsscache
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.img
GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.pac
```
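For anyone preparing the directory themselves, a small check along these lines could confirm that the six suffixes listed above are present. This is only a sketch: `check_gridss_index` is a hypothetical helper, not something oncoanalyser provides, and the scratch directory with the `genome.fna` prefix exists purely for the demo.

```shell
# Hypothetical helper (not part of oncoanalyser): report which of the six
# suffixes from the listing above are absent for a given reference prefix.
# The exit status is the number of missing files.
check_gridss_index() {
    index_dir="$1"
    prefix="$2"
    missing=0
    for suffix in amb ann bwt gridsscache img pac; do
        if [ ! -e "$index_dir/$prefix.$suffix" ]; then
            echo "missing: $prefix.$suffix"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Demo on a scratch directory that deliberately lacks the .pac file.
demo=$(mktemp -d)
for suffix in amb ann bwt gridsscache img; do
    touch "$demo/genome.fna.$suffix"
done
check_output=$(check_gridss_index "$demo" genome.fna) && check_status=0 || check_status=$?
echo "$check_output (exit status $check_status)"
```

Extra files such as `.dict` or `.fai` are ignored by this check, since only the expected suffixes are looked up.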

Sorry for the confusion!

> My thinking was that other users might be affected if they've pre-generated the gridss index themselves.

> I like the first option of explicitly including all index files.

Okay, let's make this change then. Can you adjust each of the symlink/find commands for the GRIDSS index directory on your PR branch according to the first option above and also change the merge PR base branch to dev? Once merged, I'll cherry-pick your commits into features-block-one-preview.

casslitch commented 3 months ago

Thanks Stephen, sounds good! Thanks for describing the new structure. Just to clarify, should I still add genome_bwa_index to the inputs for this module? For example, if the user has pre-generated the bwa index files and they sit in a different folder to genome_gridss_index, then they won't be staged. I agree this won't affect users who are using the downloaded genome_gridss_index folder which contains all the necessary files.

scwatts commented 3 months ago

No, the user will need to place all the BWA index files under the same GRIDSS index directory themselves. I think this is okay since (1) it is required to create the GRIDSS index files anyway, and (2) the BWA index files are not used anywhere else in the workflow.

Once the changes have been made, let's review and test!

casslitch commented 3 months ago

Thanks Stephen for explaining, agree with that logic!

scwatts commented 3 months ago

I'm going to rebase your commits on top of dev and then force push - you may need to delete your local branch and fetch it again.

I noticed I made a typo in the find command: 'bwa' should be 'bwt'. If you can update that, I'll test these changes tomorrow.

scwatts commented 3 months ago

The 'Run pipeline stubs' check is showing that the new back-slashes need to be escaped:

```
find -L ${genome_gridss_index} -regex '.*\\.\\(amb\\|ann\\|pac\\|gridsscache\\|sa\\|bwt\\|img\\|alt\\)'
```
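For reference, this is roughly what the shell ends up running once the doubled backslashes in the Nextflow script string have collapsed to single ones. It is a sketch that assumes GNU find (whose default regex syntax accepts `\(`, `\|` and `\)`) and assumes the matches are linked with `-exec ln -s`, which is not shown in this thread; the scratch paths and the `genome.fa` prefix are made up.

```shell
# Sketch only: the same regex with single backslashes, run directly in a shell.
# Paths and file names are hypothetical; GNU find is assumed.
scratch=$(mktemp -d)
mkdir -p "$scratch/index" "$scratch/work"

# A 'custom' index folder: the eight expected suffixes plus .dict and .fai.
for suffix in amb ann pac gridsscache sa bwt img alt dict fai; do
    touch "$scratch/index/genome.fa.$suffix"
done

cd "$scratch/work"
# -regex must match the whole path, so only files ending in a listed suffix are linked.
find -L "$scratch/index" \
    -regex '.*\.\(amb\|ann\|pac\|gridsscache\|sa\|bwt\|img\|alt\)' \
    -exec ln -s {} ./ \;

linked=$(ls | wc -l)
echo "linked files: $linked"
```

The `.dict` and `.fai` files are left out of the working directory, so they can no longer clash with copies staged separately.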

scwatts commented 3 months ago

Okay, changes now look good.

I reproduced the issue on the dev branch using simulated WGS tumor/normal data, then successfully tested these changes for the GRCh38_hmf reference genome with both the default GRIDSS index directory and a 'custom' one that contains .{fai,dict} files.