nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
MIT License

DSL2: genotyping #1016

Closed TCLamnidis closed 3 months ago

TCLamnidis commented 12 months ago



PR checklist

github-actions[bot] commented 11 months ago

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit d39a10a

+| ✅ 246 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗  22 tests had warnings |!
### :heavy_exclamation_mark: Test warnings:

* [readme] - README contains the placeholder `zenodo.XXXXXXX`. This should be replaced with the zenodo doi (after the first release).
* [pipeline_todos] - TODO string in ``: _Remove this line if you don't need a FASTA file_
* [pipeline_todos] - TODO string in `nextflow.config`: _Specify your pipeline's command line flags_
* [pipeline_todos] - TODO string in ``: _Include a figure that guides the user through the major workflow steps. Many nf-core_
* [pipeline_todos] - TODO string in ``: _Fill in short bullet-pointed list of the default steps in the pipeline_
* [pipeline_todos] - TODO string in `ci.yml`: _You can customise CI pipeline run tests as required_
* [pipeline_todos] - TODO string in `awsfulltest.yml`: _You can customise AWS full pipeline tests as required_
* [pipeline_todos] - TODO string in ``: _Optionally add in-text citation tools to this list._
* [pipeline_todos] - TODO string in ``: _Optionally add bibliographic entries to this list._
* [pipeline_todos] - TODO string in ``: _Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!_
* [pipeline_todos] - TODO string in ``: _Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website._
* [pipeline_todos] - TODO string in `methods_description_template.yml`: _#Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline_
* [pipeline_todos] - TODO string in `test_full.config`: _Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)_
* [pipeline_todos] - TODO string in `test_full.config`: _Give any required params for the test so that command line flags are not needed_
* [pipeline_todos] - TODO string in `base.config`: _Check the defaults for all processes_
* [pipeline_todos] - TODO string in `base.config`: _Customise requirements for specific processes._
* [pipeline_todos] - TODO string in `test.config`: _Specify the paths to your test data on nf-core/test-datasets_
* [pipeline_todos] - TODO string in `test.config`: _Give any required params for the test so that command line flags are not needed_
* [pipeline_todos] - TODO string in `test_humanbam.config`: _Specify the paths to your test data on nf-core/test-datasets_
* [pipeline_todos] - TODO string in `test_humanbam.config`: _Give any required params for the test so that command line flags are not needed_
* [schema_description] - No description provided in schema for parameter: `skip_qualimap`
* [schema_description] - No description provided in schema for parameter: `skip_damagecalculation`

### :grey_question: Tests ignored:

* [nextflow_config] - Config default ignored: params.contamination_estimation_angsd_hapmap

### :white_check_mark: Tests passed:

* [files_exist] - File found: `.gitattributes`
* [files_exist] - File found: `.gitignore`
* [files_exist] - File found: `.nf-core.yml`
* [files_exist] - File found: `.editorconfig`
* [files_exist] - File found: `.prettierignore`
* [files_exist] - File found: `.prettierrc.yml`
* [files_exist] - File found: ``
* [files_exist] - File found: ``
* [files_exist] - File found: ``
* [files_exist] - File found: `LICENSE` or `` or `LICENCE` or ``
* [files_exist] - File found: `nextflow_schema.json`
* [files_exist] - File found: `nextflow.config`
* [files_exist] - File found: ``
* [files_exist] - File found: `.github/.dockstore.yml`
* [files_exist] - File found: `.github/`
* [files_exist] - File found: `.github/ISSUE_TEMPLATE/bug_report.yml`
* [files_exist] - File found: `.github/ISSUE_TEMPLATE/config.yml`
* [files_exist] - File found: `.github/ISSUE_TEMPLATE/feature_request.yml`
* [files_exist] - File found: `.github/`
* [files_exist] - File found: `.github/workflows/branch.yml`
* [files_exist] - File found: `.github/workflows/ci.yml`
* [files_exist] - File found: `.github/workflows/linting_comment.yml`
* [files_exist] - File found: `.github/workflows/linting.yml`
* [files_exist] - File found: `assets/email_template.html`
* [files_exist] - File found: `assets/email_template.txt`
* [files_exist] - File found: `assets/sendmail_template.txt`
* [files_exist] - File found: `assets/nf-core-eager_logo_light.png`
* [files_exist] - File found: `conf/modules.config`
* [files_exist] - File found: `conf/test.config`
* [files_exist] - File found: `conf/test_full.config`
* [files_exist] - File found: `docs/images/nf-core-eager_logo_light.png`
* [files_exist] - File found: `docs/images/nf-core-eager_logo_dark.png`
* [files_exist] - File found: `docs/`
* [files_exist] - File found: `docs/`
* [files_exist] - File found: `docs/`
* [files_exist] - File found: `docs/`
* [files_exist] - File found: ``
* [files_exist] - File found: `assets/multiqc_config.yml`
* [files_exist] - File found: `conf/base.config`
* [files_exist] - File found: `conf/igenomes.config`
* [files_exist] - File found: `.github/workflows/awstest.yml`
* [files_exist] - File found: `.github/workflows/awsfulltest.yml`
* [files_exist] - File found: `modules.json`
* [files_exist] - File found: `pyproject.toml`
* [files_exist] - File not found check: `Singularity`
* [files_exist] - File not found check: `parameters.settings.json`
* [files_exist] - File not found check: `pipeline_template.yml`
* [files_exist] - File not found check: `.nf-core.yaml`
* [files_exist] - File not found check: `bin/markdown_to_html.r`
* [files_exist] - File not found check: `conf/aws.config`
* [files_exist] - File not found check: `.github/workflows/push_dockerhub.yml`
* [files_exist] - File not found check: `.github/ISSUE_TEMPLATE/`
* [files_exist] - File not found check: `.github/ISSUE_TEMPLATE/`
* [files_exist] - File not found check: `docs/images/nf-core-eager_logo.png`
* [files_exist] - File not found check: `.markdownlint.yml`
* [files_exist] - File not found check: `.yamllint.yml`
* [files_exist] - File not found check: `lib/Checks.groovy`
* [files_exist] - File not found check: `lib/Completion.groovy`
* [files_exist] - File not found check: `lib/Workflow.groovy`
* [files_exist] - File not found check: `lib/Utils.groovy`
* [files_exist] - File not found check: `lib/WorkflowMain.groovy`
* [files_exist] - File not found check: `lib/NfcoreTemplate.groovy`
* [files_exist] - File not found check: `lib/WorkflowEager.groovy`
* [files_exist] - File not found check: `lib/nfcore_external_java_deps.jar`
* [files_exist] - File not found check: `.travis.yml`
* [nextflow_config] - Config variable found: ``
* [nextflow_config] - Config variable found: `manifest.nextflowVersion`
* [nextflow_config] - Config variable found: `manifest.description`
* [nextflow_config] - Config variable found: `manifest.version`
* [nextflow_config] - Config variable found: `manifest.homePage`
* [nextflow_config] - Config variable found: `timeline.enabled`
* [nextflow_config] - Config variable found: `trace.enabled`
* [nextflow_config] - Config variable found: `report.enabled`
* [nextflow_config] - Config variable found: `dag.enabled`
* [nextflow_config] - Config variable found: `process.cpus`
* [nextflow_config] - Config variable found: `process.memory`
* [nextflow_config] - Config variable found: `process.time`
* [nextflow_config] - Config variable found: `params.outdir`
* [nextflow_config] - Config variable found: `params.input`
* [nextflow_config] - Config variable found: `params.validationShowHiddenParams`
* [nextflow_config] - Config variable found: `params.validationSchemaIgnoreParams`
* [nextflow_config] - Config variable found: `manifest.mainScript`
* [nextflow_config] - Config variable found: `timeline.file`
* [nextflow_config] - Config variable found: `trace.file`
* [nextflow_config] - Config variable found: `report.file`
* [nextflow_config] - Config variable found: `dag.file`
* [nextflow_config] - Config variable (correctly) not found: `params.nf_required_version`
* [nextflow_config] - Config variable (correctly) not found: `params.container`
* [nextflow_config] - Config variable (correctly) not found: `params.singleEnd`
* [nextflow_config] - Config variable (correctly) not found: `params.igenomesIgnore`
* [nextflow_config] - Config variable (correctly) not found: ``
* [nextflow_config] - Config variable (correctly) not found: `params.enable_conda`
* [nextflow_config] - Config ``timeline.enabled`` had correct value: ``true``
* [nextflow_config] - Config ``report.enabled`` had correct value: ``true``
* [nextflow_config] - Config ``trace.enabled`` had correct value: ``true``
* [nextflow_config] - Config ``dag.enabled`` had correct value: ``true``
* [nextflow_config] - Config ```` began with ``nf-core/``
* [nextflow_config] - Config variable ``manifest.homePage`` began with
* [nextflow_config] - Config ``dag.file`` ended with ``.html``
* [nextflow_config] - Config variable ``manifest.nextflowVersion`` started with >= or !>=
* [nextflow_config] - Config ``manifest.version`` ends in ``dev``: ``3.0.0dev``
* [nextflow_config] - Config `params.custom_config_version` is set to `master`
* [nextflow_config] - Config `params.custom_config_base` is set to ``
* [nextflow_config] - Lines for loading custom profiles found
* [nextflow_config] - nextflow.config contains configuration profile `test`
* [nextflow_config] - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
* [nextflow_config] - Config default value correct: params.custom_config_version= master
* [nextflow_config] - Config default value correct: params.custom_config_base=
* [nextflow_config] - Config default value correct: params.max_cpus= 16
* [nextflow_config] - Config default value correct: params.max_memory= 128.GB
* [nextflow_config] - Config default value correct: params.max_time= 240.h
* [nextflow_config] - Config default value correct: params.publish_dir_mode= copy
* [nextflow_config] - Config default value correct: params.max_multiqc_email_size= 25.MB
* [nextflow_config] - Config default value correct: params.validate_params= true
* [nextflow_config] - Config default value correct: params.sequencing_qc_tool= fastqc
* [nextflow_config] - Config default value correct: params.preprocessing_tool= fastp
* [nextflow_config] - Config default value correct: params.preprocessing_minlength= 25
* [nextflow_config] - Config default value correct: params.preprocessing_trim5p= 0
* [nextflow_config] - Config default value correct: params.preprocessing_trim3p= 0
* [nextflow_config] - Config default value correct: params.preprocessing_fastp_complexityfilter_threshold= 10
* [nextflow_config] - Config default value correct: params.preprocessing_adapterremoval_trimbasequalitymin= 20
* [nextflow_config] - Config default value correct: params.preprocessing_adapterremoval_adapteroverlap= 1
* [nextflow_config] - Config default value correct: params.preprocessing_adapterremoval_qualitymax= 41
* [nextflow_config] - Config default value correct: params.fastq_shard_size= 1000000
* [nextflow_config] - Config default value correct: params.mapping_tool= bwaaln
* [nextflow_config] - Config default value correct: params.mapping_bwaaln_n= 0.01
* [nextflow_config] - Config default value correct: params.mapping_bwaaln_k= 2
* [nextflow_config] - Config default value correct: params.mapping_bwaaln_l= 1024
* [nextflow_config] - Config default value correct: params.mapping_bwaaln_o= 2
* [nextflow_config] - Config default value correct: params.mapping_bwamem_k= 19
* [nextflow_config] - Config default value correct: params.mapping_bwamem_r= 1.5
* [nextflow_config] - Config default value correct: params.mapping_bowtie2_alignmode= local
* [nextflow_config] - Config default value correct: params.mapping_bowtie2_sensitivity= sensitive
* [nextflow_config] - Config default value correct: params.mapping_bowtie2_n= 0
* [nextflow_config] - Config default value correct: params.mapping_bowtie2_l= 20
* [nextflow_config] - Config default value correct: params.mapping_bowtie2_trim5= 0
* [nextflow_config] - Config default value correct: params.mapping_bowtie2_trim3= 0
* [nextflow_config] - Config default value correct: params.mapping_bowtie2_maxins= 500
* [nextflow_config] - Config default value correct: params.bamfiltering_minreadlength= 0
* [nextflow_config] - Config default value correct: params.bamfiltering_mappingquality= 0
* [nextflow_config] - Config default value correct: params.bamfilter_genomicbamfilterflag= 4
* [nextflow_config] - Config default value correct: params.metagenomicscreening_input= unmapped
* [nextflow_config] - Config default value correct: params.metagenomics_complexity_tool= bbduk
* [nextflow_config] - Config default value correct: params.metagenomics_complexity_entropy= 0.3
* [nextflow_config] - Config default value correct: params.metagenomics_prinseq_mode= entropy
* [nextflow_config] - Config default value correct: params.metagenomics_prinseq_dustscore= 0.5
* [nextflow_config] - Config default value correct: params.deduplication_tool= markduplicates
* [nextflow_config] - Config default value correct: params.damage_manipulation_rescale_seqlength= 12
* [nextflow_config] - Config default value correct: params.damage_manipulation_rescale_length_5p= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_rescale_length_3p= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_pmdtools_threshold= 3
* [nextflow_config] - Config default value correct: params.damage_manipulation_bamutils_trim_double_stranded_none_udg_left= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_bamutils_trim_double_stranded_none_udg_right= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_bamutils_trim_double_stranded_half_udg_left= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_bamutils_trim_double_stranded_half_udg_right= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_bamutils_trim_single_stranded_none_udg_left= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_bamutils_trim_single_stranded_none_udg_right= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_bamutils_trim_single_stranded_half_udg_left= 0
* [nextflow_config] - Config default value correct: params.damage_manipulation_bamutils_trim_single_stranded_half_udg_right= 0
* [nextflow_config] - Config default value correct: params.genotyping_reference_ploidy= 2
* [nextflow_config] - Config default value correct: params.genotyping_pileupcaller_min_base_quality= 30
* [nextflow_config] - Config default value correct: params.genotyping_pileupcaller_min_map_quality= 30
* [nextflow_config] - Config default value correct: params.genotyping_pileupcaller_method= randomHaploid
* [nextflow_config] - Config default value correct: params.genotyping_pileupcaller_transitions_mode= AllSites
* [nextflow_config] - Config default value correct: params.genotyping_gatk_call_conf= 30
* [nextflow_config] - Config default value correct: params.genotyping_gatk_ug_downsample= 250
* [nextflow_config] - Config default value correct: params.genotyping_gatk_ug_out_mode= EMIT_VARIANTS_ONLY
* [nextflow_config] - Config default value correct: params.genotyping_gatk_ug_genotype_mode= SNP
* [nextflow_config] - Config default value correct: params.genotyping_gatk_ug_defaultbasequalities= -1
* [nextflow_config] - Config default value correct: params.genotyping_gatk_hc_out_mode= EMIT_VARIANTS_ONLY
* [nextflow_config] - Config default value correct: params.genotyping_gatk_hc_emitrefconf= GVCF
* [nextflow_config] - Config default value correct: params.genotyping_freebayes_min_alternate_count= 1
* [nextflow_config] - Config default value correct: params.genotyping_freebayes_skip_coverage= 0
* [nextflow_config] - Config default value correct: params.mitochondrion_header= MT
* [nextflow_config] - Config default value correct: params.mapstats_preseq_mode= c_curve
* [nextflow_config] - Config default value correct: params.mapstats_preseq_stepsize= 1000
* [nextflow_config] - Config default value correct: params.mapstats_preseq_terms= 100
* [nextflow_config] - Config default value correct: params.mapstats_preseq_maxextrap= 10000000000
* [nextflow_config] - Config default value correct: params.mapstats_preseq_bootstrap= 100
* [nextflow_config] - Config default value correct: params.mapstats_preseq_cval= 0.95
* [nextflow_config] - Config default value correct: params.damagecalculation_tool= damageprofiler
* [nextflow_config] - Config default value correct: params.damagecalculation_yaxis= 0.3
* [nextflow_config] - Config default value correct: params.damagecalculation_xaxis= 25
* [nextflow_config] - Config default value correct: params.damagecalculation_damageprofiler_length= 100
* [nextflow_config] - Config default value correct: params.damagecalculation_mapdamage_downsample= 0
* [nextflow_config] - Config default value correct: params.host_removal_mode= remove
* [nextflow_config] - Config default value correct: params.contamination_estimation_angsd_chrom_name= X
* [nextflow_config] - Config default value correct: params.contamination_estimation_angsd_range_from= 5000000
* [nextflow_config] - Config default value correct: params.contamination_estimation_angsd_range_to= 154900000
* [nextflow_config] - Config default value correct: params.contamination_estimation_angsd_mapq= 30
* [nextflow_config] - Config default value correct: params.contamination_estimation_angsd_minq= 30
* [files_unchanged] - `.gitattributes` matches the template
* [files_unchanged] - `.prettierrc.yml` matches the template
* [files_unchanged] - `` matches the template
* [files_unchanged] - `LICENSE` matches the template
* [files_unchanged] - `.github/.dockstore.yml` matches the template
* [files_unchanged] - `.github/` matches the template
* [files_unchanged] - `.github/ISSUE_TEMPLATE/bug_report.yml` matches the template
* [files_unchanged] - `.github/ISSUE_TEMPLATE/config.yml` matches the template
* [files_unchanged] - `.github/ISSUE_TEMPLATE/feature_request.yml` matches the template
* [files_unchanged] - `.github/` matches the template
* [files_unchanged] - `.github/workflows/branch.yml` matches the template
* [files_unchanged] - `.github/workflows/linting_comment.yml` matches the template
* [files_unchanged] - `.github/workflows/linting.yml` matches the template
* [files_unchanged] - `assets/email_template.html` matches the template
* [files_unchanged] - `assets/email_template.txt` matches the template
* [files_unchanged] - `assets/sendmail_template.txt` matches the template
* [files_unchanged] - `assets/nf-core-eager_logo_light.png` matches the template
* [files_unchanged] - `docs/images/nf-core-eager_logo_light.png` matches the template
* [files_unchanged] - `docs/images/nf-core-eager_logo_dark.png` matches the template
* [files_unchanged] - `docs/` matches the template
* [files_unchanged] - `.gitignore` matches the template
* [files_unchanged] - `.prettierignore` matches the template
* [files_unchanged] - `pyproject.toml` matches the template
* [actions_ci] - '.github/workflows/ci.yml' is triggered on expected events
* [actions_ci] - '.github/workflows/ci.yml' checks minimum NF version
* [actions_awstest] - '.github/workflows/awstest.yml' is triggered correctly
* [actions_awsfulltest] - `.github/workflows/awsfulltest.yml` is triggered correctly
* [actions_awsfulltest] - `.github/workflows/awsfulltest.yml` does not use `-profile test`
* [readme] - README Nextflow minimum version badge matched config. Badge: `23.04.0`, Config: `23.04.0`
* [pipeline_name_conventions] - Name adheres to nf-core convention
* [template_strings] - Did not find any Jinja template strings (287 files)
* [schema_lint] - Schema lint passed
* [schema_lint] - Schema title + description lint passed
* [schema_lint] - Input mimetype lint passed: 'text/csv'
* [schema_params] - Schema matched params returned from nextflow config
* [system_exit] - No `System.exit` calls found
* [actions_schema_validation] - Workflow validation passed: awstest.yml
* [actions_schema_validation] - Workflow validation passed: fix-linting.yml
* [actions_schema_validation] - Workflow validation passed: linting_comment.yml
* [actions_schema_validation] - Workflow validation passed: clean-up.yml
* [actions_schema_validation] - Workflow validation passed: branch.yml
* [actions_schema_validation] - Workflow validation passed: ci.yml
* [actions_schema_validation] - Workflow validation passed: release-announcements.yml
* [actions_schema_validation] - Workflow validation passed: awsfulltest.yml
* [actions_schema_validation] - Workflow validation passed: download_pipeline.yml
* [actions_schema_validation] - Workflow validation passed: linting.yml
* [merge_markers] - No merge markers found in pipeline files
* [modules_json] - Only installed modules found in `modules.json`
* [multiqc_config] - 'assets/multiqc_config.yml' contains `report_section_order`
* [multiqc_config] - 'assets/multiqc_config.yml' contains `export_plots`
* [multiqc_config] - 'assets/multiqc_config.yml' contains `report_comment`
* [multiqc_config] - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
* [multiqc_config] - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
* [multiqc_config] - 'assets/multiqc_config.yml' contains 'export_plots: true'.
* [modules_structure] - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

### Run details

* nf-core/tools version 2.13.1
* Run at `2024-03-19 15:44:26`

TCLamnidis commented 11 months ago


TCLamnidis commented 7 months ago

Now adding pileupcaller. ANGSD calling still lacks a module.

TCLamnidis commented 6 months ago

Waiting on once that is merged, I can run manual tests for pileupcaller using multi-reference too (to check for `.combine` duplications).

From limited multiref tests, it seems that mpileup somehow gets confused, giving:

Command error:
  [mpileup] 3 samples in 4 input files
  samtools mpileup: error reading from input file

Will need to look into the specific BAM files used as input and see what's what.

TCLamnidis commented 5 months ago

The error in the previous comment was caused by the lack of library merging, meaning multiple BAMs per sample were supplied. It is not an issue with genotyping, but with a missing previous step.
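
To illustrate the failure mode described above (several BAMs for one sample reaching mpileup because libraries were not merged), here is a minimal Python sketch that flags samples with more than one BAM before genotyping. The function name `find_unmerged_samples` and the sample and file names are hypothetical and not part of the pipeline.

```python
from collections import defaultdict


def find_unmerged_samples(bam_records):
    """Return {sample: [bams]} for samples that have more than one BAM.

    bam_records is an iterable of (sample_name, bam_path) tuples.
    """
    by_sample = defaultdict(list)
    for sample, bam in bam_records:
        by_sample[sample].append(bam)
    return {s: paths for s, paths in by_sample.items() if len(paths) > 1}


# Hypothetical input: four BAMs covering three samples,
# mirroring the "3 samples in 4 input files" situation.
records = [
    ("sampleA", "sampleA_lib1.bam"),
    ("sampleA", "sampleA_lib2.bam"),
    ("sampleB", "sampleB_lib1.bam"),
    ("sampleC", "sampleC_lib1.bam"),
]
unmerged = find_unmerged_samples(records)
```

Any sample reported here would need its libraries merged before being passed on to genotyping.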

Added manual tests. All passed. (maybe check with ssdna also for pileupcaller. outstanding TODOs:

TCLamnidis commented 5 months ago

Added merging of eigenstrat genotype datasets per reference across strandedness. For some reason, the BAM input in -profile test_multiref causes samtools mpileup to create no output and instead throw an error. The BAM passes samtools quickcheck just fine, so I don't think it is inherently broken. More likely, the shortened reference not matching the header of the BAM is the issue.
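
One way to test the suspicion above (a shortened reference that no longer matches the BAM header) is to compare the `@SQ` lines of the header, e.g. as printed by `samtools view -H`, against the reference's `.fai` index. The sketch below is a plain-Python illustration working on text input; the function names, the contig name and the lengths are made up for the example.

```python
def parse_sam_header_contigs(header_text):
    """Collect {name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def parse_fai_contigs(fai_text):
    """Collect {name: length} from a FASTA index (.fai): columns 1 and 2."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def header_mismatches(bam_contigs, ref_contigs):
    """List (name, bam_length, ref_length) where the two sources disagree."""
    problems = []
    for name in sorted(set(bam_contigs) | set(ref_contigs)):
        bam_len, ref_len = bam_contigs.get(name), ref_contigs.get(name)
        if bam_len != ref_len:
            problems.append((name, bam_len, ref_len))
    return problems


# Made-up example: the header declares a longer contig than the shortened reference.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:MT\tLN:16000\n"
fai = "MT\t5000\t4\t60\t61\n"
mismatches = header_mismatches(parse_sam_header_contigs(header), parse_fai_contigs(fai))
```

A non-empty `mismatches` list would support the header/reference explanation; an empty one would point elsewhere.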

TCLamnidis commented 5 months ago

Added genotypers to the test commands, and the pipeline now errors when no SNP/BED file is provided but pileupcaller is requested. Locally I get some weird errors about failing to index the input BAM from the test profile. I have never seen that before and don't think I changed anything that would cause it, so I am wondering if CI will reproduce that behaviour.

EDIT: It seems that resuming the run makes the samtools index process work fine. visible confusion
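
The input check mentioned above (error out when pileupcaller is requested without its SNP/BED input) follows a common fail-early pattern. A minimal Python sketch of that pattern is below; the real check lives in the Nextflow pipeline, and every parameter key used here (`genotyping_tool`, `pileupcaller_bed_file`, `pileupcaller_snp_file`) is a hypothetical placeholder. It also assumes both files are required, which the comment does not state explicitly.

```python
def check_pileupcaller_inputs(params):
    """Raise ValueError if pileupcaller is requested without its SNP/BED files.

    All keys used here are hypothetical placeholders, not confirmed pipeline parameters.
    """
    if params.get("genotyping_tool") != "pileupcaller":
        return  # Nothing to validate for other genotypers.
    required = ("pileupcaller_bed_file", "pileupcaller_snp_file")
    missing = [key for key in required if not params.get(key)]
    if missing:
        raise ValueError("pileupcaller requested but missing: " + ", ".join(missing))


# Example: pileupcaller requested with neither file supplied.
try:
    check_pileupcaller_inputs({"genotyping_tool": "pileupcaller"})
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Failing before any process starts keeps the error message close to the misconfiguration rather than surfacing later inside a genotyping task.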

TCLamnidis commented 5 months ago

It seems the input BAM with Mammoth mtDNA reads has outdated RG tag info (bad SM, no LB, wrong ID), as it is a very old eager output. I updated it and will try again once test-datasets is fixed.
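
The kind of read-group problem described above (bad SM, no LB, wrong ID) can be spotted by inspecting the `@RG` lines of the BAM header, e.g. from `samtools view -H`. The following Python sketch works on header text; the function names, the tag list and the example values are illustrative assumptions, not the pipeline's own check.

```python
# Tags named in the comment above; the SAM format itself only mandates ID.
CHECKED_RG_TAGS = ("ID", "SM", "LB")


def parse_read_groups(header_text):
    """Return one {tag: value} dict per @RG line in a SAM/BAM header."""
    read_groups = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            fields = line.split("\t")[1:]
            read_groups.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return read_groups


def read_group_problems(header_text, expected_sample):
    """Describe missing tags and SM values that differ from expected_sample."""
    problems = []
    for rg in parse_read_groups(header_text):
        rg_id = rg.get("ID", "<no ID>")
        for tag in CHECKED_RG_TAGS:
            if tag not in rg:
                problems.append(f"{rg_id}: missing {tag}")
        if "SM" in rg and rg["SM"] != expected_sample:
            problems.append(f"{rg_id}: SM is {rg['SM']!r}, expected {expected_sample!r}")
    return problems


# Made-up header with a stale sample name and no LB tag.
header = "@HD\tVN:1.6\n@RG\tID:old_run\tSM:wrong_name\tPL:ILLUMINA\n"
problems = read_group_problems(header, expected_sample="mammoth")
```

The same comparison of SM against the expected sample name would also catch a BAM whose read group disagrees with the name assigned upstream.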

TCLamnidis commented 4 months ago


TCLamnidis commented 4 months ago

GATK_UG module needs updating ( ✅ ) and that needs updating of mulled container ( ✅ )

TCLamnidis commented 4 months ago


TCLamnidis commented 4 months ago

This needs re-review. I did not address the Freebayes BED file issue. We will not implement it now (as it would be a new feature anyway), and will only implement it if it is requested.

TCLamnidis commented 3 months ago
TCLamnidis commented 3 months ago

GATK4_HAPLOTYPECALLER now fails because the input BAM has a different sample name in its RG than the one produced by the MAP SWF.

Updating test-datasets to fix this.