nf-core / modules

Repository to host tool-specific module files for the Nextflow DSL2 community!
https://nf-co.re/modules
MIT License

[BUG] Empty VCF on testdata for mutect2 #870

Closed (fbdtemme closed this issue 1 year ago)

fbdtemme commented 3 years ago

When running the mutect2 test for tumor-normal analysis, the test runs fine, but an empty VCF file is generated. This causes issues downstream when trying to use the output of GATK4_MUTECT2 in, for example, GATK4_MERGEVCFS.

It would be great to have a test dataset that produces actual results so downstream tools can rely on that data as well.

Steps to reproduce:

export PROFILE=docker
nextflow run tests/modules/gatk4/mutect2 -entry test_gatk4_mutect2_tumor_normal_pair -c tests/config/nextflow.config

Inspect VCF file generated by GATK4_MUTECT2...
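
One way to make the "inspect" step concrete is to count the variant records in the output, i.e. the lines that are not `#` header lines. The following is a minimal sketch, not part of the module or its tests: the helper name `count_vcf_records` is hypothetical, and it assumes the VCF is either plain text or gzip/bgzip-compressed with a `.gz` suffix.

```python
import gzip
from pathlib import Path


def count_vcf_records(path):
    """Count variant records (non-header, non-blank lines) in a VCF file.

    Hypothetical helper for illustration. Header lines in a VCF start
    with '#', so a file containing only header lines counts as empty.
    """
    path = Path(path)
    # bgzip output is gzip-compatible, so gzip.open can read .vcf.gz files.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )
```

Calling the helper on the VCF written by the test (the actual file name depends on the test configuration) returns 0 for the header-only output described in this issue.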

GCJMackenzie commented 3 years ago

Small update on progress here: I was working on adding a mitochondria mode for mutect2 and, on a hunch, ran the TN test using the --mitochondria-mode argument (which increases sensitivity). When I did, we got some variants in the VCF. This doesn't fix the issue, but I think it at least confirms that the problem is that the test data lacks any significant TN variants, as was discussed yesterday. So the good news is that the mutect2 module is working; the bad news is that we will need some different test data after all.

FriederikeHanssen commented 3 years ago

😭 Thanks for the detective work. This is going to be a major pain. We essentially need test data that is UMI-tagged, preferably on chr22 (or ideally on chr6 for hlatyping tools, but that is even more work), and has enough variants. The UMI tagging might make simulating more difficult. I don't think the one I shared with you yesterday is doing that. Another option would be to keep the UMI-tagged reads as a separate entity and find/simulate a completely new set of reads covering the above constraints.

GCJMackenzie commented 3 years ago

@FriederikeHanssen are the recal bams in the Sarek test data directory the same as in the modules? I was thinking maybe I could try those and see whether anything shows up if they are different.

FriederikeHanssen commented 3 years ago

No, these are completely separate things. I can't remember now why we didn't choose the sarek test-data in the end and port it to modules/test-data 🤔. But yes, to track down the issue, you could definitely try that.

FriederikeHanssen commented 3 years ago

Possible modules with broken test data:

FriederikeHanssen commented 3 years ago

As discussed in gather.town, we will proceed by first adding all sarek/raredisease modules (as these two pipelines are probably most affected) and then updating the test data. Otherwise we might keep running into the same problem over and over again and also have to update all upstream modules. Please add modules that are affected to this issue. We started a collection above.

FriederikeHanssen commented 2 years ago

This is fixed, right?

GCJMackenzie commented 2 years ago

I haven't had anything to do with the ASCAT or ControlFREEC modules, but all the others should be using the new datasets now, so they should be generating non-empty files.

jasmezz commented 1 year ago

Hi there!

Looks like the issue is solved. Are you still planning to check if all module tests are working fine? If not, you can ignore this message and we’ll close your issue in about 2 weeks. If you think this is still relevant, you can also add it to the hackathon2023 project board.

Cheers, the nf-core maintainers