Shaun-Regenbaum opened 6 months ago
I am going to do some more exploration of this, and hopefully submit a PR with a fix this week.
I have a working fork in which I think I have fixed the issue. In short, the issue arises when the exome.bed file contains non-standard or unplaced chromosomal sequences, which is quite common in non-human genomes. For example:
chrUn_GJ060129v1 3730 4217
chrUn_GJ060129v1 5192 5333
chrUn_GJ060129v1 5806 6353
chrUn_GJ060163v1 0 311
chrUn_GJ060163v1 741 1129
My fix adds a workflow step that filters the exome.bed file so that it keeps only the chromosomes defined in the genome.dict file. It shouldn't affect other pipelines, and it should let the pipeline handle a wider variety of reference genomes and species.
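A minimal sketch of that filtering idea, assuming a tab-separated BED file and a SAM-style sequence dictionary whose `@SQ` lines carry the contig name in an `SN:` tag. This is not the code from the fork; the function names `read_dict_contigs` and `filter_bed_by_dict` are hypothetical and only illustrate the logic.

```python
def read_dict_contigs(dict_text: str) -> set[str]:
    """Return the contig names (SN: tags) from the @SQ lines of a .dict file."""
    contigs = set()
    for line in dict_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header lines
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                contigs.add(field[3:])
    return contigs


def filter_bed_by_dict(bed_text: str, contigs: set[str]) -> list[str]:
    """Keep only BED records whose first column names a contig in the dictionary."""
    kept = []
    for line in bed_text.splitlines():
        # Skip blank lines and BED comment/track/browser lines (assumption: none are needed downstream)
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom in contigs:
            kept.append(line)
    return kept
```

With a dictionary that lists only `chr1` and `chr2`, a record such as `chrUn_GJ060129v1	3730	4217` is dropped while the `chr1` and `chr2` records are kept unchanged.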
I love this idea; that's an amazing addition.
Description of the bug
The GATK4_BEDTOINTERVALLIST step sometimes fails with some reference genomes because the genome.dict or exome.bed file is created incorrectly from the reference GTF files. The two files end up with mismatched sequence dictionaries, which causes the step to fail.
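To confirm that a given reference hits this mismatch before running the step, a quick pre-flight check could list the contigs that the BED file uses but the dictionary lacks. This is a hedged sketch: `missing_contigs` is a hypothetical helper, and it assumes tab-separated BED records and `SN:` tags on the `@SQ` lines of the .dict file. It only checks contig names, not interval coordinates.

```python
def missing_contigs(bed_text: str, dict_text: str) -> list[str]:
    """Return sorted contig names used in the BED file but absent from the .dict."""
    # Collect contig names declared in the sequence dictionary
    dict_contigs = set()
    for line in dict_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    dict_contigs.add(field[3:])
    # Collect contig names referenced by BED records (first column)
    bed_contigs = set()
    for line in bed_text.splitlines():
        if line.strip() and not line.startswith(("#", "track", "browser")):
            bed_contigs.add(line.split("\t", 1)[0])
    return sorted(bed_contigs - dict_contigs)
```

A non-empty result suggests the BED file references contigs (for example unplaced `chrUn_*` scaffolds) that the dictionary does not define.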
Command used and terminal output
No response
Relevant files
No response
System information
No response