nf-core / test-datasets

Test data to be used for automated testing with the nf-core pipelines
https://nf-co.re
MIT License

Add paraphase test data #1213

Closed: fellen31 closed this 1 month ago

fellen31 commented 1 month ago

Paraphase is a caller for highly similar paralogous genes such as SMN1 and SMN2.

For that reason, the reference genome must match the gene positions hard-coded in Paraphase, and the reads have to align to those specific regions of the genome.

I've taken chr22 and chr22_KI270734v1_random, hard-masked (set to N) all bases not relevant to the PRODH gene, and compressed the file to keep the reference genome as small as possible (~9 MB). Paraphase only outputs a VCF file when there is enough coverage, so I've included 35 aligned reads in the BAM file.
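The hard-masking step can be sketched in plain Python. This is a minimal illustration, not the script used to build the test data: `parse_fasta`, `hard_mask` and `mask_fasta` are hypothetical helpers, and the intervals to keep (0-based, half-open) are assumptions you would have to supply yourself; the real PRODH coordinates are not given here. Replacing bases with N instead of deleting them keeps every contig at its original length, so the positions Paraphase expects do not shift. Contigs with no listed interval end up fully masked. Compression (for example with bgzip, so the FASTA can still be indexed) is not shown.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # hypothetical: 0-based, half-open [start, end)


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {contig_name: sequence}; the header is cut at the first space."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def hard_mask(seq: str, keep: List[Interval]) -> str:
    """Replace every base outside the `keep` intervals with 'N', preserving length."""
    masked = ["N"] * len(seq)
    for start, end in keep:
        # Clamp to the contig so out-of-range intervals cannot change the length.
        start, end = max(0, start), min(len(seq), end)
        masked[start:end] = seq[start:end]
    return "".join(masked)


def mask_fasta(records: Dict[str, str],
               keep_by_contig: Dict[str, List[Interval]],
               width: int = 60) -> str:
    """Hard-mask each contig and write it back as line-wrapped FASTA text."""
    out: List[str] = []
    for name, seq in records.items():
        masked = hard_mask(seq, keep_by_contig.get(name, []))
        out.append(f">{name}")
        out.extend(masked[i:i + width] for i in range(0, len(masked), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Toy contigs only; these are not the real chr22 sequences.
    toy = ">chrA\nACGTACGTAC\n>chrB\nGGGG\n"
    print(mask_fasta(parse_fasta(toy), {"chrA": [(2, 6)]}, width=5), end="")
```

On the toy input this keeps `GTAC` in chrA, so chrA becomes `NNGTACNNNN` and chrB, which has no kept interval, becomes `NNNN`.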