nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

Improve CNVKit #863

Open FriederikeHanssen opened 1 year ago

FriederikeHanssen commented 1 year ago

Description of feature

CNVKit can take sex information (https://github.com/etal/cnvkit/blob/b218280edb266788ea8326596d4283183ce425ea/doc/sex.rst) as well as SNP information to infer B-allele frequencies: https://cnvkit.readthedocs.io/en/stable/pipeline.html#snp-allele-frequencies.

Adding sex should be simple, since we already track it for the other CNA tools. For the VCFs, we could discuss how best to pass them on.

In addition, there are some recommendations for tumor-only analysis, such as removal of low coverage areas: https://cnvkit.readthedocs.io/en/stable/tumor.html
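As a rough illustration of the sex and SNP options mentioned above, the sketch below builds the argument list for `cnvkit.py call` from sample metadata. The helper name `cnvkit_call_args`, the `XX`/`XY`/`NA` sex encoding, and the file names are assumptions made for this example, not part of sarek or CNVkit. Only the `--sample-sex`, `--vcf` and `--output` flags come from the CNVkit command line.

```python
# Hypothetical helper, not part of sarek or CNVkit: it only assembles
# the command line and does not run anything.

# Assumed metadata encoding: "XX" / "XY", anything else means unknown.
SEX_TO_CNVKIT = {"XX": "female", "XY": "male"}


def cnvkit_call_args(cns, sex=None, vcf=None, out=None):
    """Return the argv list for `cnvkit.py call` on one segment file."""
    args = ["cnvkit.py", "call", cns]
    cnvkit_sex = SEX_TO_CNVKIT.get(sex)
    if cnvkit_sex:
        # Pass the sex explicitly instead of letting CNVkit guess it.
        args += ["--sample-sex", cnvkit_sex]
    if vcf:
        # SNP calls used to derive B-allele frequencies.
        args += ["--vcf", vcf]
    if out:
        args += ["--output", out]
    return args


print(" ".join(cnvkit_call_args("tumor.cns", sex="XY", vcf="germline.vcf.gz")))
```

When the sex is unknown (for example `NA`), the flag is simply left out and CNVkit falls back to inferring it from the data.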

amizeranschi commented 1 year ago

In addition, may I also suggest the possibility of generating VCF files from CNVKit?

https://cnvkit.readthedocs.io/en/stable/importexport.html#vcf
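For reference, a minimal sketch of what the VCF export step could look like, assuming the called segments are already available as a `.cns` file. The helper name `cnvkit_export_vcf_args` and the file names are made up for illustration. `export vcf`, `--sample-id` and `--output` are the CNVkit options described on the linked page.

```python
# Hypothetical helper: assembles the `cnvkit.py export vcf` command line
# for one sample; nothing is executed here.

def cnvkit_export_vcf_args(cns, sample_id, out_vcf):
    """Return the argv list that converts a .cns segment file to VCF."""
    return [
        "cnvkit.py", "export", "vcf", cns,
        "--sample-id", sample_id,   # sample name written to the VCF header
        "--output", out_vcf,
    ]


print(" ".join(cnvkit_export_vcf_args("tumor.call.cns", "tumor", "tumor.cnv.vcf")))
```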

snesic commented 1 year ago

Tumor-only mode: it would be great if you could add an option to use a pre-generated reference. If I understand correctly, it is currently hard-coded to an empty list, so the reference is generated within the sub-workflow. I would like to create it from normal samples and reuse it in every run.
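To make the request concrete, here is a sketch of how an optional pre-built reference could switch the `cnvkit.py batch` call. The helper name `cnvkit_batch_args`, its parameters and the file names are assumptions for this example, and this is not how the sarek sub-workflow is actually written. The flags used (`--reference`, `--normal`, `--fasta`, `--targets`, `--drop-low-coverage`) are existing `batch` options.

```python
# Hypothetical helper: chooses between a pre-generated reference.cnn and
# a flat reference built on the fly. It only returns the command line.

def cnvkit_batch_args(tumor_bam, reference_cnn=None, fasta=None,
                      targets=None, drop_low_coverage=True):
    """Return the argv list for a tumor-only `cnvkit.py batch` run."""
    args = ["cnvkit.py", "batch", tumor_bam]
    if reference_cnn:
        # Reuse a reference built once from a pool of normal samples.
        args += ["--reference", reference_cnn]
    else:
        # No normals given: `--normal` without files makes a flat reference.
        args += ["--normal", "--fasta", fasta, "--targets", targets]
    if drop_low_coverage:
        # Option suggested in the CNVkit tumor analysis notes.
        args.append("--drop-low-coverage")
    return args


print(" ".join(cnvkit_batch_args("tumor.bam", reference_cnn="reference.cnn")))
```

With a reference supplied, the FASTA and target files are no longer needed at this step, since they are already baked into the `.cnn` file.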

FriederikeHanssen commented 1 year ago

@snesic do you mean passing on the reference.cnn?