zstephens / neat-genreads

NEAT read simulation tools

Create tumor/normal pairs with CNVs #76

Open popicka opened 3 years ago

popicka commented 3 years ago

Hi, we are currently trying to use NEAT-genreads to generate realistic WGS/WES tumor and normal samples. The genReadsTumorTutorial is very clear, and we were able to generate both somatic and germline SNPs, but we are not sure how to generate somatic CNVs in the tumor sample.

We would like to benchmark CNV callers. In https://github.com/zstephens/neat-genreads/issues/30 it is mentioned that the -v parameter should be used to include CNVs. However, most CNV callers do not use VCF format and instead report CNVs in BED format, most commonly as in the example below:

chr start   end length  copy_number
20  21655679    22029964    374286  3

What would be the recommended representation of CNVs?

Great tool!

Thank you, Ana

popicka commented 3 years ago

We have also tested NEAT with CNVs in VCF format like this:

20  29956380    .   N   <DUP>   .   .   IMPRECISE;SVTYPE=DUP;END=32442249;SVLEN=2485869;FOLD_CHANGE=2.022472;FOLD_CHANGE_LOG=1.016120;PROBES=408    GT:GQ:CN:CNQ    0/1:0:5:408
20  32442749    .   N   <DUP>   .   .   IMPRECISE;SVTYPE=DUP;END=37663008;SVLEN=5220259;FOLD_CHANGE=1.349033;FOLD_CHANGE_LOG=0.431926;PROBES=772    GT:GQ:CN:CNQ    0/1:0:3:772
20  37667055    .   N   <DEL>   .   .   IMPRECISE;SVTYPE=DEL;END=62959382;SVLEN=-25292327;FOLD_CHANGE=0.812778;FOLD_CHANGE_LOG=-0.299067;PROBES=2121    GT:GQ   0/1:2121

However, the golden VCF file was empty.

zstephens commented 3 years ago

Greetings,

It has been on my todo list to facilitate different representations for input SVs, but at the moment only the standard REF/ALT format is supported. So any SV needs to be boiled down to its constituent insertions/deletions.

E.g., if you wanted a large duplication, it would have to be formatted as: chr1 1000000 A ACGTACGTACGT... where CGT... is explicitly the duplicated sequence. It's kind of a pain, I admit, but I haven't yet worked up the courage to tackle all the different <> cases.
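As a rough illustration of boiling a CNV down to explicit REF/ALT, the sketch below builds one VCF-style line for a tandem duplication and one for a deletion from a region's 1-based coordinates. The function names, the `extra_copies` parameter, and the choice of the base just before the region as the anchor are assumptions made for this example; they are not part of NEAT. Whether NEAT needs extra columns or INFO fields for tumor/normal runs is not covered here.

```python
def _anchor_and_region(chrom_seq, start, end):
    """Return (anchor base, region sequence) for 1-based inclusive coordinates.

    The anchor is the reference base immediately before the region, which is
    the usual VCF convention for representing insertions and deletions.
    """
    if start < 2 or end < start or end > len(chrom_seq):
        raise ValueError("region must lie inside the chromosome and start at position 2 or later")
    anchor = chrom_seq[start - 2]      # base at 1-based position start - 1
    region = chrom_seq[start - 1:end]  # bases at 1-based positions start..end
    return anchor, region


def duplication_record(chrom, start, end, chrom_seq, extra_copies=1):
    """Express a tandem duplication as an explicit insertion (hypothetical helper)."""
    anchor, region = _anchor_and_region(chrom_seq, start, end)
    alt = anchor + region * extra_copies
    return "\t".join([chrom, str(start - 1), ".", anchor, alt, ".", ".", "."])


def deletion_record(chrom, start, end, chrom_seq):
    """Express a deletion of the region as an explicit REF/ALT pair (hypothetical helper)."""
    anchor, region = _anchor_and_region(chrom_seq, start, end)
    return "\t".join([chrom, str(start - 1), ".", anchor + region, anchor, ".", ".", "."])


if __name__ == "__main__":
    toy_chromosome = "AACGTTT"  # toy sequence; in practice, load the chromosome from the reference FASTA
    print(duplication_record("20", 3, 5, toy_chromosome))
    print(deletion_record("20", 3, 5, toy_chromosome))
```

For multi-megabase events like the ones in the VCF above, the ALT (or REF) string becomes as long as the event itself, so the resulting file can get very large.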

-Zach

popicka commented 3 years ago

Thank you so much!