Open XavierMialhe opened 5 years ago
Greetings,
The read simulator was designed around drawing reads from a ground-truth set of alleles, such that it could be used for benchmarking and assessing variant phasing algorithms in addition to standard variant calling pipelines.
As a result, this tool has no notion of an arbitrary variant allele frequency: every variant it works with must belong to one of the phased copies of the reference sequence that it samples reads from under the hood. E.g. with a ploidy of 2, every variant has an expected frequency of 0, 0.5, or 1.0; with a ploidy of 3, it would be 0, 0.33, 0.67, or 1.0; etc.
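To make the arithmetic concrete, here is a minimal sketch of which expected allele frequencies are reachable for a given ploidy, assuming a variant sits on k of the phased copies (k = 0..ploidy). The function name `allowed_vafs` is made up for illustration and is not part of the simulator.

```python
from fractions import Fraction


def allowed_vafs(ploidy):
    """Return the expected allele fractions k/ploidy for k = 0..ploidy.

    Hypothetical helper for illustration only; not part of the read simulator.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    # A variant present on k of the phased copies is expected in k/ploidy of reads.
    return [Fraction(k, ploidy) for k in range(ploidy + 1)]


for p in (2, 3):
    print(p, [round(float(f), 2) for f in allowed_vafs(p)])
```

Under this assumption, a somatic VAF such as 1% or 30% does not correspond to any k/ploidy value at low ploidy, which is why such variants come out looking germline.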
See this post for some possible ways of creating somatic datasets with this tool: https://github.com/zstephens/neat-genreads/issues/55#issuecomment-461495062
Hope this helps!
Hello,
I would like to create a dataset for benchmarking a somatic variant caller with your software. Thanks for the clear documentation and your well-designed tool! I just don't fully understand how the mutation model is generated from a provided file.
I have already spiked somatic mutations into a BAM with BAMSurgeon and got a VCF (linked at the end) with controlled VAFs (between 1% and 30%). If I use this VCF as input to the simulation, all variants become germline variants with a VAF close to 50% (checked in IGV).
How should I proceed to spike the same somatic variants into my tumor BAM at the same frequencies as in my VCF?
truth_mark_sorted.zip