zstephens / neat-genreads

NEAT read simulation tools

there is no AF information in golden_vcf #36

Closed lituan closed 6 years ago

lituan commented 6 years ago

Hi

I downloaded the latest NEAT from this site and tried to simulate some reads containing variants from a VCF, but the output VCF contains no AF information. Here is the command I used:

python neat-genreads-master/genReads.py -r human_g1k_v37.fasta -t test.bed -v test.vcf --bam --vcf -R 50 -o test --pe 300 30 -M 0.1 -p 10

afzm commented 6 years ago

I am experiencing the same problem: the INFO column only shows WP, not AF. I like this simulator very much, and if AF were added I would not have to use another one.

zstephens commented 6 years ago

Because the random variants introduced by the simulator are in fact wholly random (i.e. not sourced from a database), it doesn't make sense to include population-level statistics like allele frequency. The WP field is meant to be a proxy for variant allele frequency (the proportion of reads at this site that you should expect to support the variant), and follows a format similar to standard GT values:

Assuming you're simulating a diploid genome:

homozygous: WP=1/1
heterozygous: WP=0/1 or WP=1/0
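For anyone who needs a numeric value downstream, the WP convention described above can be turned into an expected variant-read fraction. The following is a minimal sketch, not part of NEAT: the helper names `parse_info` and `wp_to_fraction` are hypothetical, and it assumes WP uses `/` (or `|`) as the separator, as GT does, and that any non-zero allele index counts as a variant copy.

```python
import re


def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'WP=0/1;DP=30' into a dict."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        # Flag-style entries without '=' get an empty string as value.
        key, _, value = item.partition("=")
        out[key] = value
    return out


def wp_to_fraction(wp: str) -> float:
    """Expected fraction of reads supporting the variant, given a WP value.

    Assumption: WP is GT-like, one allele index per chromosome copy,
    where '0' means reference and anything else means variant.
    """
    alleles = re.split(r"[/|]", wp)
    variant_copies = sum(1 for allele in alleles if allele != "0")
    return variant_copies / len(alleles)


info = parse_info("WP=0/1")
print(wp_to_fraction(info["WP"]))  # 0.5
```

Under these assumptions a homozygous site (WP=1/1) maps to 1.0 and a heterozygous site (WP=0/1 or WP=1/0) maps to 0.5 for a diploid genome.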

It makes sense that if an input VCF file had variants with AF fields, then I could push those forward into the output VCF created by NEAT. Is that what you're asking for? If you have downstream tools that are failing due to the lack of an AF field, I would be open to adding an option to output dummy values for the random variants introduced via the -M input option (e.g. AF=0.0 for randomly generated mutations).
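Until such an option exists, a downstream tool that insists on an AF field could be satisfied by post-processing the golden VCF. The sketch below is a workaround under stated assumptions, not a NEAT feature: the function name `add_dummy_af`, the header description text, and the file names in the usage note are all hypothetical, and the input is assumed to be a well-formed, uncompressed, tab-separated VCF.

```python
from typing import Iterable, Iterator

# Hypothetical header line declaring the dummy AF field.
AF_HEADER = (
    '##INFO=<ID=AF,Number=A,Type=Float,'
    'Description="Allele Frequency (dummy value)">'
)


def add_dummy_af(lines: Iterable[str], dummy: str = "0.0") -> Iterator[str]:
    """Yield VCF lines, adding a dummy AF to records that lack one."""
    seen_af_header = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            if line.startswith("##INFO=<ID=AF,"):
                seen_af_header = True
            yield line
        elif line.startswith("#CHROM"):
            # Declare AF just before the column header if it is missing.
            if not seen_af_header:
                yield AF_HEADER
            yield line
        elif line:
            fields = line.split("\t")
            if len(fields) < 8:
                yield line  # not a full record; pass through untouched
                continue
            info = fields[7]
            keys = {item.split("=", 1)[0] for item in info.split(";")}
            if "AF" not in keys:
                # One dummy value per ALT allele, matching Number=A.
                n_alt = len(fields[4].split(","))
                af = "AF=" + ",".join([dummy] * n_alt)
                fields[7] = af if info in ("", ".") else f"{info};{af}"
            yield "\t".join(fields)
```

Usage might look like reading `test_golden.vcf` line by line and writing each yielded line, followed by a newline, to a new file. Records that already carry AF (for example, ones copied from an input VCF) are left unchanged.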

afzm commented 6 years ago

Not only one of the best simulators out there but also great support, thank you very much for the explanation. All clear now.