Closed blajoie closed 2 years ago
I just tried the v3.2 tagged release (https://github.com/ncsa/NEAT/releases/tag/3.2) instead of master, and it can indeed produce HOM calls.
python NEAT-3.2/gen_reads.py -r genome.fa -R 150 -o neat-sim -c 35.0 -E 0.0 -M 0.0 --pe 350 70 -d --bam --vcf -p 2 --force-coverage -v truth.vcf
mpileup + call variants
$ samtools view -h neat-sim_golden.bam | bcftools mpileup --threads 1 --no-BAQ -Q 0 --ff UNMAP,SECONDARY,QCFAIL -Ov -f genome.fa -a 'AD,ADF,ADR,DP,SP,INFO/AD,INFO/ADF,INFO/ADR,FORMAT/SP' - | bcftools call --threads 1 -m -A | grep -v 0/0
phix 363 . T C 222 . DP=38;ADF=11,12;ADR=9,6;AD=20,18;VDB=0.712919;SGB=-0.691153;RPB=0.789777;MQB=1;MQSB=1;BQB=0.757384;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=11,9,12,6;MQ=60 GT:PL:DP:SP:ADF:ADR:AD 0/1:255,0,255:38:3:11,12:9,6:20,18
phix 466 . T A 225 . DP=38;ADF=0,17;ADR=0,21;AD=0,38;VDB=0.184623;SGB=-0.693143;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,17,21;MQ=60 GT:PL:DP:SP:ADF:ADR:AD 1/1:255,114,0:38:0:0,17:0,21:0,38
phix 469 . T C 225 . DP=38;ADF=0,17;ADR=0,21;AD=0,38;VDB=0.150404;SGB=-0.693143;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,17,21;MQ=60 GT:PL:DP:SP:ADF:ADR:AD 1/1:255,114,0:38:0:0,17:0,21:0,38
phix 513 . G A 225 . DP=40;ADF=0,17;ADR=0,23;AD=0,40;VDB=0.246379;SGB=-0.693145;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,17,23;MQ=60 GT:PL:DP:SP:ADF:ADR:AD 1/1:255,120,0:40:0:0,17:0,23:0,40
phix 528 . G T 222 . DP=41;ADF=9,10;ADR=7,15;AD=16,25;VDB=0.662778;SGB=-0.692914;RPB=0.957706;MQB=1;MQSB=1;BQB=0.394977;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=9,7,10,15;MQ=60 GT:PL:DP:SP:ADF:ADR:AD 0/1:255,0,250:41:5:9,10:7,15:16,25
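For anyone who would rather count genotypes than eyeball the records, here is a minimal Python sketch that tallies the GT field of the first sample column. It is not part of NEAT or bcftools; the function name `tally_genotypes` is made up for illustration, and it assumes single-sample, whitespace-separated VCF text like the output above.

```python
from collections import Counter


def tally_genotypes(vcf_lines):
    """Count GT values found in the first sample column of VCF text lines.

    Hypothetical helper: assumes single-sample VCF records whose columns
    are separated by whitespace (tabs or spaces).
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # A record with genotypes has 8 fixed columns, FORMAT, and a sample.
        if len(fields) < 10:
            continue
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GT" not in format_keys:
            continue
        counts[sample_values[format_keys.index("GT")]] += 1
    return counts
```

Passing `sys.stdin` to `tally_genotypes` at the end of the pipeline above should report how many records carry each genotype, for example how many are 0/1 versus 1/1.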
Okay, so it appears I need to merge the tagged version into master, in that case.
This is one area we are working on improving in version 4.0 as well. Let me know if version 3.2 is working as expected.
I found a small bug in master. I fixed that and now it outputs HOM variants when HOM variants were in the input vcf.
Great - thanks for that @joshfactorial !
Version
NEAT-genReads V3.2
Hi - I just recently started using this tool, but noticed that the reads / vcf / bam only support HET calls, even when using an input VCF file containing many genotypes. Ploidy is set to 2, so I assume two haplotypes are being constructed from which to sample reads, but we are still only seeing HET support in the reads. Any ideas?
Steps to reproduce below.
truth_vcf
Running NEAT (phix as an example)
Followed by naive pileup + bcftools call to check GT status
As you can see, all 5 come through as 0/1 HET. Is it possible to simulate reads across either GT as defined in the VCF? Any thoughts on what could be going wrong here?
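In case it helps with triage, this kind of mismatch can be checked mechanically by comparing the GT in the input VCF against the GT that comes back from the caller, site by site. Below is a rough Python sketch under a few assumptions: both files are single-sample with a GT entry in FORMAT, columns are whitespace-separated, and phasing is ignored. The function names are hypothetical and not part of NEAT or bcftools.

```python
def normalize_gt(gt):
    """Treat phased and unphased calls alike: '1|0' and '0/1' both become '0/1'."""
    alleles = gt.replace("|", "/").split("/")
    return "/".join(sorted(alleles))


def genotypes_by_site(vcf_lines):
    """Map (chrom, pos) -> normalized GT for single-sample VCF text lines."""
    sites = {}
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt = fields[9].split(":")[format_keys.index("GT")]
        sites[(fields[0], int(fields[1]))] = normalize_gt(gt)
    return sites


def discordant_sites(truth_lines, called_lines):
    """Return {site: (truth_gt, called_gt)} where the two disagree.

    called_gt is None when the site is missing from the called set.
    """
    truth = genotypes_by_site(truth_lines)
    called = genotypes_by_site(called_lines)
    return {
        site: (gt, called.get(site))
        for site, gt in truth.items()
        if called.get(site) != gt
    }
```

With the behaviour described here, every HOM site in the input VCF would show up in the result as a `('1/1', '0/1')` pair.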
Cheers