Closed muppetjones closed 8 years ago
If -M 0 and no input variants are present, then it would be expected that you get an empty VCF. In the second example where an input VCF is specified the output golden VCF should contain all the valid mutations that NEAT was able to insert into the data, so the fact that it is empty in this scenario is cause for concern.
Could you share the console output for the simulations? It would be useful to see if variants were being skipped, and for what reason. Alternatively, if you share a few lines from exac.vcf, I might be able to find some formatting/syntax that NEAT isn’t happy with.
-Zach
On Oct 13, 2016, at 12:30 PM, Stephen J Bush notifications@github.com<mailto:notifications@github.com> wrote:
The below commands create empty VCFs. The tumor run only had a vcf given for the last 6 runs.
I imagine this is because the normal run (a) had no VCF given and (b) no mutation rate (-M 0), while the tumor run (c) also had no mutation rate and (d) only included mutations present in the given VCF. It makes sense in this scenario that there is no need to create a VCF--is this correct / the expected behavior?
python /neat-genreads/genReads.py \ -R 126 -c 100 -M 0 --pe 300 30 \ -o /data/neat/vaf_15/normal \ -r /refs/ensembl/ensembl_build_75.fa \ -t /data/regions.bed \ --bam --vcf --job N 36
python /neat-genreads/genReads.py \ -R 126 --pe 300 30 -c 100 -M 0 \ -o /data/neat/vaf_15/tumor11 \ -t /data/regions.bed \ -r /refs/ensembl/ensembl_build_75.fa \ -v /refs/exac/exac.vcf \ --bam --vcf --job 1 36
Here's the directory output:
-rw-r--r-- 1 root root 821 Oct 13 12:24 tumor_golden.vcf.job01of36 -rw-r--r-- 1 root root 0 Oct 13 09:36 tumor_golden.vcf.job02of36 -rw-r--r-- 1 root root 0 Oct 13 09:30 tumor_golden.vcf.job03of36 -rw-r--r-- 1 root root 0 Oct 13 09:32 tumor_golden.vcf.job04of36 -rw-r--r-- 1 root root 0 Oct 13 09:29 tumor_golden.vcf.job05of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job06of36 -rw-r--r-- 1 root root 0 Oct 13 09:32 tumor_golden.vcf.job07of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job08of36 -rw-r--r-- 1 root root 0 Oct 13 09:29 tumor_golden.vcf.job09of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job10of36 -rw-r--r-- 1 root root 0 Oct 13 09:34 tumor_golden.vcf.job11of36 -rw-r--r-- 1 root root 0 Oct 13 09:34 tumor_golden.vcf.job12of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job13of36 -rw-r--r-- 1 root root 0 Oct 13 09:29 tumor_golden.vcf.job14of36 -rw-r--r-- 1 root root 0 Oct 13 09:33 tumor_golden.vcf.job15of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job16of36 -rw-r--r-- 1 root root 0 Oct 13 09:29 tumor_golden.vcf.job17of36 -rw-r--r-- 1 root root 0 Oct 13 09:33 tumor_golden.vcf.job18of36 -rw-r--r-- 1 root root 0 Oct 13 09:30 tumor_golden.vcf.job19of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job20of36 -rw-r--r-- 1 root root 0 Oct 13 09:36 tumor_golden.vcf.job21of36 -rw-r--r-- 1 root root 0 Oct 13 09:29 tumor_golden.vcf.job22of36 -rw-r--r-- 1 root root 0 Oct 13 09:30 tumor_golden.vcf.job23of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job24of36 -rw-r--r-- 1 root root 0 Oct 13 09:34 tumor_golden.vcf.job25of36 -rw-r--r-- 1 root root 0 Oct 13 09:29 tumor_golden.vcf.job26of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job27of36 -rw-r--r-- 1 root root 0 Oct 13 09:35 tumor_golden.vcf.job28of36 -rw-r--r-- 1 root root 0 Oct 13 09:29 tumor_golden.vcf.job29of36 -rw-r--r-- 1 root root 22125196019 Oct 13 15:06 tumor_merged_golden.bam -rw-r--r-- 1 root root 3284 Oct 13 15:06 tumor_merged_golden.vcf
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_zstephens_neat-2Dgenreads_issues_8&d=DQMCaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=iSEBNMfCHfS5bLIS7AqI6wkxgRNv8-2dMwRF43ddddU&m=LtOjilgmz-2TuEG0-kYRV_sB3uxw9CqQu2u_c_a-6hg&s=ySh12GvYcM4B-FldXJNlWWeTpmlYn8e3zRuszxK3-lI&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AIHT1QC7oRsqbvKOxCBrmYH4UGts3hDCks5qzmq4gaJpZM4KWJ-2DJ&d=DQMCaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=iSEBNMfCHfS5bLIS7AqI6wkxgRNv8-2dMwRF43ddddU&m=LtOjilgmz-2TuEG0-kYRV_sB3uxw9CqQu2u_c_a-6hg&s=r_3x8yi0sc04QWz2qEk_rx8zItv0sh5TU2kVcg6W-qQ&e=.
Here is the log: neat.txt
And here are the first two lines from the exac file:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 13372 . G C 608.91 PASS AC=3;AC_AFR=0;AC_AMR=0;AC_Adj=2;AC_EAS=0;AC_FIN=0;AC_Het=0;AC_Hom=1;AC_NFE=0;AC_OTH=0;AC_SAS=2;AF=6.998e-05;AN=42870;AN_AFR=770;AN_AMR=134;AN_Adj=8432;AN_EAS=254;AN_FIN=16;AN_NFE=2116;AN_OTH=90;AN_SAS=5052;BaseQRankSum=0.727;ClippingRankSum=1.15;DP=139843;FS=0.000;GQ_MEAN=12.48;GQ_STDDEV=15.18;Het_AFR=0;Het_AMR=0;Het_EAS=0;Het_FIN=0;Het_NFE=0;Het_OTH=0;Het_SAS=0;Hom_AFR=0;Hom_AMR=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom_SAS=1;InbreedingCoeff=-0.0844;MQ=35.72;MQ0=0;MQRankSum=0.727;NCC=60853;QD=23.42;ReadPosRankSum=0.727;VQSLOD=-1.687e+00;culprit=MQ;DP_HIST=14728|2455|2120|518|121|499|534|314|111|21|10|2|2|0|0|0|0|0|0|0,1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;GQ_HIST=1012|14971|172|100|3161|259|127|30|8|9|5|16|1162|274|59|45|17|2|3|3,0|0|0|0|0|1|0|0|0|0|0|0|0|1|0|0|0|0|0|0;DOUBLETON_DIST=.;AC_MALE=2;AC_FEMALE=0;AN_MALE=5518;AN_FEMALE=2914;AC_CONSANGUINEOUS=0;AN_CONSANGUINEOUS=926;Hom_CONSANGUINEOUS=0;CSQ=C|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|3/3||ENST00000456328.2:n.620G>C||620||||||1||1|SNV|1|HGNC|37102|YES||||||||||||||||||||||||||||||||GGA|.,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene|||||||||||1|1032|-1|SNV|1|HGNC|38034|||||||||||||||||||||||||||||||||GGA|.,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000541675|unprocessed_pseudogene|||||||||||1|991|-1|SNV|1|HGNC|38034|||||||||||||||||||||||||||||||||GGA|.,C|splice_region_variant&non_coding_transcript_exon_variant&non_coding_transcript_variant|LOW|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|5/6||ENST00000450305.2:n.412G>C||412||||||1||1|SNV|1|HGNC|37102|||||||||||||||||||||||||||||||||GGA|.,C|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000515242|transcribed_unprocessed_pseudogene|3/3||ENST00000515242.2:n.613G>C||613||||||1||1|SNV|1|HGNC|37102|||||||||||||||||||||||||||||||||GGA|.,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000538476|unprocessed_pseudogene|||||||||||1|1039|-1|SNV|1|HGNC|38034|||||||||||||||||||||||||||||||||GGA|.,C|intron_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000518655|transcribed_unprocessed_pseudogene||2/3|ENST00000518655.2:n.482-31G>C||||||||1||1|SNV|1|HGNC|37102|||||||||||||||||||||||||||||||||GGA|.,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000438504|unprocessed_pseudogene|||||||||||1|991|-1|SNV|1|HGNC|38034|YES||||||||||||||||||||||||||||||||GGA|.,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000423562|unprocessed_pseudogene|||||||||||1|991|-1|SNV|1|HGNC|38034|||||||||||||||||||||||||||||||||GGA|.,C|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00001576075|CTCF_binding_site|||||||||||1|||SNV|1|||||||||||||||||||||||||||||||||||GGA|.;AC_POPMAX=2;AN_POPMAX=5052;POPMAX=SAS;K1_RUN=G:0;K2_RUN=GA:0;K3_RUN=GAA:0;ESP_AF_POPMAX=0;ESP_AF_GLOBAL=0;ESP_AC=0;KG_AF_POPMAX=0;KG_AF_GLOBAL=0;KG_AC=0
1 13380 . C G 7829.15 VQSRTrancheSNP99.60to99.80 AC=41;AC_AFR=22;AC_AMR=1;AC_Adj=23;AC_EAS=0;AC_FIN=0;AC_Het=23;AC_Hom=0;AC_NFE=0;AC_OTH=0;AC_SAS=0;AF=1.209e-03;AN=33902;AN_AFR=578;AN_AMR=104;AN_Adj=6922;AN_EAS=192;AN_FIN=8;AN_NFE=1684;AN_OTH=74;AN_SAS=4282;BaseQRankSum=0.018;ClippingRankSum=0.264;DP=106203;FS=4.314;GQ_MEAN=12.26;GQ_STDDEV=20.80;Het_AFR=22;Het_AMR=1;Het_EAS=0;Het_FIN=0;Het_NFE=0;Het_OTH=0;Het_SAS=0;Hom_AFR=0;Hom_AMR=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom_SAS=0;InbreedingCoeff=-0.0794;MQ=31.93;MQ0=0;MQRankSum=0.00;NCC=67643;QD=8.10;ReadPosRankSum=-1.580e-01;VQSLOD=-3.041e+00;culprit=MQ;DP_HIST=11485|1978|1863|671|104|288|337|152|58|11|2|2|0|0|0|0|0|0|0|0,3|11|7|6|5|2|1|1|0|0|0|1|0|0|0|0|0|0|0|0;GQ_HIST=1048|11415|159|57|2884|298|140|17|3|5|1|4|570|210|55|45|16|3|4|17,0|2|1|0|2|0|0|0|0|4|0|4|2|0|1|1|2|1|1|16;DOUBLETON_DIST=.;AC_MALE=9;AC_FEMALE=14;AN_MALE=4640;AN_FEMALE=2282;AC_CONSANGUINEOUS=0;AN_CONSANGUINEOUS=798;Hom_CONSANGUINEOUS=0;CSQ=G|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|3/3||ENST00000456328.2:n.628C>G||628|||||rs571093408|1||1|SNV|1|HGNC|37102|YES|||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene||||||||||rs571093408|1|1024|-1|SNV|1|HGNC|38034||||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000541675|unprocessed_pseudogene||||||||||rs571093408|1|983|-1|SNV|1|HGNC|38034||||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|splice_region_variant&intron_variant&non_coding_transcript_variant|LOW|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene||5/5|ENST00000450305.2:n.414+6C>G|||||||rs571093408|1||1|SNV|1|HGNC|37102||||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000515242|transcribed_unprocessed_pseudogene|3/3||ENST00000515242.2:n.621C>G||621|||||rs571093408|1||1|SNV|1|HGNC|37102||||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000538476|unprocessed_pseudogene||||||||||rs571093408|1|1031|-1|SNV|1|HGNC|38034||||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|intron_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000518655|transcribed_unprocessed_pseudogene||2/3|ENST00000518655.2:n.482-23C>G|||||||rs571093408|1||1|SNV|1|HGNC|37102||||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000438504|unprocessed_pseudogene||||||||||rs571093408|1|983|-1|SNV|1|HGNC|38034|YES|||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000423562|unprocessed_pseudogene||||||||||rs571093408|1|983|-1|SNV|1|HGNC|38034||||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.,G|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00001576075|CTCF_binding_site||||||||||rs571093408|1|||SNV|1||||||||||||||G:0.0082|G:0.0303|G:0.0014||G:0|G:0|G:0|||||||||||||||ACT|.;AC_POPMAX=22;AN_POPMAX=578;POPMAX=AFR;K1_RUN=C:0;K2_RUN=CT:0;K3_RUN=CTC:0;ESP_AF_POPMAX=0;ESP_AF_GLOBAL=0;ESP_AC=0;KG_AF_POPMAX=0.030300000682473183;KG_AF_GLOBAL=0.008186900988221169;KG_AC=41
Thanks for taking a look.
I think this is fixed with the latest version.
I've just had the same problem. I have given input variants and used the latest version.
Greetings!
Can you provide the command line you used, and a few lines from your input vcf file?
Thanks! Zach
Hi there! Thanks for the quick response - I am quite new to this field and imagine I am the cause of this problem.
var=shuf -i 100-150 -n 1
mean=shuf -i 180-220 -n 1
sd=shuf -i 15-20 -n 1
if [ ${ANC} == "EUR" ] then v=$[100 + (RANDOM % 100)]$[1000 + (RANDOM % 1000)] cov=$[RANDOM % (50 - 10 + 1) + 10 ].${v:1:2}${v:4:3} else v=$[100 + (RANDOM % 100)]$[1000 + (RANDOM % 1000)] cov=$[RANDOM % (85 - 60 + 1) + 60 ].${v:1:2}${v:4:3} fi
python ${genRead}genReads.py -r \ /mnt/lustre/groups/CBBI0818/REF/ucsc.hg19.fasta \ -R ${var} \ -v ${refvcf}${ANC}${CHR}.vcf \ -c ${cov} \ -o ${outsim}${ANC}/${ANC}_${CHR}_sim_reads_altcov \ --bam --vcf --pe ${mean} ${sd} -M 0
I have used multiple input vcf files, here is one of them:
1 672867 . G . . . PR GT 0/0 1 672961 . C . . . PR GT 0/0 1 674452 . T . . . PR GT 0/0 1 703942 . G . . . PR GT 0/0 1 706982 . C . . . PR GT 0/0 1 713377 . G . . . PR GT 0/0 1 716241 rs116632831 T . . . PR GT 0/0 1 718638 rs185487076 G . . . PR GT 0/0 1 719854 rs200512186 CAG . . . PR GT 0/0 1 719914 rs187772768 C . . . PR GT 0/0 1 729957 rs116982056 T . . . PR GT 0/0 1 731853 . G . . . PR GT 0/0 1 733350 . T . . . PR GT 0/0
Many Thanks for the help! Noelle
Greetings,
This is a rather strange VCF file, it looks at those it only contains reference alleles, and no alterations. GenReads would certainly discard these variants due to an absence of an ALT allele ("." in every row?) and the genotype field is "0/0".
For example, if the first variant were to be a SNP: G-->T, it could look like this:
1 672867 . G T . PASS . GT 1/1
With 1/1 indicating homozygous, 0/1 or 1/0 indicating heterozygous.
More information here: https://samtools.github.io/hts-specs/VCFv4.2.pdf
The below commands create empty VCFs. The tumor run only had a vcf given for the last 6 runs.
I imagine this is because the normal run (a) had no VCF given and (b) no mutation rate (
-M 0
), while the tumor run (c) also had no mutation rate and (d) only included mutations present in the given VCF. It makes sense in this scenario that there is no need to create a VCF--is this correct / the expected behavior?Here's the directory output: