sbg / Mitty

Seven Bridges Genomics aligner/caller debugging and analysis tools
Apache License 2.0
13 stars 2 forks source link

Unusable variants present in VCF #1

Closed yassineS closed 7 years ago

yassineS commented 7 years ago

Hello,

I am trying to generate simulated human reads. I cut my VCF using mitty filter-variants, however, after indexing it mitty generate-reads complains about some variants within the VCF file without specifying why and which ones.

mitty -v2 generate-reads \
  ~/Projects/Refs/ucsc.hg19/ucsc.hg19.fasta \
   IN.vcf.gz \
   SAMPLE \
   IN.bed \
   Custom-model.pkl \
   20 \
   7 \
   >(gzip > r1.fq.gz) \
   lq.txt \
   --fastq2 >(gzip > r2.fq.gz) \
   --threads 2
ERROR:mitty.lib.vcfio:Unusable variants present in VCF. Please filter or refactor these.
ghost commented 7 years ago

Hi @yassineS ,

(Please see my next comment. I could process your file just fine. Thanks)

Sorry to hear you are having issues. I see there is a repeated variant in the VCF you linked

3       10521284        .       T       A       228     .       DP=36;VDB=0.177398;SGB=-0.693127;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,16,17;MQ=37    GT:PL   1/1:255,99,0
3       10521284        .       T       A       228     .       DP=36;VDB=0.177398;SGB=-0.693127;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,16,17;MQ=37    GT:PL   1/1:255,99,0

and that is what the read generator is flagging:

mitty -v4 generate-reads ~/Data/human_g1k_v37_decoy.fasta final.vcf.gz Ust_Ishim ~/Code/mitty3/examples/reads/human-male.bed 1kg-pcr-free.pkl 0.1 7 r1.fq r.lq --fastq2 r2.fq 
DEBUG:root:Found model 1kg-pcr-free.pkl in builtins
DEBUG:mitty.lib.vcfio:Contig: 1, ploidy: 2 (Assumed. Contig was empty)
DEBUG:mitty.lib.vcfio:Contig: 2, ploidy: 2 (Assumed. Contig was empty)
DEBUG:mitty.lib.vcfio:Contig: 3, ploidy: 2 
DEBUG:mitty.lib.vcfio:Illegal overlap 3:10521284 T -> ('A',) (previous variant ends at 10521284)
ERROR:mitty.lib.vcfio:Unusable variants present in VCF. Please filter or refactor these.

I will check why filter-variants lets this pass (It shouldn't - the code to detect such patterns is shared between filter-variants and generate-reads) - it's a bug.

However, you should be able to run by removing this duplicate line. Please tell me if that doesn't work.

Thanks! -Kaushik

ghost commented 7 years ago

Hi @yassineS

My previous response was incorrect, sorry. The tools work as expected. filter-variants does get rid of the illegal variants and the filtered vcf can be used to generate reads just fine:

$ mitty -v4 filter-variants final.vcf.gz Ust_Ishim ~/Code/mitty3/examples/reads/human-male.bed final-filtered.vcf

<log of filtered variants>

DEBUG:mitty.lib.vcfio:Processed 3481 variants
DEBUG:mitty.lib.vcfio:Sample had 3481 variants
DEBUG:mitty.lib.vcfio:Discarded 49 variants
DEBUG:mitty.lib.vcfio:Took 0.1399860382080078 s

$ bgzip final-filtered.vcf
$ tabix final-fltered.vcf.gz
$ mitty -v4 generate-reads ~/Data/human_g1k_v37_decoy.fasta final-filtered.vcf.gz Ust_Ishim ~/Code/mitty3/examples/reads/human-male.bed 1kg-pcr-free.pkl 0.1 7 r1.fq r.lq --fastq2 r2.fq 

<read generation log>

DEBUG:mitty.simulation.readgenerate:All workers are done
DEBUG:mitty.simulation.readgenerate:Stopping writer
DEBUG:mitty.simulation.sequencing.writefastq:Writer finished: 571684 templates in 104.63s (5463.90 t/s)
yassineS commented 7 years ago

Fixed the issue here. Thanks!