shane-c-lawson / breseq

Automatically exported from code.google.com/p/breseq
Other
0 stars 0 forks source link

gd2gvf incorrect Ref nuclotide and strand #73

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1.breseq complete analysis
2.filter on non synonymous snp (grep)
3.gdtools gd2gvf

What is the expected output? What do you see instead?

The reference nucleotide in the GVF file is always the same (seems to be equal 
to the first ALT in the .gd file) and the strand is always "+"

#GDFILE
SNP 13  11887   MyRefSeq    7382    T   aa_new_seq=F    aa_position=139 aa_ref_seq=L    codon_new
_seq=TTT    codon_number=139    codon_position=3    codon_ref_seq=TTG   gene_list=A0005 gen
e_name=A0005    gene_position=417   gene_product=protease   gene_strand=>   html_gene_nam
e=<i>A0005</i>&nbsp;&rarr;  locus_tag=A0005 snp_type=nonsynonymous  transl_table=1
1
SNP 44  11995   MyRefSeq    21469   T   aa_new_seq=T    aa_position=52  aa_ref_seq=A    codon_new
_seq=ACC    codon_number=52 codon_position=1    codon_ref_seq=GCC   gene_list=A0020gene_
name=A0020  gene_position=154   gene_product=hypothetical 
protein gene_strand=<   html_gene_name=<i>A0020</i>&nbsp;&larr; locus_tag=A0020 sn
p_type=nonsynonymous    transl_table=11
SNP 72  12571   MyRefSeq    38456   T   aa_new_seq=F    aa_position=247 aa_ref_seq=L    codon_ne
w_seq=TTC   codon_number=247    codon_position=1    codon_ref_seq=CTC   gene_list=A0032 ge
ne_name=A0032   gene_position=739   gene_product=lipoprotein    gene_strand=>   html_gene
_name=<i>A0032</i>&nbsp;&rarr;  locus_tag=A0032 snp_type=nonsynonymous  transl_tab
le=11

#PRODUCED-GVFFILE
MyRefSeq    breseq  SNV 7382    7382    .   +   .   ID=MyRefSeq:breseq:SNV:7382;Variant_seq=T;Re
ference_seq=T;Variant_effect=non_synonymous_codon   
MyRefSeq    breseq  SNV 21469   21469   .   +   .   ID=MyRefSeq:breseq:SNV:21469;Variant_seq=T
;Reference_seq=T;Variant_effect=non_synonymous_codon    
MyRefSeq    breseq  SNV 38456   38456   .   +   .   ID=MyRefSeq:breseq:SNV:38456;Variant_seq=T
;Reference_seq=T;Variant_effect=non_synonymous_codon

#EXPECTED-GVFFILE
MyRefSeq    breseq  SNV 7382    7382    .   +   .   ID=MyRefSeq:breseq:SNV:7382;Variant_seq=T;Re
ference_seq=G;Variant_effect=non_synonymous_codon   
MyRefSeq    breseq  SNV 21469   21469   .   -   .   ID=MyRefSeq:breseq:SNV:21469;Variant_seq=T
;Reference_seq=C;Variant_effect=non_synonymous_codon    
MyRefSeq    breseq  SNV 38456   38456   .   +   .   ID=MyRefSeq:breseq:SNV:38456;Variant_seq=T
;Reference_seq=C;Variant_effect=non_synonymous_codon

What version of the product are you using? On what operating system?
breseq 0.23 on Debian wheezy/sid (Linux 3.6.3 x86_64 GNU/Linux)

Please provide any additional information below.

Original issue reported on code.google.com by sebastie...@gmail.com on 16 Oct 2013 at 8:30

GoogleCodeExporter commented 8 years ago
Thanks for finding this error and the detailed report. I had stopped supporting 
gd2gvf in v0.24, in favor of gd2vcf, but will add it back if you are using it.

I'm working off the spec here: 
http://www.sequenceontology.org/resources/gvf.html

(1) I will definitely fix the Reference_seq issue, which is clearly a bug!!

(2) I am not completely familiar with GVF format, but I believe the strand for 
an SNV should always be +, because it refers to the strand of the SNV base 
change, not the gene that it's in. 

(3) It seems the Variant_effect field should be providing more information that 
it currently is. Specifically, it should be 4 tab delimited columns, the last 
of which are the sequence_feature and feature_ID so that you could look up the 
relevant feature (to get it's strand, for example) in the input reference file.

Will those changes work for you?

Original comment by jeffrey....@gmail.com on 16 Oct 2013 at 2:10

GoogleCodeExporter commented 8 years ago
Thanks for your rapid answer !

1) VCF is fine for me (I use this format but it was not available in breseq 
0.23)

2) I'm not familiar at all with GVF format neither ... but  I had a look to the 
specification; there is no constraint on this field (exept the syntax). So in 
my use of GVF file in Jbrowse, I prefered to use the strand on the affected 
gene.

3) You're right; 

I wrote my own gd2gvf converter, so this issue is not blocking me.
And happy to see that gd2vcf will be soon available !

Sebastien 

Original comment by sebastie...@gmail.com on 16 Oct 2013 at 2:28

GoogleCodeExporter commented 8 years ago

Original comment by jeffrey....@gmail.com on 8 Aug 2014 at 9:36