Open abeconnelly opened 8 years ago
It looks like this is a broader issue: zero-width positions are generally not handled well in the current gVCF translation.
The Complete Genomics format represents some types of variation with a zero-width reference length, but VCF needs a reference width of at least one. For insertion variants, the solution was to back up one position, use that base as the reference, and prepend it to the variant allele. That was fine for VCF, but the addition of reference and no-call lines in gVCF means more needs to be done (e.g. for an insertion, the preceding reference line should also be edited to shift its end backwards).
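To make the anchoring concrete, here is a minimal sketch of the idea in Python. It is not the actual cgivar2gvcf code: the function names `anchor_insertion` and `trim_ref_block` are hypothetical, and it assumes the CGI-Var coordinates are 0-based half-open (so an insertion has begin == end) while gVCF blocks are 1-based inclusive.

```python
def anchor_insertion(reference, begin, inserted):
    """Turn a zero-width insertion into a VCF-style (POS, REF, ALT) triple.

    Assumes `begin` is a 0-based coordinate and the inserted bases go
    between reference[begin - 1] and reference[begin].
    """
    if begin < 1:
        raise ValueError("no preceding base available to anchor on")
    anchor = reference[begin - 1]
    # 0-based index (begin - 1) corresponds to 1-based position `begin`.
    return begin, anchor, anchor + inserted


def trim_ref_block(block_start, block_end, anchor_pos):
    """Shift the end of a preceding reference block back off the anchor base.

    Block coordinates are 1-based inclusive (POS and END). Returns the
    trimmed (start, end), or None if nothing of the block is left.
    """
    new_end = min(block_end, anchor_pos - 1)
    if new_end < block_start:
        return None
    return block_start, new_end


# Toy example: insert "TT" after the fourth base of ACGTACGT.
pos, ref, alt = anchor_insertion("ACGTACGT", 4, "TT")
trimmed = trim_ref_block(1, 4, pos)
```

With these assumptions, a reference block that originally ended on the anchor base gets its end moved back by one, so the block and the insertion record no longer both claim that position. A block that consisted only of the anchor base is dropped entirely.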
Here is a snippet of a CGI-Var file:
that, after running `cgivar2gvcf`, produces:

As you can see, there are two lines beginning at different start points (`68551` and `86641`) but ending at the same endpoint (`68640`). I'm not sure if this is actually an error in the CGI-Var file, as the problem looks to have stemmed from the 0-length 'no-call' line in the originating CGI-Var file.

I've attached a small test CGI-Var file that will produce the above gVCF when run through `cgivar2gvcf`: indel_nstar.cgivar.txt