knausb / vcfR

Tools to work with variant call format files

change genotype (change gap to NA) #212

Closed: snackens closed this issue 1 year ago

snackens commented 1 year ago

Hello, thank you for the convenient tool. I have a question.

In my VCF file, there are '*' alleles, which mean a gap.

For example:

REF  ALT
A    T,*

In the genotype fields, 0=A, 1=T, 2=* (gap).
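In a VCF genotype, index 0 refers to REF and 1, 2, ... refer to the ALT alleles in the order they are listed, which is why 2 points at '*' in this record. A minimal Python sketch of that lookup, assuming a GT string that uses '/' or '|' as its separator; `decode_gt` is a hypothetical helper for illustration, not part of vcfR:

```python
import re

def decode_gt(gt: str, ref: str, alt: str) -> list[str]:
    """Hypothetical helper: map each allele index in a GT string to its allele.

    Index 0 is REF; 1, 2, ... index into the comma-separated ALT list.
    A missing index ('.') is returned unchanged.
    """
    alleles = [ref] + alt.split(",")
    # GT separators are '/' (unphased) or '|' (phased)
    return [a if a == "." else alleles[int(a)] for a in re.split(r"[/|]", gt)]

print(decode_gt("0/2", "A", "T,*"))  # ['A', '*']
print(decode_gt("1|1", "A", "T,*"))  # ['T', 'T']
```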

I'd like to treat gap sites as missing sites, so I'd like to replace 2 with ".".

How could I do this?

Best regards,
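The question above is a how-to, so here is a tool-agnostic illustration (not vcfR's API, and not a method proposed in this thread): a minimal Python sketch that rewrites one tab-separated VCF data line so that any GT allele index pointing to a '*' ALT allele becomes '.'. The helper name `mask_star_alleles` and the example record are hypothetical; the sketch assumes GT is present in the FORMAT column (column 9) and leaves the ALT column and all other FORMAT fields unchanged.

```python
import re

def mask_star_alleles(line: str) -> str:
    """Hypothetical helper: set GT allele indices that point to '*' to '.'."""
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    # ALT position k corresponds to GT index k + 1 (index 0 is REF)
    star = {str(k + 1) for k, a in enumerate(alts) if a == "*"}
    fmt = cols[8].split(":")
    if not star or "GT" not in fmt:
        return "\t".join(cols)
    gi = fmt.index("GT")
    for s in range(9, len(cols)):
        fields = cols[s].split(":")
        if gi < len(fields):
            # Split with a capture group so the '/' or '|' separators are kept
            parts = re.split(r"([/|])", fields[gi])
            fields[gi] = "".join("." if p in star else p for p in parts)
            cols[s] = ":".join(fields)
    return "\t".join(cols)

line = "1\t100\t.\tA\tT,*\t50\tPASS\t.\tGT:DP\t0/2:10\t1|2:8\t0/1:12"
print(mask_star_alleles(line))
```

Because only the GT sub-field is touched, phasing ('|' vs '/') and fields such as DP survive as they were.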

knausb commented 1 year ago

Hi @snackens, I'm afraid I do not understand what you're trying to accomplish here. When I find myself in this situation I tend to start with the VCF specification (http://samtools.github.io/hts-specs/). We appear to be at v4.4 today, so that's the version I'll cite.

I do not believe the VCF specification has a concept of a 'gap'. I searched the document and found no mention of 'gap'. Section 1.1 includes an example of a microsatellite, which I think we can interpret as a length polymorphism (insertion, deletion, gap, etc.). The complete sequences of the reference and all alternate alleles are presented, so there is no need for a 'symbol' to represent a gap. Because I don't think 'gap' exists in the specification, I am concerned that your question does not make sense.
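Because every allele is written out as a full sequence, an insertion or deletion can be read directly from the difference in length between REF and each ALT allele. A minimal Python sketch, assuming an illustrative microsatellite-style record with REF GTC and ALT G,GTCT; `allele_length_changes` is a hypothetical helper, and the arithmetic is only meaningful for alleles written as sequences, not for symbolic alleles such as '*':

```python
def allele_length_changes(ref: str, alt: str) -> dict[str, int]:
    """Hypothetical helper: length of each ALT allele relative to REF.

    Negative values are deletions, positive values are insertions.
    Only meaningful when every ALT allele is a literal sequence.
    """
    return {a: len(a) - len(ref) for a in alt.split(",")}

# Illustrative record: REF GTC, ALT G,GTCT
print(allele_length_changes("GTC", "G,GTCT"))  # {'G': -2, 'GTCT': 1}
```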

In section 1.6.1 there is mention of using an asterisk as a 'symbolic allele', which I have no experience with, but it does not sound like a gap to me.

Could you please review this information and clarify what you're attempting to accomplish? Thanks! Brian

snackens commented 1 year ago

Thank you for your reply. I have solved this. I realized that what I was trying to do was strange. Thank you for all your advice, and for this excellent tool.

Best regards,