knausb / vcfR

Tools to work with variant call format files

Error while accessing filtered VCF file #184

Closed. sekhwal closed this issue 3 years ago.

sekhwal commented 3 years ago

Hi, I have a VCF file that I filtered with TASSEL and saved as VCF.

When I convert this VCF file for popgenInfo with the following commands, I get the error below. Please help me figure out the issue.

library(vcfR)
gi_vcf <- read.vcfR(file = "gi-filter-new.vcf", verbose = TRUE)
gi_genind <- vcfR2genind(gi_vcf)

Error in extract.gt(x, return.alleles = return.alleles) : ID column contains non-unique names

knausb commented 3 years ago

Hi @sekhwal, the VCF specification (section 1.6.1) states that values in the ID column are to be unique identifiers for each variant. Your error is telling you that your ID column contains non-unique values, which means your file does not follow the VCF specification. You could try something like the following.

sort(table(gi_vcf@fix[, "ID"]))

This counts how many times each ID occurs; after sorting, the offending IDs (count greater than 1) appear at the end of the output. If you are confident that your data are well formed, you could add a suffix (e.g., _1, ..., _n) to the duplicated values in your ID column. A better solution would be to review your analytical pipeline to determine where your data deviated from the VCF specification.
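A minimal sketch of the suffix workaround, assuming the IDs live in the `"ID"` column of the vcfR object's `fix` matrix. The helpers `report_duplicate_ids()` and `make_ids_unique()` are hypothetical and not part of vcfR. Missing IDs (`NA`) are deliberately left untouched, and the function stops if a new suffixed name would collide with an ID that already exists.

```r
# Hypothetical helper (not part of vcfR): list IDs that occur more than once.
# table() drops NA values by default, so missing IDs are not reported.
report_duplicate_ids <- function(ids) {
  counts <- table(ids)
  counts[counts > 1]
}

# Hypothetical helper (not part of vcfR): append _1, ..., _n to every ID that
# occurs more than once; unique and missing (NA) IDs are left unchanged.
make_ids_unique <- function(ids) {
  out <- ids
  dup_names <- unique(ids[!is.na(ids) & duplicated(ids)])
  for (id in dup_names) {
    idx <- which(ids == id)  # which() skips the NA comparisons
    out[idx] <- paste0(id, "_", seq_along(idx))
  }
  # Guard against a suffixed name clashing with an ID already in the file
  if (anyDuplicated(out[!is.na(out)])) {
    stop("suffixing produced an ID that already exists")
  }
  out
}

# Example on a plain character vector
ids <- c("S1_100", "S1_200", "S1_100", NA, "S1_100")
print(report_duplicate_ids(ids))
print(make_ids_unique(ids))

# Applying it to a vcfR object (requires vcfR and the input file):
# gi_vcf@fix[, "ID"] <- make_ids_unique(gi_vcf@fix[, "ID"])
# gi_genind <- vcfR2genind(gi_vcf)
```

This only makes the file pass the uniqueness check; it does not explain why the duplicates exist, so reviewing the pipeline is still the more reliable fix.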

Thanks! Brian