Open cycarthur opened 7 months ago
Hi Arthur, It looks like there are multi-base genotypes in your VCF file. This could be caused by indels, or by using a genotyper that calls multi-base variants. If it's indels, these can be removed with a tool like bcftools filter. If it's multi-nucleotide variants (i.e. where the reference and the alternate allele both consist of multiple bases), you will need to look at what people do to simplify the output of the genotyper you used. Simon
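If filtering the VCF upstream is not convenient, the affected rows can also be dropped from the geno file itself. The following is only a minimal sketch, not part of parseVCF.py or phyml_sliding_windows.py: the function names are hypothetical, and it assumes a whitespace-delimited geno file whose first two columns are scaffold and position, with every remaining column a genotype whose alleles are separated by `/` or `|`.

```python
import re

def is_single_base_row(line, n_meta=2):
    """Return True if every allele in every genotype column is one character long.

    Assumes the first n_meta columns are scaffold and position (an assumption
    about the geno layout, based on the rows posted in this thread).
    """
    fields = line.split()
    for genotype in fields[n_meta:]:
        # Split phased (A|A) and unphased (A/A) genotypes into alleles.
        alleles = re.split(r"[/|]", genotype)
        if any(len(allele) != 1 for allele in alleles):
            return False
    return True

def filter_geno_lines(lines):
    """Yield header lines (starting with '#') and rows with single-base alleles only."""
    for line in lines:
        if line.startswith("#") or is_single_base_row(line):
            yield line

if __name__ == "__main__":
    # Rows copied from the example in this thread.
    example = [
        "Chr01 97 N/N G/G A|A",
        "Chr01 100 N/N T/T T|T",
        "Chr01 104 N/N TTA/TTA TTA/TTA TTA/TTA",
        "Chr01 116 N/N G/G G/G G/G",
    ]
    for row in filter_geno_lines(example):
        print(row)
```

Run on the four example rows, this keeps positions 97, 100 and 116 and drops position 104. For a real file, the same generator can be applied to an open file handle (or to `gzip.open(path, "rt")` if the geno file is compressed), writing each yielded line to a new file.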
Hi Simon,
I used parseVCF.py to convert my VCF to a geno file, but the generated file has more than one nucleotide per allele at some positions, which I suspect caused further errors when I tried to feed the file into phyml_sliding_windows.py. Examples below:
Chr01 97 N/N G/G A|A
Chr01 100 N/N T/T T|T
Chr01 104 N/N TTA/TTA TTA/TTA TTA/TTA
Chr01 116 N/N G/G G/G G/G
Do you know if there is any way to prevent this from happening, or to filter out those rows? Thank you.
Best, Arthur