molgenis / systemsgenetics

Generic Java genotype reader / writer, QTL mapping software, Strand alignment tool
https://github.com/molgenis/systemsgenetics/wiki
GNU General Public License v3.0

[Genotype Harmonizer] Bug Report: Confusing Problems on SNPs excluding #667

Open jacklin9703 opened 9 months ago

jacklin9703 commented 9 months ago

Hi! I'm applying Genotype Harmonizer (GH) to several PLINK BED datasets. It's an excellent tool, but I'm confused by its output: some SNPs that appear to be valid are excluded. Here is my command:

java -Xmx4g -jar GenotypeHarmonizer.jar --input mydata.b37 --inputType PLINK_BED --ref ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --refType VCF --chrFilter 22 --update-reference-allele --debug --outputType PLINK_BED --output mydata.b37.chr22.GH

My input PLINK .bim file looks like this (according to the PLINK 2.0 documentation, column 5 is the ALT allele and column 6 is the REF allele):

22  rs5994159   0   16848573    T   C
22  rs9606483   0   16852708    C   A
22  rs10048902  0   16854058    G   C

Here is the log file that GH produced alongside the output:

chr pos id      alleles     action      message
22  16848573    rs5994159   T\C Excluded    Found variant with same ID but alleles are not comparable
22  16852708    rs9606483   C\A Excluded    Found variant with same ID but alleles are not comparable
22  16854058    rs10048902  G\C Excluded    Found variant with same ID but alleles are not comparable
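To cross-check the excluded variants against the reference, the log can be reduced to a list of IDs and alleles. The following is a minimal sketch that assumes the log is whitespace-separated with the six columns shown above and that the two alleles are joined by a backslash; `excluded_variants` is a hypothetical helper, not part of Genotype Harmonizer.

```python
def excluded_variants(log_lines):
    """Yield (chrom, pos, variant_id, alleles, message) for each excluded variant.

    Assumes six whitespace-separated columns: chr, pos, id, alleles, action, message.
    The message itself may contain spaces, so at most five splits are made.
    """
    for line in log_lines:
        fields = line.split(None, 5)
        # Skip blank or short lines and the header row.
        if len(fields) < 6 or fields[0] == "chr":
            continue
        chrom, pos, variant_id, alleles, action, message = fields
        if action == "Excluded":
            # Alleles appear as e.g. T\C in the log.
            yield chrom, int(pos), variant_id, tuple(alleles.split("\\")), message.strip()


# The three log lines from this report.
log = [
    "chr pos id      alleles     action      message",
    "22  16848573    rs5994159   T\\C Excluded    Found variant with same ID but alleles are not comparable",
    "22  16852708    rs9606483   C\\A Excluded    Found variant with same ID but alleles are not comparable",
    "22  16854058    rs10048902  G\\C Excluded    Found variant with same ID but alleles are not comparable",
]

excluded = list(excluded_variants(log))
for chrom, pos, variant_id, alleles, message in excluded:
    print(chrom, pos, variant_id, "/".join(alleles))
```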

GH reported these problematic SNPs as "Found variant with same ID but alleles are not comparable". However, I looked up the excluded SNPs in the 1000G reference VCF file and confirmed that their IDs, positions, and alleles match the reference panel:

#CHROM  POS ID  REF ALT QUAL    FILTER
22  16848573    rs5994159   C   G,T 100 PASS
22  16852708    rs9606483   A   C,T 100 PASS
22  16854058    rs10048902  C   G,T 100 PASS
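The manual check above can be written down as a small script. This is only a sketch of the comparison done by eye, not Genotype Harmonizer's own logic: it tests whether both .bim alleles occur among the VCF record's REF and ALT alleles, either directly or after complementing the strand. The function name `classify` and the hard-coded records are illustrative. Note that all three reference records list two ALT alleles; whether that is related to the exclusion is not established here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(bim_alleles, ref, alts):
    """Compare the two .bim alleles with a VCF record's REF and ALT alleles."""
    reference = {ref, *alts}
    observed = set(bim_alleles)
    if observed <= reference:
        return "consistent"
    # Alleles without a known complement (e.g. indels) map to None and never match.
    flipped = {COMPLEMENT.get(allele) for allele in observed}
    if flipped <= reference:
        return "consistent after strand flip"
    return "not comparable"


# .bim columns 5 and 6 for the three SNPs in this report.
bim = {
    "rs5994159": ("T", "C"),
    "rs9606483": ("C", "A"),
    "rs10048902": ("G", "C"),
}

# REF and ALT alleles of the same IDs in the reference VCF excerpt.
vcf = {
    "rs5994159": ("C", ["G", "T"]),
    "rs9606483": ("A", ["C", "T"]),
    "rs10048902": ("C", ["G", "T"]),
}

results = {snp: classify(bim[snp], *vcf[snp]) for snp in bim}
for snp, verdict in results.items():
    print(snp, verdict)
```

Under this simple subset test, all three SNPs come out as consistent with the reference records on the forward strand.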

It seems that GH did not recognize the REF/ALT alleles in the PLINK .bim file correctly. I would like to keep these SNPs for downstream analysis. Any help is appreciated.

Best Regards, Jack Lin