snpeek / snpeek.github.io

https://snpeek.github.io/
MIT License

Ordering of checks in `flipOrientation` causes failure on 23andme data with "I" genotypes #1

Closed: softminus closed this issue 1 year ago

softminus commented 1 year ago

From https://github.com/snpeek/snpeek.github.io/blob/main/src/index.ts#L303-L310

function flipOrientation (genotype: string): string {
  if (genotype.length !== 2) {
    throw new Error('Invalid genotype')
  }
  if (!IsNucleotide(genotype)) {
    console.log(`found odd genotype=${genotype}`)
    return genotype // we don't need to flip II genotypes
  }

Because the `IsNucleotide` check comes after the length check, the whole run fails (no results are reported, and an "Invalid genotype" error appears in the JS console). I believe this is caused by lines like these in 23andMe data:

rs[redacted]    X   [redacted number]   G
rs[redacted]    X   [redacted number]   I

Simply reordering the checks may still not be enough: a single-letter genotype such as `G` would presumably pass the nucleotide check and then throw on the length check, and there are some of these, at least in my v5 23andMe file.
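A minimal sketch of a reordered check, in TypeScript. It assumes `IsNucleotide` means "every character is A, C, G or T" and that flipping means complementing each base (A↔T, C↔G); the lowercase `isNucleotide` helper and the `COMPLEMENT` table are hypothetical stand-ins, not the project's code. Checking the alphabet first lets `I` through unchanged, and accepting one or two letters keeps single-letter calls such as `G` from throwing.

```typescript
// Hypothetical helper (assumed semantics): true if every character is A, C, G or T.
function isNucleotide (genotype: string): boolean {
  return /^[ACGT]+$/.test(genotype)
}

// Assumed meaning of "flip": complement each base on the opposite strand.
const COMPLEMENT: Record<string, string> = { A: 'T', T: 'A', C: 'G', G: 'C' }

function flipOrientation (genotype: string): string {
  // Alphabet check first, so indel calls like "I" or "II" pass through untouched.
  if (!isNucleotide(genotype)) {
    return genotype
  }
  // Accept one-letter calls (e.g. the chromosome X lines above) as well as two-letter ones.
  if (genotype.length !== 1 && genotype.length !== 2) {
    throw new Error(`Invalid genotype: ${genotype}`)
  }
  return genotype.split('').map(base => COMPLEMENT[base]).join('')
}

console.log(flipOrientation('AG')) // TC
console.log(flipOrientation('G')) // C
console.log(flipOrientation('I')) // I
```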

I worked around this by removing the `if (genotype.length !== 2)` check from the minified JS.

For reference, the header of the 23andMe file looks like this:

# This data file generated by 23andMe at: Fri May 05 15:27:52 2023
#
# This file contains raw genotype data, including data that is not used in 23andMe reports.
# This data has undergone a general quality review however only a subset of markers have been 
# individually validated for accuracy. As such, this data is suitable only for research, 
# educational, and informational use and not for medical or other use.
# 
# Below is a text version of your data.  Fields are TAB-separated
# Each line corresponds to a single SNP.  For each SNP, we provide its identifier 
# (an rsid or an internal id), its location on the reference human genome, and the 
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing 
# improvements in our ability to call genotypes. More information about these changes can be found at:
# https://you.23andme.com/p/0abf9c75b2a78406/tools/data/download/
# 
# More information on reference human assembly builds:
# https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/
#
# rsid  chromosome  position    genotype

The file name is `genome_[redacted]_[redacted]_v5_Full_20230505152752.txt`.
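The header describes the layout: `#` comment lines, then TAB-separated rsid, chromosome, position and genotype columns. A rough TypeScript parser for that layout makes it easy to count single-letter genotypes before they reach `flipOrientation`. The `SnpRow` and `parse23andMe` names are hypothetical, and the sample rows are made up, in the same shape as the redacted lines earlier in this issue.

```typescript
// One parsed row of a 23andMe raw data file (hypothetical field names).
interface SnpRow {
  rsid: string
  chromosome: string
  position: number
  genotype: string
}

function parse23andMe (text: string): SnpRow[] {
  const rows: SnpRow[] = []
  for (const line of text.split(/\r?\n/)) {
    // Skip blank lines and "#" header/comment lines.
    if (line.trim() === '' || line.startsWith('#')) continue
    const fields = line.split('\t')
    if (fields.length !== 4) {
      throw new Error(`Expected 4 TAB-separated fields, got ${fields.length}: ${line}`)
    }
    const [rsid, chromosome, position, genotype] = fields
    rows.push({ rsid, chromosome, position: Number(position), genotype })
  }
  return rows
}

// Made-up sample rows for illustration only.
const sample = [
  '# rsid\tchromosome\tposition\tgenotype',
  'rs_example_1\tX\t100\tG',
  'rs_example_2\tX\t200\tI',
  'rs_example_3\t1\t300\tAG'
].join('\n')

const rows = parse23andMe(sample)
const singles = rows.filter(r => r.genotype.length === 1)
console.log(JSON.stringify(singles.map(r => r.rsid))) // ["rs_example_1","rs_example_2"]
```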

d6e commented 1 year ago

Oh, interesting: the 23andMe data I have doesn't contain any single-letter genotypes. I added the length check only as a sanity check. Does skipping single-letter genotypes when they're found sound good?
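One way the skipping proposal could look, as a hypothetical TypeScript sketch: a `tryFlipOrientation` variant returns `undefined` for single-letter genotypes so the caller can drop them, still passes `II` through unchanged as the original excerpt does, and keeps the two-letter sanity check for everything else. Complementing each base is an assumption about what "flip" means here.

```typescript
// Assumed meaning of "flip": complement each base (A<->T, C<->G).
const COMPLEMENT: Record<string, string> = { A: 'T', T: 'A', C: 'G', G: 'C' }

// Hypothetical variant: returns undefined for genotypes the caller should skip.
function tryFlipOrientation (genotype: string): string | undefined {
  // Skip single-letter calls such as "G" or "I" (e.g. the chromosome X lines above).
  if (genotype.length === 1) return undefined
  // Keep the original two-letter sanity check for everything else.
  if (genotype.length !== 2) throw new Error(`Invalid genotype: ${genotype}`)
  // Non-nucleotide calls such as "II" pass through unchanged, as in the original excerpt.
  if (!/^[ACGT]{2}$/.test(genotype)) return genotype
  return genotype.split('').map(base => COMPLEMENT[base]).join('')
}

const genotypes = ['AG', 'G', 'I', 'II', 'CT']
const flipped = genotypes
  .map(g => tryFlipOrientation(g))
  .filter((g): g is string => g !== undefined)
console.log(JSON.stringify(flipped)) // ["TC","II","GA"]
```

The trade-off: skipping drops those SNPs from any report, whereas relaxing the length check to accept one-letter calls would keep them.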

d6e commented 1 year ago

I believe this is solved now.