Open kshefchek opened 3 years ago
According to the docs, if MERGED == 1 we should be using the SNP_ID_CURRENT column.
Looks like we already have some support for this: https://github.com/monarch-initiative/dipper/blob/254242e2/dipper/sources/GWASCatalog.py#L450
From the gwas catalog docs:
SNPS*: Strongest SNP; if a haplotype it may include more than one rs number (multiple SNPs comprising the haplotype)
MERGED*: denotes whether the SNP has been merged into a subsequent rs record (0 = no; 1 = yes;)
SNP_ID_CURRENT*: current rs number (will differ from strongest SNP when merged = 1)
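To make the rule in these column descriptions concrete, here is a minimal sketch of choosing the identifier for a single row. It is not the existing dipper code: the function name `current_rs_id` is hypothetical, the row is assumed to be a dict of column name to string, and I am assuming SNP_ID_CURRENT may arrive either with or without the `rs` prefix. It only covers the single-SNP case and does not attempt to resolve haplotype rows that list several rs numbers.

```python
def current_rs_id(row):
    """Pick the rs ID to use for one GWAS Catalog row (dict of column -> str).

    Hypothetical helper: falls back to SNPS unless MERGED is "1" and
    SNP_ID_CURRENT is populated.
    """
    snps = row["SNPS"].strip()
    merged = row.get("MERGED", "0").strip()
    current = row.get("SNP_ID_CURRENT", "").strip()

    if merged == "1" and current:
        # Assumption: the column may hold a bare number or a full rs ID,
        # so normalize both to the "rs<number>" form.
        if current.lower().startswith("rs"):
            return current
        return "rs" + current

    # Not merged (or no current ID given): keep the reported SNP as-is.
    return snps


# Made-up placeholder IDs, not real catalog rows.
merged_row = {"SNPS": "rs111", "MERGED": "1", "SNP_ID_CURRENT": "222"}
plain_row = {"SNPS": "rs333", "MERGED": "0", "SNP_ID_CURRENT": "333"}
print(current_rs_id(merged_row))
print(current_rs_id(plain_row))
```

If we go this way, the original value from SNPS could still be kept alongside the returned ID so the merged identifier is not lost.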
For example the row:
Per https://www.ncbi.nlm.nih.gov/snp/rs35794310, rs35794310 was merged with rs11415565, and per https://www.ncbi.nlm.nih.gov/snp/rs147955325, rs147955325 was merged with rs11415565.
We should model this similarly to how we model deprecated identifiers in ontologies, but it's unclear from this row alone which identifier is the current one (is it always the last in the list?).
See https://github.com/monarch-initiative/monarch-ui/issues/383