nmdp-bioinformatics / gfe-db

Graph database representing IPD-IMGT/HLA sequence data as GFE
https://gfe-db.readthedocs.io
GNU General Public License v3.0

odd empty GFEs #78

Open mmaiers-nmdp opened 1 year ago

mmaiers-nmdp commented 1 year ago
match(g:GFE)-[]-(i:IPD_Allele) where g.name = "HLA-Aw0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0" return count (g)

This returns 110 cases

mmaiers-nmdp commented 1 year ago

We could look at which sequences are leading to these empty GFEs.
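
A possible starting point, as a sketch only: list everything attached to the empty GFE, grouped by neighbour label and relationship type. The only schema details used are the GFE label and name property from the query above; the aliases (neighbor_labels, rel_type, total) are just illustrative names.

```cypher
// Summarise what is connected to the all-zero GFE node(s).
// No relationship types or neighbour labels are assumed here;
// the query reports whatever the database actually contains.
MATCH (g:GFE)-[r]-(neighbor)
WHERE g.name = "HLA-Aw0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0"
RETURN labels(neighbor) AS neighbor_labels,
       type(r)          AS rel_type,
       count(*)         AS total
ORDER BY total DESC
```

If sequence nodes show up in the result, the corresponding label and relationship type can then be used to pull the sequences themselves.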

mmaiers-nmdp commented 1 year ago

This query:

match(g:GFE)-[]-(i:IPD_Allele) where g.name = "HLA-Aw0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0" return distinct i.name order by i.name

returns 55 records with distinct and 110 records without it, i.e. an average of two matches per allele.

╒════════════════╕
│"i.name"        │
╞════════════════╡
│"HLA-A*01:01:54"│
├────────────────┤
│"HLA-A*01:01:55"│
├────────────────┤
│"HLA-A*01:01:56"│
├────────────────┤
│"HLA-A*01:127"  │
├────────────────┤
│"HLA-A*01:128"  │
├────────────────┤
│"HLA-A*01:129"  │
├────────────────┤
│"HLA-A*01:130"  │
├────────────────┤
│"HLA-A*02:06:14"│
├────────────────┤
│"HLA-A*02:17:03"│
├────────────────┤
│"HLA-A*02:393"  │
├────────────────┤
│"HLA-A*02:404"  │
├────────────────┤
│"HLA-A*02:405"  │
├────────────────┤
│"HLA-A*02:406"  │
├────────────────┤
│"HLA-A*02:407"  │
├────────────────┤
│"HLA-A*02:408"  │
├────────────────┤
│"HLA-A*02:409"  │
├────────────────┤
│"HLA-A*02:410"  │
├────────────────┤
│"HLA-A*02:411"  │
├────────────────┤
│"HLA-A*02:412"  │
├────────────────┤
│"HLA-A*02:413"  │
├────────────────┤
│"HLA-A*02:414"  │
├────────────────┤
│"HLA-A*02:416"  │
├────────────────┤
│"HLA-A*02:417"  │
├────────────────┤
│"HLA-A*03:01:39"│
├────────────────┤
│"HLA-A*03:01:40"│
├────────────────┤
│"HLA-A*03:163"  │
├────────────────┤
│"HLA-A*03:164"  │
├────────────────┤
│"HLA-A*03:165"  │
├────────────────┤
│"HLA-A*03:166"  │
├────────────────┤
│"HLA-A*11:01:46"│
├────────────────┤
│"HLA-A*11:01:47"│
├────────────────┤
│"HLA-A*11:134"  │
├────────────────┤
│"HLA-A*11:141"  │
├────────────────┤
│"HLA-A*11:142"  │
├────────────────┤
│"HLA-A*23:57"   │
├────────────────┤
│"HLA-A*23:58"   │
├────────────────┤
│"HLA-A*24:231"  │
├────────────────┤
│"HLA-A*24:232N" │
├────────────────┤
│"HLA-A*26:85"   │
├────────────────┤
│"HLA-A*29:51"   │
├────────────────┤
│"HLA-A*29:52"   │
├────────────────┤
│"HLA-A*29:53"   │
├────────────────┤
│"HLA-A*30:72"   │
├────────────────┤
│"HLA-A*30:73N"  │
├────────────────┤
│"HLA-A*30:74"   │
├────────────────┤
│"HLA-A*31:01:18"│
├────────────────┤
│"HLA-A*31:72"   │
├────────────────┤
│"HLA-A*31:73"   │
├────────────────┤
│"HLA-A*31:74"   │
├────────────────┤
│"HLA-A*33:03:14"│
├────────────────┤
│"HLA-A*33:68"   │
├────────────────┤
│"HLA-A*34:11"   │
├────────────────┤
│"HLA-A*68:01:19"│
├────────────────┤
│"HLA-A*68:100"  │
├────────────────┤
│"HLA-A*80:03"   │
└────────────────┘
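
To see where the factor of two comes from, something like the following might help. It is a sketch that uses only the labels and properties already in the queries above; the aliases are illustrative. For each allele it reports how many rows match, how many distinct GFE nodes are involved, and which relationship types connect them.

```cypher
// Break the 110 rows down per allele.
// gfe_nodes > 1  suggests several GFE nodes share the same name.
// gfe_nodes = 1 with row_count > 1 suggests parallel relationships
// between a single GFE node and the allele.
MATCH (g:GFE)-[r]-(i:IPD_Allele)
WHERE g.name = "HLA-Aw0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0-0"
RETURN i.name            AS allele,
       count(*)          AS row_count,
       count(DISTINCT g) AS gfe_nodes,
       collect(type(r))  AS rel_types
ORDER BY allele
```

The first case would point to duplicated GFE nodes at load time; the second would point to the relationship-building step.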

mmaiers-nmdp commented 1 year ago

query to find any repeated GFE names:

MATCH (g:GFE)
WITH g.name AS name, count(*) AS c
WHERE c > 1
RETURN name, c

mmaiers-nmdp commented 1 year ago

This is resolved in the dev version.