monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

"Also known as" on gene pages mixes up genes and diseases (suggesting problem in ingest) #522

Open nlharris opened 1 year ago

nlharris commented 1 year ago

@pnrobinson wrote (in Slack):

On the gene page, in the field called "Also known as", for FBN1 I see this: MASS, OCTD, SGS, Marfan syndrome, asprosin, FBN, MFS1, WMS, fibrillin 1 (Marfan syndrome). This mixes up genes and diseases, so something is going wrong with the ingest.

[heritability issue moved to https://github.com/monarch-initiative/monarch-app/issues/314]

kevinschaper commented 1 year ago

Right now the ingest is bringing together all of the values from alias_symbol, alias_name, prev_symbol and prev_name.

Here's what that looks like in hgnc_complete_set.txt (with some column reduction and massaging):

| hgnc_id | symbol | alias_symbol | alias_name | prev_symbol | prev_name |
| --- | --- | --- | --- | --- | --- |
| HGNC:3603 | FBN1 | MASS, OCTD, SGS | Marfan syndrome, asprosin | FBN, MFS1, WMS | fibrillin 1 (Marfan syndrome) |

and on the site:

https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:3603

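To make the behavior concrete, here is a minimal Python sketch of what merging those four HGNC columns does to the FBN1 row. It is an illustration only, not the actual monarch-ingest/Koza transform: the function names (`split_values`, `merged_synonyms`, `grouped_synonyms`) are hypothetical, and the comma-separated cell format follows the simplified row shown in this comment rather than the raw file.

```python
# Hypothetical illustration; NOT the actual monarch-ingest transform code.
# The row mirrors the simplified FBN1 example from this comment.
row = {
    "hgnc_id": "HGNC:3603",
    "symbol": "FBN1",
    "alias_symbol": "MASS, OCTD, SGS",
    "alias_name": "Marfan syndrome, asprosin",
    "prev_symbol": "FBN, MFS1, WMS",
    "prev_name": "fibrillin 1 (Marfan syndrome)",
}

# The four columns named in the comment above.
SYNONYM_COLUMNS = ["alias_symbol", "alias_name", "prev_symbol", "prev_name"]


def split_values(cell):
    """Split a comma-separated cell into trimmed, non-empty values."""
    return [value.strip() for value in cell.split(",") if value.strip()]


def merged_synonyms(row):
    """Flatten all four columns into one list, as described above.

    The source column of each value is lost, so symbols and
    free-text names end up side by side.
    """
    merged = []
    for column in SYNONYM_COLUMNS:
        merged.extend(split_values(row.get(column, "")))
    return merged


def grouped_synonyms(row):
    """Keep values keyed by source column, so symbols stay apart from names."""
    return {column: split_values(row.get(column, "")) for column in SYNONYM_COLUMNS}


if __name__ == "__main__":
    print(merged_synonyms(row))
    print(grouped_synonyms(row))
```

The flattened list reproduces the "Also known as" values reported for FBN1, while the grouped form shows that "Marfan syndrome" comes from alias_name and prev_name rather than from a symbol column. Whether the fix should keep the columns separate or drop some of them is not decided here.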