Closed kevinschaper closed 1 year ago
This is very confusing because 4 out of 5 of the "POLYGENIC" examples were associated with only one gene. That being said, using the predicate "biolink:gene_associated_with_condition" is broad enough. It means there is some association between the gene and the disease. Therefore, I think we can use it in all cases.
We could probably get more specific for the "MENDELIAN" one (using RO:0003303, causes condition), but I don't see an existing biolink predicate that represents it (which doesn't mean one doesn't exist).
I hope that helps!
Oof, I just tracked backwards to figure out where the risk_affected_by predicate came from. It was commented out for OMIM `{` disorder labels along with a `contributes to` relation lookup from the translation table. When I updated the OMIM ingest to set the RO term based on the spreadsheet, I kept the predicate. Then the relation field went away in favor of only using predicates, and we only had that predicate left. I'm very glad we're dealing with this!
The predicate mapping ended up being:
biolink:causes
biolink:contributes_to
biolink:gene_associated_with_condition
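A minimal sketch of how this mapping could be applied in an ingest. The pairing of each `association_type` value with a predicate is an assumption inferred from the order of the list above and from the edge tables in this thread; the names `PREDICATE_BY_ASSOCIATION_TYPE` and `predicate_for` are hypothetical, not taken from the ingest code.

```python
# Assumed pairing: the thread lists the three predicates, but which
# association_type maps to which predicate is inferred, not stated.
PREDICATE_BY_ASSOCIATION_TYPE = {
    "MENDELIAN": "biolink:causes",
    "POLYGENIC": "biolink:contributes_to",
    "UNKNOWN": "biolink:gene_associated_with_condition",
}


def predicate_for(association_type: str) -> str:
    """Return the predicate for an association_type value (hypothetical helper)."""
    key = association_type.strip().upper()
    if key not in PREDICATE_BY_ASSOCIATION_TYPE:
        # Fail loudly so a new association type in the file is not silently dropped.
        raise ValueError(f"Unexpected association_type: {association_type!r}")
    return PREDICATE_BY_ASSOCIATION_TYPE[key]
```

Raising on an unknown value, rather than falling back to the broad predicate, is a design choice here: it would surface any new association type the next time the file changes.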
This ingest is included and looking good in the brand new 2023-05-03 release.
The list of dangling edges where the gene didn't connect is pretty small:
original_subject | subject | predicate | object | original_object |
---|---|---|---|---|
NCBIGene:111365204 | | biolink:causes | MONDO:0007630 | OMIM:136550 |
NCBIGene:111365204 | | biolink:causes | MONDO:0010932 | OMIM:600790 |
NCBIGene:105259599 | | biolink:causes | MONDO:0020796 | OMIM:180860 |
NCBIGene:109580095 | | biolink:causes | MONDO:0013517 | OMIM:613985 |
NCBIGene:105259599 | | biolink:causes | MONDO:0007534 | OMIM:130650 |
NCBIGene:10108 | | biolink:causes | MONDO:0008300 | OMIM:176270 |
NCBIGene:7467 | | biolink:causes | MONDO:0008684 | OMIM:194190 |
NCBIGene:105259599 | | biolink:causes | MONDO:0008680 | OMIM:194071 |
The list is a bit larger for diseases that we don't have an entity for:
original_subject | subject | predicate | object | original_object |
---|---|---|---|---|
NCBIGene:3662 | HGNC:6119 | biolink:causes | | OMIM:611724 |
NCBIGene:434 | HGNC:745 | biolink:causes | | OMIM:611742 |
NCBIGene:7471 | HGNC:12774 | biolink:contributes_to | | OMIM:615221 |
NCBIGene:977 | HGNC:1630 | biolink:causes | | OMIM:179620 |
NCBIGene:4254 | HGNC:6343 | biolink:causes | | OMIM:611664 |
NCBIGene:5358 | HGNC:9091 | biolink:causes | | OMIM:300910 |
NCBIGene:124872 | HGNC:24136 | biolink:causes | | OMIM:615018 |
NCBIGene:3162 | HGNC:5013 | biolink:contributes_to | | OMIM:606963 |
NCBIGene:4312 | HGNC:7155 | biolink:causes | | OMIM:606963 |
NCBIGene:8706 | HGNC:918 | biolink:causes | | OMIM:615021 |
NCBIGene:6906 | HGNC:11583 | biolink:causes | | OMIM:300932 |
NCBIGene:3990 | HGNC:6619 | biolink:causes | | OMIM:612797 |
NCBIGene:960 | HGNC:1681 | biolink:causes | | OMIM:609027 |
NCBIGene:4018 | HGNC:6667 | biolink:contributes_to | | OMIM:618807 |
NCBIGene:4157 | HGNC:6929 | biolink:causes | | OMIM:613098 |
NCBIGene:9780 | HGNC:28993 | biolink:causes | | OMIM:620207 |
NCBIGene:10551 | HGNC:328 | biolink:causes | MONDO:0859370 | OMIM:620233 |
NCBIGene:84466 | HGNC:29634 | biolink:causes | MONDO:0859515 | OMIM:620249 |
NCBIGene:779 | HGNC:1397 | biolink:causes | MONDO:0859514 | OMIM:620246 |
NCBIGene:3848 | HGNC:6412 | biolink:causes | MONDO:0859574 | OMIM:620148 |
NCBIGene:153 | HGNC:285 | biolink:causes | | OMIM:607276 |
NCBIGene:7125 | HGNC:11944 | biolink:causes | MONDO:0859335 | OMIM:620161 |
NCBIGene:9570 | HGNC:4431 | biolink:causes | MONDO:0859336 | OMIM:620166 |
NCBIGene:55719 | HGNC:17814 | biolink:causes | MONDO:0859575 | OMIM:620184 |
NCBIGene:23137 | HGNC:20465 | biolink:causes | MONDO:0859576 | OMIM:620185 |
NCBIGene:2255 | HGNC:3666 | biolink:causes | MONDO:0859578 | OMIM:620193 |
NCBIGene:2261 | HGNC:3690 | biolink:causes | MONDO:0859577 | OMIM:620192 |
NCBIGene:23286 | HGNC:29435 | biolink:causes | | OMIM:615602 |
NCBIGene:84946 | HGNC:21173 | biolink:causes | MONDO:0859355 | OMIM:620199 |
NCBIGene:55969 | HGNC:15870 | biolink:causes | MONDO:0859567 | OMIM:616994 |
NCBIGene:720 | HGNC:1323 | biolink:causes | | OMIM:614374 |
NCBIGene:949 | HGNC:1664 | biolink:causes | | OMIM:610762 |
NCBIGene:345275 | HGNC:18685 | biolink:contributes_to | | OMIM:620116 |
NCBIGene:90523 | HGNC:21355 | biolink:causes | MONDO:0859322 | OMIM:620138 |
NCBIGene:8854 | HGNC:15472 | biolink:causes | MONDO:0859571 | OMIM:620025 |
NCBIGene:118987 | HGNC:26974 | biolink:causes | MONDO:0859281 | OMIM:620021 |
NCBIGene:55107 | HGNC:21625 | biolink:causes | MONDO:0859289 | OMIM:620045 |
NCBIGene:171019 | HGNC:17111 | biolink:causes | MONDO:0859572 | OMIM:620067 |
NCBIGene:3911 | HGNC:6485 | biolink:causes | MONDO:0859573 | OMIM:620076 |
NCBIGene:506 | HGNC:830 | biolink:causes | MONDO:0859302 | OMIM:620085 |
NCBIGene:865 | HGNC:1539 | biolink:causes | MONDO:0859307 | OMIM:620099 |
NCBIGene:3077 | HGNC:4886 | biolink:causes | | OMIM:614193 |
NCBIGene:2556 | HGNC:4077 | biolink:causes | MONDO:0859564 | OMIM:301091 |
NCBIGene:2157 | HGNC:3546 | biolink:causes | MONDO:0859082 | OMIM:301071 |
NCBIGene:2532 | HGNC:4035 | biolink:causes | | OMIM:611862 |
NCBIGene:3047 | HGNC:4831 | biolink:causes | | OMIM:141749 |
NCBIGene:3048 | HGNC:4832 | biolink:causes | | OMIM:141749 |
NCBIGene:3043 | HGNC:4827 | biolink:causes | | OMIM:141749 |
NCBIGene:50833 | HGNC:14921 | biolink:causes | | OMIM:617956 |
NCBIGene:29881 | HGNC:7898 | biolink:causes | | OMIM:617966 |
NCBIGene:6006 | HGNC:10008 | biolink:causes | | OMIM:617970 |
NCBIGene:55366 | HGNC:13299 | biolink:contributes_to | | OMIM:615311 |
NCBIGene:3615 | HGNC:6053 | biolink:causes | | OMIM:617995 |
NCBIGene:55366 | HGNC:13299 | biolink:causes | MONDO:0859205 | OMIM:619613 |
NCBIGene:4087 | HGNC:6768 | biolink:causes | MONDO:0859213 | OMIM:619657 |
NCBIGene:8482 | HGNC:10741 | biolink:causes | | OMIM:614745 |
NCBIGene:3570 | HGNC:6019 | biolink:causes | | OMIM:614752 |
NCBIGene:2646 | HGNC:4196 | biolink:causes | | OMIM:613463 |
NCBIGene:57498 | HGNC:29508 | biolink:causes | MONDO:0859184 | OMIM:619501 |
NCBIGene:5290 | HGNC:8975 | biolink:causes | MONDO:0859192 | OMIM:619538 |
NCBIGene:3570 | HGNC:6019 | biolink:causes | | OMIM:614689 |
NCBIGene:285498 | HGNC:27729 | biolink:causes | | OMIM:612042 |
NCBIGene:338557 | HGNC:19061 | biolink:contributes_to | | OMIM:607514 |
NCBIGene:1136 | HGNC:1957 | biolink:contributes_to | | OMIM:612052 |
NCBIGene:1138 | HGNC:1959 | biolink:contributes_to | | OMIM:612052 |
NCBIGene:3773 | HGNC:6262 | biolink:causes | MONDO:0859167 | OMIM:619406 |
NCBIGene:6809 | HGNC:11438 | biolink:causes | MONDO:0859170 | OMIM:619446 |
NCBIGene:6007 | HGNC:10009 | biolink:contributes_to | MONDO:0859172 | OMIM:619462 |
NCBIGene:51474 | HGNC:24636 | biolink:causes | | OMIM:618079 |
NCBIGene:1378 | HGNC:2334 | biolink:causes | | OMIM:607486 |
NCBIGene:360 | HGNC:636 | biolink:causes | | OMIM:607457 |
NCBIGene:7351 | HGNC:12518 | biolink:contributes_to | | OMIM:607447 |
NCBIGene:51129 | HGNC:16039 | biolink:causes | | OMIM:615881 |
NCBIGene:1317 | HGNC:11016 | biolink:causes | | OMIM:620306 |
NCBIGene:3032 | HGNC:4803 | biolink:causes | | OMIM:620300 |
NCBIGene:84699 | HGNC:18855 | biolink:causes | MONDO:0859149 | OMIM:619324 |
NCBIGene:1289 | HGNC:2209 | biolink:causes | MONDO:0859151 | OMIM:619329 |
NCBIGene:26175 | HGNC:20218 | biolink:causes | MONDO:0859156 | OMIM:619345 |
NCBIGene:1558 | HGNC:2622 | biolink:contributes_to | | OMIM:618018 |
NCBIGene:1066 | HGNC:1863 | biolink:causes | | OMIM:618057 |
NCBIGene:347734 | HGNC:16872 | biolink:causes | MONDO:0859518 | OMIM:620269 |
NCBIGene:58 | HGNC:129 | biolink:causes | MONDO:0859517 | OMIM:620265 |
NCBIGene:58 | HGNC:129 | biolink:causes | MONDO:0859523 | OMIM:620278 |
NCBIGene:3604 | HGNC:11924 | biolink:causes | MONDO:0859526 | OMIM:620282 |
NCBIGene:3055 | HGNC:4840 | biolink:causes | | OMIM:620296 |
NCBIGene:23129 | HGNC:9107 | biolink:causes | MONDO:0859532 | OMIM:620294 |
NCBIGene:2805 | HGNC:4432 | biolink:causes | | OMIM:614419 |
NCBIGene:9429 | HGNC:74 | biolink:causes | | OMIM:614490 |
NCBIGene:10913 | HGNC:2895 | biolink:causes | | OMIM:612630 |
NCBIGene:2524 | HGNC:4013 | biolink:contributes_to | | OMIM:612542 |
NCBIGene:9370 | HGNC:13633 | biolink:causes | | OMIM:612556 |
NCBIGene:7367 | HGNC:12547 | biolink:contributes_to | | OMIM:612560 |
NCBIGene:2492 | HGNC:3969 | biolink:causes | | OMIM:276400 |
NCBIGene:9200 | HGNC:9639 | biolink:causes | MONDO:0859264 | OMIM:619967 |
NCBIGene:7123 | HGNC:11891 | biolink:causes | MONDO:0859568 | OMIM:619977 |
NCBIGene:497661 | HGNC:31690 | biolink:causes | MONDO:0859271 | OMIM:619985 |
NCBIGene:56992 | HGNC:17273 | biolink:causes | MONDO:0859570 | OMIM:619981 |
NCBIGene:54914 | HGNC:23377 | biolink:causes | MONDO:0859273 | OMIM:619991 |
NCBIGene:420 | HGNC:726 | biolink:causes | | OMIM:616060 |
NCBIGene:28 | HGNC:79 | biolink:causes | | OMIM:616093 |
NCBIGene:7299 | HGNC:12442 | biolink:contributes_to | | OMIM:601800 |
NCBIGene:79068 | HGNC:24678 | biolink:contributes_to | | OMIM:612460 |
NCBIGene:1604 | HGNC:2665 | biolink:causes | | OMIM:613793 |
NCBIGene:7289 | HGNC:12425 | biolink:causes | MONDO:0859254 | OMIM:619902 |
NCBIGene:335 | HGNC:600 | biolink:causes | MONDO:0859238 | OMIM:619836 |
NCBIGene:8789 | HGNC:3607 | biolink:causes | MONDO:0859246 | OMIM:619864 |
NCBIGene:10935 | HGNC:9354 | biolink:causes | MONDO:0859248 | OMIM:619871 |
NCBIGene:5122 | HGNC:8743 | biolink:contributes_to | | OMIM:612362 |
NCBIGene:3655 | HGNC:6142 | biolink:causes | MONDO:0859233 | OMIM:619817 |
NCBIGene:54872 | HGNC:25985 | biolink:causes | | OMIM:619812 |
NCBIGene:79639 | HGNC:26186 | biolink:causes | MONDO:0859226 | OMIM:619727 |
NCBIGene:4160 | HGNC:6932 | biolink:causes | | OMIM:618406 |
NCBIGene:51341 | HGNC:18078 | biolink:causes | MONDO:0859231 | OMIM:619769 |
NCBIGene:6521 | HGNC:11027 | biolink:causes | | OMIM:601550 |
NCBIGene:9429 | HGNC:74 | biolink:causes | | OMIM:138900 |
NCBIGene:6521 | HGNC:11027 | biolink:causes | | OMIM:601551 |
NCBIGene:59341 | HGNC:18083 | biolink:causes | | OMIM:613508 |
NCBIGene:6774 | HGNC:11364 | biolink:causes | | OMIM:147060 |
NCBIGene:10661 | HGNC:6345 | biolink:causes | | OMIM:613566 |
NCBIGene:6272 | HGNC:11186 | biolink:causes | | OMIM:613589 |
NCBIGene:219931 | HGNC:20820 | biolink:causes | | OMIM:612267 |
NCBIGene:100128908 | HGNC:53647 | biolink:causes | MONDO:0859222 | OMIM:619702 |
NCBIGene:51780 | HGNC:1337 | biolink:gene_associated_with_condition | MONDO:0858999 | Orphanet:633004 |
NCBIGene:2316 | HGNC:3754 | biolink:gene_associated_with_condition | | Orphanet:323 |
NCBIGene:65109 | HGNC:20439 | biolink:gene_associated_with_condition | | Orphanet:323 |
NCBIGene:8573 | HGNC:1497 | biolink:gene_associated_with_condition | | Orphanet:323 |
NCBIGene:254065 | HGNC:17342 | biolink:gene_associated_with_condition | | Orphanet:323 |
NCBIGene:6558 | HGNC:10911 | biolink:gene_associated_with_condition | | Orphanet:633024 |
NCBIGene:6558 | HGNC:10911 | biolink:gene_associated_with_condition | | Orphanet:633021 |
NCBIGene:1363 | HGNC:2303 | biolink:gene_associated_with_condition | MONDO:0859001 | Orphanet:633028 |
NCBIGene:1315 | HGNC:2231 | biolink:gene_associated_with_condition | MONDO:0859002 | Orphanet:633035 |
NCBIGene:3075 | HGNC:4883 | biolink:gene_associated_with_condition | | Orphanet:244275 |
NCBIGene:3426 | HGNC:5394 | biolink:gene_associated_with_condition | | Orphanet:244275 |
NCBIGene:476 | HGNC:799 | biolink:gene_associated_with_condition | | Orphanet:564178 |
NCBIGene:6906 | HGNC:11583 | biolink:gene_associated_with_condition | | Orphanet:209893 |
NCBIGene:1589 | HGNC:2600 | biolink:gene_associated_with_condition | | Orphanet:95698 |
NCBIGene:410 | HGNC:713 | biolink:gene_associated_with_condition | | Orphanet:751 |
NCBIGene:136371 | HGNC:17185 | biolink:gene_associated_with_condition | | Orphanet:353225 |
NCBIGene:1545 | HGNC:2597 | biolink:gene_associated_with_condition | | Orphanet:353225 |
NCBIGene:134430 | HGNC:30696 | biolink:gene_associated_with_condition | | Orphanet:353225 |
NCBIGene:4909 | HGNC:8024 | biolink:gene_associated_with_condition | | Orphanet:353225 |
NCBIGene:10133 | HGNC:17142 | biolink:gene_associated_with_condition | | Orphanet:353225 |
NCBIGene:51271 | HGNC:12461 | biolink:gene_associated_with_condition | MONDO:0858986 | Orphanet:631068 |
NCBIGene:4159 | HGNC:6931 | biolink:gene_associated_with_condition | | Orphanet:217031 |
NCBIGene:57156 | HGNC:23787 | biolink:gene_associated_with_condition | MONDO:0858992 | Orphanet:631088 |
NCBIGene:7920 | HGNC:13921 | biolink:gene_associated_with_condition | MONDO:0858991 | Orphanet:631085 |
NCBIGene:81790 | HGNC:25358 | biolink:gene_associated_with_condition | MONDO:0858990 | Orphanet:631082 |
NCBIGene:84842 | HGNC:28242 | biolink:gene_associated_with_condition | MONDO:0858988 | Orphanet:631076 |
NCBIGene:5833 | HGNC:8756 | biolink:gene_associated_with_condition | MONDO:0858987 | Orphanet:631073 |
NCBIGene:5297 | HGNC:8983 | biolink:gene_associated_with_condition | MONDO:0858989 | Orphanet:631079 |
NCBIGene:27022 | HGNC:3804 | biolink:gene_associated_with_condition | | Orphanet:3435 |
NCBIGene:22861 | HGNC:14374 | biolink:gene_associated_with_condition | | Orphanet:3435 |
NCBIGene:3762 | HGNC:6266 | biolink:gene_associated_with_condition | | Orphanet:85142 |
NCBIGene:1499 | HGNC:2514 | biolink:gene_associated_with_condition | | Orphanet:85142 |
NCBIGene:776 | HGNC:1391 | biolink:gene_associated_with_condition | | Orphanet:85142 |
NCBIGene:492 | HGNC:816 | biolink:gene_associated_with_condition | | Orphanet:85142 |
NCBIGene:476 | HGNC:799 | biolink:gene_associated_with_condition | | Orphanet:85142 |
NCBIGene:4547 | HGNC:7467 | biolink:gene_associated_with_condition | | Orphanet:426 |
NCBIGene:255738 | HGNC:20001 | biolink:gene_associated_with_condition | | Orphanet:426 |
NCBIGene:29881 | HGNC:7898 | biolink:gene_associated_with_condition | | Orphanet:426 |
NCBIGene:338 | HGNC:603 | biolink:gene_associated_with_condition | | Orphanet:426 |
NCBIGene:27329 | HGNC:491 | biolink:gene_associated_with_condition | | Orphanet:426 |
NCBIGene:345 | HGNC:610 | biolink:gene_associated_with_condition | | Orphanet:426 |
NCBIGene:29116 | HGNC:21155 | biolink:gene_associated_with_condition | | Orphanet:426 |
NCBIGene:3949 | HGNC:6547 | biolink:gene_associated_with_condition | | Orphanet:406 |
NCBIGene:255738 | HGNC:20001 | biolink:gene_associated_with_condition | | Orphanet:406 |
NCBIGene:338 | HGNC:603 | biolink:gene_associated_with_condition | | Orphanet:406 |
NCBIGene:26228 | HGNC:24133 | biolink:gene_associated_with_condition | | Orphanet:406 |
NCBIGene:348 | HGNC:613 | biolink:gene_associated_with_condition | | Orphanet:406 |
NCBIGene:3988 | HGNC:6617 | biolink:gene_associated_with_condition | | Orphanet:406 |
It looks like the MONDO terms listed here, which I have mappings for but which aren't actually in our current MONDO version, are going to show up in the next release. It's exciting that this is tight enough that the only remaining problems come down to how we synchronize within a month.
This is done
We have HPO's g2d file sitting in our data-cache now (thanks @iimpulse!) and we can swap it in for our OMIM g2d ingest
The file looks like (taken from a few places to get examples of all 3 association types)
ncbi_gene_id
No change necessary, just pass through as-is (and it will be mapped to HGNC later)
gene_symbol
Not used
association_type
values:
6573 MENDELIAN
621 POLYGENIC
8158 UNKNOWN
We should check with @sabrinatoro on predicate mappings. My partial guess:
MENDELIAN: `biolink:affects_risk_for`
UNKNOWN: `biolink:gene_associated_with_condition`
POLYGENIC: ???
disease_id
prefixes:
7194 OMIM
8158 ORPHA
Need to replace("ORPHA:", "Orphanet:") to match the curie in MONDO sssom
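A minimal sketch of that prefix rewrite, assuming `disease_id` arrives as a plain CURIE string; the function name `normalize_disease_curie` is hypothetical. It rewrites only the prefix rather than calling `str.replace` on the whole string, so an `ORPHA:` substring elsewhere in a value would be left alone.

```python
def normalize_disease_curie(disease_id: str) -> str:
    """Rewrite an ORPHA: prefix to Orphanet:, leaving other CURIEs untouched."""
    prefix, sep, local_id = disease_id.partition(":")
    if sep and prefix == "ORPHA":
        return f"Orphanet:{local_id}"
    return disease_id


print(normalize_disease_curie("ORPHA:323"))    # Orphanet:323
print(normalize_disease_curie("OMIM:136550"))  # OMIM:136550
```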
source
The source column values are
We should map these to infores as the primary_knowledge_source (maybe `infores:omim` for the medgen, then medgen as an aggregator? - plus `infores:orphanet`). Also add aggregating_knowledge_source that includes `infores:monarchinitiative`, `infores:hpo-annotations`, and probably `infores:medgen`.
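A rough sketch of the proposal above, not a settled design. The actual `source` column values are not shown in this issue, so the sketch takes a hypothetical `source_kind` label (`"medgen"` or `"orphanet"`) that the ingest would first have to derive from that column. Treating medgen as an aggregator only for the OMIM-derived rows is also an assumption, based on the "maybe" in the text.

```python
from typing import List, Tuple

# Aggregators proposed in the issue for every row.
BASE_AGGREGATORS = ["infores:monarchinitiative", "infores:hpo-annotations"]


def knowledge_sources(source_kind: str) -> Tuple[str, List[str]]:
    """Return (primary_knowledge_source, aggregating_knowledge_source).

    source_kind is a hypothetical label derived from the source column.
    """
    if source_kind == "medgen":
        # Assumption: medgen rows originate from OMIM, with medgen aggregating.
        return "infores:omim", BASE_AGGREGATORS + ["infores:medgen"]
    if source_kind == "orphanet":
        return "infores:orphanet", list(BASE_AGGREGATORS)
    raise ValueError(f"Unexpected source kind: {source_kind!r}")
```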
download & file path
The file is already available at `data/hpoa/genes_to_disease.txt` in gs://monarch-ingest-data-cache. It won't be available from the HPO site until the next release, but we should still add a commented-out entry in download.yaml and make a second issue to enable the download later.