Closed dhimmel closed 2 years ago
Strongly agree that xrefs to MESH terms should use unique ID rather than tree node ID. GO has switched to use these. Uberon should too. For CL, please could you file a ticket on the CL tracker.
@dosumis, I filed a ticket using the CL tracker.
+1 thanks @dhimmel, we'll implement this ASAP.
It's worth noting that many tree numbers don't match current MeSH unique ids. Presumably this is because the MeSH hierarchy has been modified since the initial xrefs were added.
279 still to do. I can do these fairly easily... if someone has a sane obo or owl version of mesh it would help (the original with the tree numbers came via an odd route. the RH mesh on bioportal is a bit unusual).
> if someone has a sane obo or owl version of mesh it would help
@cmungall, I don't have an obo, but have a networkx representation of MeSH in python. I've exported it to a gexf file (download). networkx can export to a variety of additional formats, but not obo. Would any of these help you?
Also, I am willing to do some mappings if you tell me where/how to make the edits. I would like to proceed with my research using the updated UBERON as soon as possible.
As for the analogous issue in CL, there have been no responses to my post. Is CL still maintained?
I may be interested in looking into networkx for another project. Do you know if there are bindings to Neo4J (cc @nlwashington)?
There is nothing stopping me from using any of the formats you provide... but if you wanted to make an obo and/or rdf export that would speed it up and might prove generically useful for a bunch of people.
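For anyone who wants to try the OBO export suggested above, here is a minimal sketch using only the standard library. It assumes MeSH is available as a plain dict of unique ID to name and tree numbers (the dict layout, `to_obo`, and `parent_tree_number` are made up for illustration, not part of any existing tool), and it derives `is_a` parents by dropping the last segment of each tree number. The sample descriptors and tree numbers are taken from the mapping table further down this thread.

```python
def parent_tree_number(tree_number):
    """Return the parent tree number, or None for a top-level node."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def to_obo(descriptors):
    """Serialize {unique_id: {"name": str, "tree_numbers": [str]}} as OBO stanzas."""
    # Map every tree number back to the descriptor that owns it.
    owner = {
        tree_number: uid
        for uid, record in descriptors.items()
        for tree_number in record["tree_numbers"]
    }
    lines = ["format-version: 1.2", ""]
    for uid in sorted(descriptors):
        record = descriptors[uid]
        lines += ["[Term]", f"id: MESH:{uid}", f"name: {record['name']}"]
        # A descriptor's parents are the owners of its parent tree numbers.
        parents = {
            owner[parent]
            for parent in map(parent_tree_number, record["tree_numbers"])
            if parent in owner
        }
        lines += [f"is_a: MESH:{p} ! {descriptors[p]['name']}" for p in sorted(parents)]
        lines.append("")
    return "\n".join(lines)


# Toy input (hypothetical layout); values come from the table in this thread.
SAMPLE = {
    "D007908": {"name": "Lens, Crystalline", "tree_numbers": ["A09.371.509"]},
    "D007904": {"name": "Lens Cortex, Crystalline", "tree_numbers": ["A09.371.509.225"]},
    "D007907": {"name": "Lens Nucleus, Crystalline", "tree_numbers": ["A09.371.509.670"]},
}

print(to_obo(SAMPLE))
```

Because a descriptor can carry several tree numbers, the parents are collected into a set, so a term reachable by two paths gets one `is_a` line per distinct parent.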
I responded on the CL list just to prove it's alive, cc @nicolevasilevsky to bring it up next call
I added this to our agenda: https://docs.google.com/document/d/18sjGxfODgaK0MqeDBb4j3jqtX2g4yzy98pq5mpsZqPk/edit
I'm currently working with the EFO ontology v3.22.0, and it looks like many of these xrefs to MeSH tree locations rather than identifiers are still present.
They only exist for 1 CL term and 121 UBERON terms, so I am guessing they are still this way in UBERON and the tree numbers are getting imported from UBERON into EFO.
Would be awesome to get these all converted to IDs. Will update if I find an automated way to do this, but the challenge is that they are MeSH-release specific AFAICT, and have therefore grown more and more stale.
Wow, I just bumped into this issue today and found that @dhimmel noticed it 7 years ago. The issue of MeSH tree numbers - many of which do not point to currently existing tree numbers - being provided as xrefs still exists. These xrefs are not formally differentiated from xrefs that point to MeSH term IDs, i.e., UBERON contains a mixture of xrefs of the form MESH:D020667 and MESH:A08.186.211.730.385.826.701.490.
If I were to implement a semi-automated way of fixing these xrefs to point to current, existing MeSH term IDs, which version controlled file should I ideally contribute them to?
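One way such a semi-automated pass could look, as a rough sketch rather than the approach actually taken in the PR: given a tree-number to unique-ID table built from a current MeSH release (the `tree_to_uid` dict and the function name are assumptions for illustration), rewrite the xrefs that resolve and set aside the stale ones for manual curation. The "contains a dot" test for spotting tree numbers is a heuristic and would miss a bare top-level node.

```python
def remap_mesh_xrefs(xrefs, tree_to_uid):
    """Split xrefs into (updated, unresolved) using a tree-number -> unique-ID table."""
    updated, unresolved = [], []
    for xref in xrefs:
        prefix, _, local_id = xref.partition(":")
        if prefix != "MESH":
            updated.append(xref)  # not a MeSH xref, leave untouched
        elif local_id in tree_to_uid:
            updated.append(f"MESH:{tree_to_uid[local_id]}")  # current tree number
        elif "." in local_id:
            unresolved.append(xref)  # stale tree number, needs manual lookup
        else:
            updated.append(xref)  # already looks like a unique ID
    return updated, unresolved


# Hypothetical lookup table; the single entry is taken from the table in this thread.
tree_to_uid = {"A08.186.211.730.385.826.701.490": "D020645"}
xrefs = [
    "MESH:D020667",
    "MESH:A08.186.211.730.385.826.701.490",
    "MESH:A08.800.550.700.120",
]

updated, unresolved = remap_mesh_xrefs(xrefs, tree_to_uid)
print(updated)
print(unresolved)
```

Keeping the unresolved xrefs in a separate list makes it easy to hand them to a curator instead of silently dropping them.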
@cmungall @shawntanzk @matentzn I'll add this to the tech board, but I'm also pinging Chris in case he has time, as he addressed part of this issue previously, and had a plan for the rest if that still applies. (@shawntanzk, not sure where exactly in the board you'd like this to be, so I'll leave it to you after all. Thanks!)
Hi @paolaroncaglia - thanks for the background. @bgyori's PR looks good. I have approved it but asked Ray to eyeball it too. Presumably Chris' ancient PR will be much harder to merge.
wow 2015 lol, I'll add it to the tech board, but @bgyori has fixed a huge part of it already (thanks heaps!). I see that there are still unmapped terms that gilda couldn't do, I guess that @rays22 will look into those too? Anyway, just a heads up: merging that PR will automatically close this ticket, so either do all the fixes there or make sure to reopen after merging :)
btws @paolaroncaglia we are using this now: https://github.com/orgs/obophenotype/projects/11/views/1
@shawntanzk
> btws @paolaroncaglia we are using this now: https://github.com/orgs/obophenotype/projects/11/views/1
Exactly, requires a decision on priority level etc that I'd rather leave to you :-) Thanks for confirming though!
@paolaroncaglia oh right, will discuss if we need a triage section :) but I think you can also just put it whatever you think is right, we will move it when we come across it if we want to prioritize/deprioritize it anyway :)
> I see that there are still unmapped terms that gilda couldn't do, I guess that @rays22 will look into those too?
I am looking into those.
There are still 75 unmapped terms that gilda could not match.
I did some manual mapping of the remaining 75 MeSH to UBERON terms in the table below. I am going to add the 66 skos:exactMatch MeSH IDs to UBERON as xrefs.
I intend to leave out the remaining 9 MeSH terms until a future more comprehensive alignment of UBERON and MeSH.
uberon_id | uberon_name | old_mapping | method | new_mapping | new_mapping_name | new_mapping_url | mapping predicate |
---|---|---|---|---|---|---|---|
UBERON:0000107 | cleavage stage | A16.254.270 | manual lookup | D002970 | Cleavage Stage, Ovum | https://bioregistry.io/mesh:D002970 | skos:exactMatch |
UBERON:0000389 | lens cortex | A09.371.509.225 | manual lookup | D007904 | Lens Cortex, Crystalline | https://bioregistry.io/mesh:D007904 | skos:exactMatch |
UBERON:0000390 | lens nucleus | A09.371.509.670 | manual lookup | D007907 | Lens Nucleus, Crystalline | https://bioregistry.io/mesh:D007907 | skos:exactMatch |
UBERON:0000965 | lens of camera-type eye | A09.371.509 | manual lookup | D007908 | Lens, Crystalline | https://bioregistry.io/mesh:D007908 | skos:exactMatch |
UBERON:0001042 | chordate pharynx | A03.867 | manual lookup | D010614 | Pharynx | https://bioregistry.io/mesh:D010614 | skos:exactMatch |
UBERON:0001091 | calcareous tooth | A14.254.860 | manual lookup | D014070 | Tooth | https://bioregistry.io/mesh:D014070 | skos:exactMatch |
UBERON:0001092 | vertebral bone 1 | A02.835.232.834.151.213 | manual lookup | D001270 | Cervical Atlas | https://bioregistry.io/mesh:D001270 | skos:exactMatch |
UBERON:0001098 | incisor tooth | A14.254.860.425 | manual lookup | D007180 | Incisor | https://bioregistry.io/mesh:D007180 | skos:exactMatch |
UBERON:0001137 | dorsum | A01.176 | manual lookup | D001415 | Back | https://bioregistry.io/mesh:D001415 | skos:exactMatch |
UBERON:0001153 | caecum | A03.492.411.495.209 | manual lookup | D002432 | Cecum | https://bioregistry.io/mesh:D002432 | skos:exactMatch |
UBERON:0001160 | fundus of stomach | A03.492.766.419 | manual lookup | D005748 | Gastric Fundus | https://bioregistry.io/mesh:D005748 | skos:exactMatch |
UBERON:0001162 | cardia of stomach | A03.492.766.163 | manual lookup | D002299 | Cardia | https://bioregistry.io/mesh:D002299 | skos:exactMatch |
UBERON:0001199 | mucosa of stomach | A03.492.766.440 | manual lookup | D005753 | Gastric Mucosa | https://bioregistry.io/mesh:D005753 | skos:exactMatch |
UBERON:0001211 | Peyer's patch | A10.549.598 | manual lookup | D010581 | Peyer's Patches | https://bioregistry.io/mesh:D010581 | skos:exactMatch |
UBERON:0001212 | duodenal gland | A03.492.411.620.270.322 | manual lookup | D002011 | Brunner Glands | https://bioregistry.io/mesh:D002011 | skos:exactMatch |
UBERON:0001269 | acetabular part of hip bone | A02.835.232.611.108 | manual lookup | D000077 | Acetabulum | https://bioregistry.io/mesh:D000077 | skos:exactMatch |
UBERON:0001384 | primary motor cortex | A08.186.211.730.885.213.270.548 | manual lookup | D009044 | Motor Cortex | https://bioregistry.io/mesh:D009044 | skos:exactMatch |
UBERON:0001423 | radius bone | A02.835.232.087.702 | manual lookup | D011884 | Radius | https://bioregistry.io/mesh:D011884 | skos:exactMatch |
UBERON:0001427 | radiale | A02.835.232.087.144.650 | manual lookup | D021361 | Scaphoid Bone | https://bioregistry.io/mesh:D021361 | skos:exactMatch |
UBERON:0001428 | intermedium | A02.835.232.087.144.663 | manual lookup | D012667 | Lunate Bone | https://bioregistry.io/mesh:D012667 | skos:exactMatch |
UBERON:0001763 | odontogenic papilla | A14.254.900.720.250 | manual lookup | D003771 | Dental Papilla | https://bioregistry.io/mesh:D003771 | skos:exactMatch |
UBERON:0001867 | cartilage of external ear | A02.165.207 | manual lookup | D004425 | Ear Cartilage | https://bioregistry.io/mesh:D004425 | skos:exactMatch |
UBERON:0001881 | island of Calleja | A08.186.211.577.699.400 | manual lookup | D020670 | Islands of Calleja | https://bioregistry.io/mesh:D020670 | skos:exactMatch |
UBERON:0001885 | dentate gyrus of hippocampal formation | A08.186.211.577.405.200 | manual lookup | D018891 | Dentate Gyrus | https://bioregistry.io/mesh:D018891 | skos:exactMatch |
UBERON:0001900 | ventral thalamus | A08.186.211.730.385.800 | manual lookup | D020530 | Subthalamus | https://bioregistry.io/mesh:D020530 | skos:exactMatch |
UBERON:0001930 | paraventricular nucleus of hypothalamus | A08.186.211.730.385.357.342.400 | manual lookup | D010286 | Paraventricular Hypothalamic Nucleus | https://bioregistry.io/mesh:D010286 | skos:exactMatch |
UBERON:0001934 | dorsomedial nucleus of hypothalamus | A08.186.211.730.385.357.352.270 | manual lookup | D004302 | Dorsomedial Hypothalamic Nucleus | https://bioregistry.io/mesh:D004302 | skos:exactMatch |
UBERON:0001935 | ventromedial nucleus of hypothalamus | A08.186.211.730.385.357.352.953 | manual lookup | D014697 | Ventromedial Hypothalamic Nucleus | https://bioregistry.io/mesh:D014697 | skos:exactMatch |
UBERON:0002197 | median eminence of neurohypophysis | A06.407.747.734.500 | manual lookup | D008473 | Median Eminence | https://bioregistry.io/mesh:D008473 | skos:exactMatch |
UBERON:0002233 | tectorial membrane of cochlea | A09.246.631.246.292.906 | manual lookup | D013680 | Tectorial Membrane | https://bioregistry.io/mesh:D013680 | skos:exactMatch |
UBERON:0002259 | corpora quadrigemina | A08.186.211.132.659.237 | manual lookup | D003336 | Tectum Mesencephali | https://bioregistry.io/mesh:D003336 | skos:exactMatch |
UBERON:0002289 | midbrain cerebral aqueduct | A08.186.211.132.659.822.187 | manual lookup | D002535 | Cerebral Aqueduct | https://bioregistry.io/mesh:D002535 | skos:exactMatch |
UBERON:0002355 | pelvic region of trunk | A01.673 | manual lookup | D010388 | Pelvis | https://bioregistry.io/mesh:D010388 | skos:exactMatch |
UBERON:0002435 | striatum | A08.186.211.730.885.105.487.550 | manual lookup | D017072 | Neostriatum | https://bioregistry.io/mesh:D017072 | skos:exactMatch |
UBERON:0002487 | tooth cavity | A14.254.900.265 | manual lookup | D003786 | Dental Pulp Cavity | https://bioregistry.io/mesh:D003786 | skos:exactMatch |
UBERON:0002539 | pharyngeal arch | A16.254.160 | manual lookup | D001934 | Branchial Region | https://bioregistry.io/mesh:D001934 | skos:exactMatch |
UBERON:0002550 | anterior hypothalamic region | A08.186.211.730.385.357.342 | manual lookup | D007032 | Hypothalamus, Anterior | https://bioregistry.io/mesh:D007032 | skos:exactMatch |
UBERON:0002634 | anterior nucleus of hypothalamus | A08.186.211.730.385.357.342.063 | manual lookup | D007025 | Anterior Hypothalamic Nucleus | https://bioregistry.io/mesh:D007025 | skos:exactMatch |
UBERON:0002736 | lateral nuclear group of thalamus | A08.186.211.730.385.826.701.485 | manual lookup | D020647 | Lateral Thalamic Nuclei | https://bioregistry.io/mesh:D020647 | skos:exactMatch |
UBERON:0002739 | medial dorsal nucleus of thalamus | A08.186.211.730.385.826.701.490 | manual lookup | D020645 | Mediodorsal Thalamic Nucleus | https://bioregistry.io/mesh:D020645 | skos:exactMatch |
UBERON:0003062 | primitive knot | A16.254.650 | manual lookup | D020897 | Organizers, Embryonic | https://bioregistry.io/mesh:D020897 | skos:exactMatch |
UBERON:0003124 | chorion membrane | A16.254.403.473 | manual lookup | D002823 | Chorion | https://bioregistry.io/mesh:D002823 | skos:exactMatch |
UBERON:0003655 | molar tooth | A14.254.860.525 | manual lookup | D008963 | Molar | https://bioregistry.io/mesh:D008963 | skos:exactMatch |
UBERON:0003719 | Pacinian corpuscle | A08.800.550.700.500.300 | manual lookup | D010141 | Pacinian Corpuscles | https://bioregistry.io/mesh:D010141 | skos:exactMatch |
UBERON:0004454 | tarsal region | A01.378.610.250.149 | manual lookup | D000842 | Ankle | https://bioregistry.io/mesh:D000842 | skos:exactMatch |
UBERON:0004915 | sphincter of hepatopancreatic ampulla | A03.159.183.079.300.900.600 | manual lookup | D009803 | Sphincter of Oddi | https://bioregistry.io/mesh:D009803 | skos:exactMatch |
UBERON:0005176 | tooth enamel organ | A14.254.900.720.265 | manual lookup | D004658 | Enamel Organ | https://bioregistry.io/mesh:D004658 | skos:exactMatch |
UBERON:0005409 | alimentary part of gastrointestinal system | A03.492 | manual lookup | D041981 | Gastrointestinal Tract | https://bioregistry.io/mesh:D041981 | skos:exactMatch |
UBERON:0005899 | pes bone | A02.835.232.300 | manual lookup | D005529 | Foot Bones | https://bioregistry.io/mesh:D005529 | skos:exactMatch |
UBERON:0006134 | nerve fiber | A08.663.542 | manual lookup | D009412 | Nerve Fibers | https://bioregistry.io/mesh:D009412 | skos:exactMatch |
UBERON:0006586 | otolymph | A12.207.571 | manual lookup | D007761 | Labyrinthine Fluids | https://bioregistry.io/mesh:D007761 | skos:exactMatch |
UBERON:0006767 | head of femur | A02.835.232.500.247.343 | manual lookup | D005270 | Femur Head | https://bioregistry.io/mesh:D005270 | skos:exactMatch |
UBERON:0007119 | neck of femur | A02.835.232.500.247.510 | manual lookup | D005272 | Femur Neck | https://bioregistry.io/mesh:D005272 | skos:exactMatch |
UBERON:0007120 | premolar tooth | A14.254.860.150 | manual lookup | D001641 | Bicuspid | https://bioregistry.io/mesh:D001641 | skos:exactMatch |
UBERON:0008274 | mollusc venom | D24.185.926.580.590 | manual lookup | D008978 | Mollusk Venoms | https://bioregistry.io/mesh:D008978 | skos:exactMatch |
UBERON:0008281 | tooth bud | A14.254.900.720 | manual lookup | D014083 | Tooth Germ | https://bioregistry.io/mesh:D014083 | skos:exactMatch |
UBERON:0008337 | inguinal part of abdomen | A01.047.365 | manual lookup | D006119 | Groin | https://bioregistry.io/mesh:D006119 | skos:exactMatch |
UBERON:0010010 | basal nucleus of telencephalon | A08.186.211.730.885.105.880.100 | manual lookup | D020532 | Basal Nucleus of Meynert | https://bioregistry.io/mesh:D020532 | skos:exactMatch |
UBERON:0010011 | collection of basal ganglia | A08.186.211.730.885.105 | manual lookup | D001479 | Basal Ganglia | https://bioregistry.io/mesh:D001479 | skos:exactMatch |
UBERON:0010523 | microcirculatory vessel | A07.231.432 | manual lookup | D055806 | Microvessels | https://bioregistry.io/mesh:D055806 | skos:exactMatch |
UBERON:0010544 | metacarpus skeleton | A02.835.232.087.535 | manual lookup | D050279 | Metacarpal Bones | https://bioregistry.io/mesh:D050279 | skos:exactMatch |
UBERON:0010996 | articular cartilage of joint | A02.165.165 | manual lookup | D002358 | Cartilage, Articular | https://bioregistry.io/mesh:D002358 | skos:exactMatch |
UBERON:0018377 | molar tooth 3 | A14.254.860.525.500 | manual lookup | D008964 | Molar, Third | https://bioregistry.io/mesh:D008964 | skos:exactMatch |
UBERON:0034972 | jugular body | A08.800.550.700.120.600.350 | manual lookup | D005924 | Glomus Jugulare | https://bioregistry.io/mesh:D005924 | skos:exactMatch |
UBERON:0034979 | nonchromaffin paraganglion | A08.800.550.700.120.600 | manual lookup | D010234 | Paraganglia, Nonchromaffin | https://bioregistry.io/mesh:D010234 | skos:exactMatch |
UBERON:0036185 | Sertoli cell barrier | G06.535.166.330.100 | manual lookup | D001814 | Blood-Testis Barrier | https://bioregistry.io/mesh:D001814 | skos:exactMatch |
UBERON:0001245 | anus | A03.492.411.495.767.288 | manual lookup | D001003 | Anal Canal | https://bioregistry.io/mesh:D001003 | skos:relatedMatch |
UBERON:0014856 | cysticercus stage | B01.500.736.215.895.286 | manual lookup | D003552 | Cysticercus | https://bioregistry.io/mesh:D003552 | skos:relatedMatch |
UBERON:0000446 | septum of telencephalon | A08.186.211.577.750 | manual lookup | D020665 | Septum of Brain | https://bioregistry.io/mesh:D020665 | skos:broadMatch |
UBERON:0001272 | innominate bone | A02.835.232.611 | manual lookup | D010384 | Pelvic Bones | https://bioregistry.io/mesh:D010384 | skos:broadMatch |
UBERON:0001897 | dorsal plus ventral thalamus | A08.186.211.730.385.826 | manual lookup | D013788 | Thalamus | https://bioregistry.io/mesh:D013788 | skos:broadMatch |
UBERON:0003483 | thymus lymphoid tissue | A06.407.850 | manual lookup | D013950 | Thymus Gland | https://bioregistry.io/mesh:D013950 | skos:broadMatch |
UBERON:0005630 | fetal membrane | A16.254.403 | manual lookup | D005321 | Extraembryonic Membranes | https://bioregistry.io/mesh:D005321 | skos:broadMatch |
UBERON:0013110 | hydrophid venom | D24.185.965.850.480 | manual lookup | D004546 | Elapid Venoms | https://bioregistry.io/mesh:D004546 | skos:broadMatch |
UBERON:0018391 | chemoreceptor | A08.800.550.700.120 | NA | NA | NA | NA | NA |
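To illustrate how the plan above (add only the skos:exactMatch rows as xrefs, hold back the rest) could be applied to a table like this one, here is a small sketch. The column names mirror the table header; the parsing helpers are made up for the example and the three rows are copied from the table.

```python
COLUMNS = [
    "uberon_id", "uberon_name", "old_mapping", "method",
    "new_mapping", "new_mapping_name", "new_mapping_url", "mapping_predicate",
]

# Three rows copied from the table above, one per predicate.
ROWS = """\
UBERON:0000389 | lens cortex | A09.371.509.225 | manual lookup | D007904 | Lens Cortex, Crystalline | https://bioregistry.io/mesh:D007904 | skos:exactMatch |
UBERON:0001245 | anus | A03.492.411.495.767.288 | manual lookup | D001003 | Anal Canal | https://bioregistry.io/mesh:D001003 | skos:relatedMatch |
UBERON:0001272 | innominate bone | A02.835.232.611 | manual lookup | D010384 | Pelvic Bones | https://bioregistry.io/mesh:D010384 | skos:broadMatch |
"""


def parse_rows(text):
    """Yield one dict per pipe-separated table row."""
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        yield dict(zip(COLUMNS, cells))


def exact_match_xrefs(rows):
    """Return (uberon_id, MeSH xref) pairs for exact matches only."""
    return [
        (row["uberon_id"], f"MESH:{row['new_mapping']}")
        for row in rows
        # .get() tolerates short rows, such as one with no mapping at all.
        if row.get("mapping_predicate") == "skos:exactMatch"
    ]


print(exact_match_xrefs(parse_rows(ROWS)))
```

Filtering on the predicate column keeps the related and broad matches out of the xrefs without deleting them from the table.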
Awesome!
> I intend to leave out the remaining 9 MeSH terms until a future more comprehensive alignment of UBERON and MeSH.
Should we just delete these 9 remaining tree number xrefs since at this point they're no better than a missing MESH xref and break tooling that assumes MESH xrefs will be actual MESH IDs?
Yes, I agree that deleting them would be the best way forward.
Thanks everyone who helped on this issue over the years! Great to see all the tree numbers gone and @rays22's manual mapping efforts.
This is a different issue and not sure what types of automated checks UBERON has, but it might be good to ensure future MESH xrefs are properly formatted according to a regex (and are not tree numbers). This could be done for all xref sources using bioregistry actually.
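A rough sketch of what such a check might look like, independent of whatever tooling UBERON ends up using. The pattern for MeSH unique IDs here is an assumption (a `C` or `D` followed by 6 to 9 digits) and should be confirmed against the registry before relying on it; `find_bad_mesh_xrefs` is a made-up helper name.

```python
import re

# Assumed pattern for MeSH unique IDs; verify against the registry entry.
MESH_ID = re.compile(r"[CD]\d{6,9}")
# Captures the local part of a MESH xref line in an OBO file.
XREF_LINE = re.compile(r"^xref:\s*MESH:(\S+)")


def find_bad_mesh_xrefs(obo_text):
    """Return (line_number, local_id) for MESH xrefs that are not unique IDs."""
    bad = []
    for number, line in enumerate(obo_text.splitlines(), start=1):
        match = XREF_LINE.match(line)
        if match and not MESH_ID.fullmatch(match.group(1)):
            bad.append((number, match.group(1)))
    return bad


# Small OBO fragment using identifiers that appear in this thread.
SAMPLE_OBO = """\
[Term]
id: UBERON:0002739
xref: MESH:D020645
xref: MESH:A08.186.211.730.385.826.701.490
"""

print(find_bad_mesh_xrefs(SAMPLE_OBO))
```

Run in CI, a non-empty result could fail the build, which would stop tree numbers from creeping back in.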
@dhimmel - will create a ticket to see if we can implement some automated checks for that :) thanks!
I have two comments regarding MeSH cross-references:
1. MeSH xrefs currently use tree numbers, e.g. xref: MESH:A01.456.505.733 rather than xref: MESH:D009666. A tree number represents a path up the MeSH hierarchy from term to top-level category. Therefore, MeSH terms can have multiple tree numbers, and tree numbers are subject to change whenever the hierarchy is reorganized, even if the term remains. Is there a reason tree numbers are used instead of unique ids? If you would like to switch, you may find my mapping helpful (notebook, tsv file).
2. Judging by definition sources (e.g. def: "A male germ cell that develops from the haploid secondary spermatocytes. Without further division, spermatids undergo structural changes and give rise to spermatozoa." [MESH:A05.360.490.890.860]), it appears some of the hard work has already been done. Why are definition sources not included as xrefs and is this the right location to submit CL feature requests?
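To make the "path up the hierarchy" point concrete, here is a tiny sketch that expands a tree number into the chain of nodes it encodes. The function name is invented for the example; the tree number is the one quoted above.

```python
def ancestor_tree_numbers(tree_number):
    """List the nodes on the path from the top-level branch down to this tree number."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


path = ancestor_tree_numbers("A01.456.505.733")
print(path)
```

Every prefix in the list is itself a position in the hierarchy, so moving any ancestor during a reorganization changes the tree number of everything below it, while the unique ID stays the same.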