mapping-commons / lmha-obo-mappings

A very rough first-pass draft.

Not much time to document this, sorry...

All mappings are in SSSOM format.

They are generated automatically using rdf_matcher, with existing xrefs used as priors. Mappings supported by an xref get high confidence, on the assumption that previously curated xrefs are high quality; however, the prior weightings can be altered.
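How rdf_matcher combines its priors is not described here. As an illustration of what adjustable prior weights could look like, the sketch below assumes one hypothetical prior per (subject_match_field, object_match_field) pair and combines several pieces of evidence for the same mapping with a noisy-OR. The weights, `DEFAULT_PRIORS`, and `confidence` are assumptions, not rdf_matcher's internals or parameters.

```python
from math import prod

# HYPOTHETICAL prior weights keyed on (subject_match_field, object_match_field).
# Illustrative values only; these are NOT rdf_matcher's actual parameters.
DEFAULT_PRIORS = {
    ("oio:hasDbXref", "dc:identifier"): 0.95,     # curated xref points directly at the object
    ("rdfs:label", "rdfs:label"): 0.5,            # exact label match
    ("rdfs:label", "oio:hasExactSynonym"): 0.45,  # label matches an exact synonym
    ("rdfs:label", "oio:hasBroadSynonym"): 0.15,  # label matches a broad synonym
}


def confidence(evidence, priors=DEFAULT_PRIORS, default=0.05):
    """Combine the priors of all evidence supporting one mapping with a noisy-OR.

    `evidence` is a list of (subject_match_field, object_match_field) pairs;
    pairs missing from `priors` fall back to `default`.
    """
    return 1 - prod(1 - priors.get(pair, default) for pair in evidence)


# Xref evidence alone vs. label-plus-synonym evidence
print(confidence([("oio:hasDbXref", "dc:identifier")]))
print(confidence([("rdfs:label", "rdfs:label"), ("rdfs:label", "oio:hasExactSynonym")]))

# Altering the priors: trust previously curated xrefs less
weaker = {**DEFAULT_PRIORS, ("oio:hasDbXref", "dc:identifier"): 0.6}
print(confidence([("oio:hasDbXref", "dc:identifier")], priors=weaker))
```

A noisy-OR treats each piece of evidence as independent, so extra evidence can only raise the score; lowering a single entry in the prior table is the knob that corresponds to trusting that kind of evidence less.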

For example:

subject_id | subject_label | subject_category | predicate_id | object_id | object_label | object_category | match_type | subject_source | object_source | mapping_tool | confidence | subject_match_field | object_match_field | match_string | comment
LMHA:00090 | basophil | Respiratory_System | owl:equivalentClass | CL:0000767 | basophil | cell | Lexical | LMHA | CL | rdf_matcher | 0.9783866146851686 | oio:hasDbXref | dc:identifier | CL:0000767 | .
LMHA:00179 | granulocyte | Respiratory_System | owl:equivalentClass | CL:0000094 | granulocyte | cell | Lexical | LMHA | CL | rdf_matcher | 0.9784559180850925 | oio:hasDbXref | dc:identifier | CL:0000094 | .
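Because SSSOM mappings are tab-separated tables, they can be loaded with Python's standard `csv` module alone. This minimal sketch uses only a subset of the columns from the two rows above to stay short, and skips any `#`-prefixed lines in case the file carries an embedded metadata header. `read_sssom_tsv` is a hypothetical helper, not part of any SSSOM library.

```python
import csv

# A subset of the columns from the two example rows above, tab-separated.
SSSOM_TSV = (
    "subject_id\tsubject_label\tobject_id\tobject_label\tconfidence\n"
    "LMHA:00090\tbasophil\tCL:0000767\tbasophil\t0.9783866146851686\n"
    "LMHA:00179\tgranulocyte\tCL:0000094\tgranulocyte\t0.9784559180850925\n"
)


def read_sssom_tsv(text):
    """Parse SSSOM TSV text into a list of dicts, one per mapping row.

    Lines starting with '#' (an embedded metadata block, if present) are
    skipped, and the confidence column is converted to float.
    """
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    rows = list(csv.DictReader(lines, delimiter="\t"))
    for row in rows:
        row["confidence"] = float(row["confidence"])
    return rows


for row in read_sssom_tsv(SSSOM_TSV):
    print(row["subject_id"], "->", row["object_id"], round(row["confidence"], 3))
```

For a file on disk, passing `open(path, encoding="utf-8").read()` to the same function works the same way.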

If the match is made via a synonym or a shared xref, it gets a lower confidence:

subject_id | subject_label | subject_category | predicate_id | object_id | object_label | object_category | match_type | subject_source | object_source | mapping_tool | confidence | subject_match_field | object_match_field | match_string | comment
LMHA:00183 | hyaline cartilage | general_tissue_structures | owl:equivalentClass | UBERON:0001994 | hyaline cartilage tissue | uberon | Lexical | LMHA | UBERON | rdf_matcher | 0.4482245494091784 | rdfs:label | oio:hasExactSynonym | hyaline cartilage | .
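The kind of evidence behind each candidate can be read off the `subject_match_field` and `object_match_field` columns. The helper below is a hypothetical heuristic based only on the field values that appear in these examples, plus the other oboInOwl synonym properties. How a shared xref is encoded (both fields `oio:hasDbXref`) is an assumption. None of this is part of rdf_matcher.

```python
# oboInOwl synonym properties, written with the oio: prefix used in the mappings
SYNONYM_FIELDS = {
    "oio:hasExactSynonym",
    "oio:hasBroadSynonym",
    "oio:hasNarrowSynonym",
    "oio:hasRelatedSynonym",
}


def evidence_kind(subject_match_field, object_match_field):
    """Classify what a candidate mapping was matched on (hypothetical heuristic)."""
    fields = (subject_match_field, object_match_field)
    if fields == ("oio:hasDbXref", "dc:identifier"):
        return "direct xref"  # subject's curated xref is the object's identifier
    if fields == ("oio:hasDbXref", "oio:hasDbXref"):
        return "shared xref"  # ASSUMED encoding: both terms xref the same third ID
    if subject_match_field in SYNONYM_FIELDS or object_match_field in SYNONYM_FIELDS:
        return "synonym"
    if fields == ("rdfs:label", "rdfs:label"):
        return "label"
    return "other"


print(evidence_kind("oio:hasDbXref", "dc:identifier"))
print(evidence_kind("rdfs:label", "oio:hasExactSynonym"))
```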

Note that I deliberately included all candidate mappings; for example, there are three matches for LMHA:00150:

subject_id | subject_label | subject_category | predicate_id | object_id | object_label | object_category | match_type | subject_source | object_source | mapping_tool | confidence | subject_match_field | object_match_field | match_string | comment
LMHA:00150 | dendritic cell | Cell | owl:equivalentClass | CL:0000451 | dendritic cell | cell | Lexical | LMHA | CL | rdf_matcher | 0.4827360284687321 | rdfs:label | rdfs:label | dendritic cell | .
LMHA:00150 | dendritic cell | Cell | owl:equivalentClass | CL:0001056 | dendritic cell, human | cell | Lexical | LMHA | CL | rdf_matcher | 0.16884322574902352 | rdfs:label | oio:hasBroadSynonym | dendritic cell | .
LMHA:00150 | dendritic cell | Cell | owl:equivalentClass | CL:0000738 | leukocyte | cell | Lexical | LMHA | CL | rdf_matcher | 0.9577300291303523 | oio:hasDbXref | dc:identifier | CL:0000738 | .
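To review candidates like the three LMHA:00150 rows above, it helps to group mappings by subject and rank them by confidence. A minimal sketch using only the standard library; `group_by_subject` and the tuple layout are illustrative choices, not part of any SSSOM tool.

```python
from collections import defaultdict

# (subject_id, object_id, object_label, confidence) for the three LMHA:00150 rows above
CANDIDATES = [
    ("LMHA:00150", "CL:0000451", "dendritic cell", 0.4827360284687321),
    ("LMHA:00150", "CL:0001056", "dendritic cell, human", 0.16884322574902352),
    ("LMHA:00150", "CL:0000738", "leukocyte", 0.9577300291303523),
]


def group_by_subject(candidates):
    """Map each subject_id to its candidates, highest confidence first."""
    groups = defaultdict(list)
    for subject_id, object_id, object_label, conf in candidates:
        groups[subject_id].append((conf, object_id, object_label))
    return {subject: sorted(cands, reverse=True) for subject, cands in groups.items()}


ranked = group_by_subject(CANDIDATES)
for conf, object_id, label in ranked["LMHA:00150"]:
    print(f"{conf:.3f}  {object_id}  {label}")
```

Simply taking the top-ranked candidate here would pick CL:0000738 (leukocyte), whose label does not match the subject label "dendritic cell". Showing every candidate makes this kind of issue visible to curators.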

It would be possible for me to do something more advanced to filter and better prioritize these, but for now I think it is informative and transparent to show all candidate mappings, so that curators get a sense of the issues.
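One possible direction for that kind of filtering, sketched here as a hypothetical rule rather than anything the current mappings do: flag a subject for curator review when it has no candidate, more than one candidate, or when its highest-confidence candidate's label differs from the subject label. `review_reasons` and the rule itself are assumptions.

```python
def review_reasons(subject_label, candidates):
    """Return reasons a subject's candidates need curator attention (hypothetical rule).

    `candidates` is a list of (confidence, object_id, object_label) tuples.
    An empty result means the subject looks unambiguous.
    """
    if not candidates:
        return ["no candidate"]
    reasons = []
    if len(candidates) > 1:
        reasons.append("multiple candidates")
    _, best_id, best_label = max(candidates)  # tuples compare on confidence first
    if best_label.casefold() != subject_label.casefold():
        reasons.append(f"best candidate {best_id} has a different label")
    return reasons


dendritic = [
    (0.4827360284687321, "CL:0000451", "dendritic cell"),
    (0.16884322574902352, "CL:0001056", "dendritic cell, human"),
    (0.9577300291303523, "CL:0000738", "leukocyte"),
]
print(review_reasons("dendritic cell", dendritic))
print(review_reasons("basophil", [(0.9783866146851686, "CL:0000767", "basophil")]))
```

A label-equality check is deliberately crude: it would also flag the hyaline cartilage synonym match, so a real filter would need to weigh synonyms too.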

Unmapped LMHA terms

These are the LMHA terms for which neither an xref nor a lexical match was found:

This list is being used for gap-filling: https://github.com/mapping-commons/lmha-obo-mappings/issues/1
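Since every candidate mapping is kept, the unmapped terms can be computed as a set difference between all LMHA term IDs and the subjects that appear in the mapping file. A minimal sketch, assuming the LMHA IDs and mapping rows are already loaded; `unmapped_terms` is hypothetical, and `LMHA:EXAMPLE` is a placeholder ID, not a real term.

```python
def unmapped_terms(all_term_ids, mapping_rows):
    """Return LMHA IDs that never appear as subject_id in any mapping row."""
    mapped = {row["subject_id"] for row in mapping_rows}
    return sorted(set(all_term_ids) - mapped)


# LMHA:EXAMPLE is a placeholder, not a real LMHA term
all_ids = ["LMHA:00090", "LMHA:00150", "LMHA:00183", "LMHA:EXAMPLE"]
rows = [
    {"subject_id": "LMHA:00090", "object_id": "CL:0000767"},
    {"subject_id": "LMHA:00150", "object_id": "CL:0000451"},
    {"subject_id": "LMHA:00150", "object_id": "CL:0000738"},
    {"subject_id": "LMHA:00183", "object_id": "UBERON:0001994"},
]
print(unmapped_terms(all_ids, rows))
```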

TODO