Open matentzn opened 2 years ago
I have a list of mappings I didn't agree with here.
- [ ] missed synonyms (bag of words matching could be better than relying on synonyms?):
neuroleptic malignant syndrome -> Malignant neuroleptic syndrome not caught
This is untrue: doid.sssom.tsv line 24163
subject_id | subject_label | predicate_id | object_id |
---|---|---|---|
DOID:14464 | Neuroleptic Malignant Syndrome | oboInOwl:hasDbXref | ICD10CM:G21.0 |
Also icd10cm_mapping_status.tsv line 95175
subject_id | subject_label | is_mapped | is_excluded | is_deprecated |
---|---|---|---|---|
ICD10CM:G21.0 | Malignant neuroleptic syndrome | True | False | False |
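For reference, the bag-of-words idea raised in the issue could look roughly like the sketch below. It is only an illustration: `bag_of_words` and `same_bag` are hypothetical helper names, not functions from the existing pipeline, and the tokenisation (lowercase, split on non-word characters) is an assumption.

```python
import re


def bag_of_words(label: str) -> frozenset:
    """Lowercase a label and return its word tokens as a set, so order is ignored."""
    return frozenset(re.findall(r"\w+", label.lower()))


def same_bag(a: str, b: str) -> bool:
    """True if both labels consist of the same words, regardless of word order."""
    return bag_of_words(a) == bag_of_words(b)


print(same_bag("neuroleptic malignant syndrome", "Malignant neuroleptic syndrome"))
```

Because a set drops repeated words, two labels that differ only in how often a word occurs would also be treated as equal; whether that is acceptable would need to be checked against real labels.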
The semapv:UnspecifiedMatching
That was the placeholder we decided to put there at the time. If some other class makes more sense, please suggest it.
Broken encodings: Waldenström macroglobulinemia - any smart idea how to handle this with no access to the source? Here we need a smart way. What comes to mind is replacing broken chars with regex wildcards, like Waldenstr.m macroglobulinemia.
This could be a solution. The question is where this encoding fix should live.
>>> # The input must be the mis-decoded form (UTF-8 bytes read as Latin-1).
>>> incorrect_string = "WaldenstrÃ¶m macroglobulinemia"
>>> bytes_string = incorrect_string.encode('latin1')
>>> correct_string = bytes_string.decode('utf-8')
>>> print(correct_string)
Waldenström macroglobulinemia
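The regex-wildcard idea from the issue could be sketched along these lines. This is a rough illustration under assumptions: `wildcard_pattern` is a hypothetical helper, and it assumes a broken character shows up either as the Unicode replacement character (U+FFFD) or as some other single non-ASCII character.

```python
import re


def wildcard_pattern(label: str) -> re.Pattern:
    """Build a regex in which every suspect character matches any single character."""
    parts = []
    for ch in label:
        if ch == "\ufffd" or not ch.isascii():
            # Assumption: replacement chars and non-ASCII chars may be damaged.
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


pattern = wildcard_pattern("Waldenstr\ufffdm macroglobulinemia")
print(bool(pattern.fullmatch("Waldenström macroglobulinemia")))
print(bool(pattern.fullmatch("Waldenstrom macroglobulinemia")))
```

A single wildcard only covers damage that is one character wide. Two-character mojibake such as `Ã¶` would not match a single `ö` this way, so the encode/decode repair shown above would still be needed for those cases.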
obsolete X -[skos:exactMatch]-> X: we should match these despite their obsoletion
I think it already does for a few. E.g.:
mondo_exactmatch_icd10cm.tsv:
subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | mapping_tool | confidence | subject_match_field | object_match_field | match_string | comment |
---|---|---|---|---|---|---|---|---|---|---|---|
MONDO:0024297 | obsolete nutritional or metabolic disease | skos:exactMatch | ICD10CM:E00-E90 | | semapv:UnspecifiedMatching | MONDO_MAPPINGS | | | | | |
disorder vs disorders (plural wordforms - do not manually implement, use some kind of NLP packages)
An example would help. I tried looking this up but didn't come across unmapped ones that had to be mapped.
other specified X, other unspecified X --[skos:broadMatch]->X
Again, an example would help.
@hrshdhgd as a preprocessing step in the synonymizer: in ICD, if the label is other X, add X as a broad synonym.
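A minimal sketch of that preprocessing step, assuming it runs on plain label strings before they reach the synonymizer. The names `OTHER_PREFIX` and `broad_synonym` are hypothetical, the prefix list covers only the variants mentioned in this thread (other, other specified, other unspecified), and the example labels are illustrative.

```python
import re
from typing import Optional

# Assumed prefixes: "other", "other specified", "other unspecified".
OTHER_PREFIX = re.compile(r"^other\s+(?:(?:specified|unspecified)\s+)?", re.IGNORECASE)


def broad_synonym(label: str) -> Optional[str]:
    """Return X for a label of the form 'other [specified|unspecified] X', else None."""
    cleaned = label.strip()
    stripped = OTHER_PREFIX.sub("", cleaned, count=1)
    if stripped and stripped != cleaned:
        return stripped
    return None


for label in ["Other specified diabetes mellitus", "other anemia", "diabetes mellitus"]:
    print(label, "->", broad_synonym(label))
```

The returned string would then be attached to the ICD term as a broad synonym, so that a lexical match on it yields skos:broadMatch rather than skos:exactMatch.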
@hrshdhgd can you move the entire content of this issue into a well-structured Google Doc (with a headline for each of the lexical-matching optimisations)? I think we should discuss all the different lexical optimisations a bit, and GitHub is terrible for this. Just post the link to the doc here, and I will answer all your questions in there.
Issues with the current matching to deal with
- neuroleptic malignant syndrome -> Malignant neuroleptic syndrome not caught
- The semapv:UnspecifiedMatching
- Leprosy [Hansen’s disease], Pregnancy with abortive outcome (O00-O08)
- Waldenström macroglobulinemia - any smart idea how to handle this with no access to the source? Here we need a smart way. What comes to mind is replacing broken chars with regex wildcards, like Waldenstr.m macroglobulinemia.
- obsolete X -[skos:exactMatch]-> X: we should match these despite their obsoletion
- other specified X, other unspecified X --[skos:broadMatch]-> X