Closed blu3id closed 5 years ago
I like it.
Do you think we should optimise the AllParents() first... because that is used in lots of places and will be needed in ever more logic? In the old java version, I build a simple transitive closure table so it was fast, and essentially the framework is there in the KV store to store something similar that would probably fit all parents into a single mmap so v.quick.... I recently added the hooks for precomputations building and clearing.
But in any case, I had forgotten about the cross map priority value so this is looking good. Would be great to have some unit tests to check those edge cases and prove it's working as designed, and then easy to swap it all around and know we've not broken anything. TranslateResponse proto change will need to be lowercase.
What about sorting the item list to "best" match(es) first... and then having a generic match that subsumes all matches? Because how do we define "best" depends on usage I guess. For analytics, we might use most generic (ie lowest common subsumer of all matches - which may not be in the list of matches of course), but for display to the user, we might need different. So I'm tempted to merge and then rename to "generic_match" which would match the IDL for entity extraction as well.
Merged with plan for more changes...
Do you think we should optimise the AllParents()
Probably a good idea before production deployment - would then suggest removing the hacky gouroutines that were added for performance. Also as above might be worth looking at how indexes are handled with the current leveldb code.
Would be great to have some unit tests to check those edge cases and prove it's working as designed..
Definitely, if there is one thing I'm good at it is forgetting to write test code.
That said there is some need to be slightly careful here as the current logic is surprisingly tolerant to errors - case in point 315004001 |Metastasis from malignant tumor of breast (disorder)
when searching for C509 the current code currently definitely does not manage map_groups
and map_priority
correctly for this but the whole logic together doesn't pick this concept (correctly) for C509 or for C799.
So perhaps some of the logic should be broken up so components can be tested sperately and/or reused elsewhere (TODO...)
rename to "generic_match"
Yeah, I wasn't really sure where to slot this in API wise. Was even contemplating a separate API. The struggle is I view the current "crossmaps" endpoint as just providing "raw" matches as there is (prior to this) no logic handling complex maps (and even now given that some rely on real-time rule evaluation better suited to client apps rather than the terminology server) and the whole issue of mapping to inactive concepts.
There is also the interesting issue of where there are multiple ICD codes that when taken together would best match to a single SNOMED concept (reusing the same example) eg. C50.9 and C79.9 should map to 315004001 |Metastasis from malignant tumor of breast (disorder)
rather than 128462008 | Secondary malignant neoplastic disease (disorder)
and 254837009 | Malignant neoplasm of breast (disorder)
though I guess expression parsing could help... the complexity always grows
An initial attempt a creating BestMatch logic for complex map sets - specifically ICD-10 - to the Crossmap endpoint.
The logic currently is as follows:
The following issues need to be resolved/improved:
non-parallel (goroutined)use ofAllParents
which for codes with lots of entries like C509 can cause significant wait (>10s).non-parallel (goroutined)use ofShortestPathToRoot
(though cant find an edge case yet mostly gets called when there are only 1-2 candidate concepts e.g C508map_groups
/ general improvement of logic around map_priority e.g315004001 |Metastasis from malignant tumor of breast (disorder)
when searching for C509Edit: Now with ~50% performance boost with goroutines on expensive loops
The following also needs to be added to the upstream
snomed.proto
for this PR to work:This also partially fixes a bug with import of map_target and index logic in
leveldb-service.go
:Imports of map_targets where including an extra space at the end of codes that ended with a number (didn't affect codes like G35X, T66X) causing problems with determining
RefsetComponent
key from index entry.However there is still a bug in the index logic as trying e.g G400 to G409 causes the same error that was occurring when an extra space was being imported. Suspect there needs to be an uncommon delimiter e.g @ or / between compound keys and/or target key stored as value of index entry.